Detection and Recognition of Specific Human Actions in Simple Scenes: Foreign-Literature Translation Material

 2022-08-08 11:59:51



Convolutional Neural Networks and Long Short-Term Memory for skeleton-based human activity and hand gesture recognition

Juan C. Núñez, Raúl Cabido, Juan J. Pantrigo, Antonio S. Montemayor

Abstract

In this work, we address human activity and hand gesture recognition problems using 3D data sequences obtained from full-body and hand skeletons, respectively. To this aim, we propose a deep learning-based approach for temporal 3D pose recognition problems based on a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) recurrent network. We also present a two-stage training strategy which firstly focuses on CNN training and, secondly, adjusts the full method (CNN + LSTM). Experimental testing demonstrated that our training method obtains better results than a single-stage training strategy. Additionally, we propose a data augmentation method that has also been validated experimentally. Finally, we perform an extensive experimental study on publicly available data benchmarks. The results obtained show that the proposed approach reaches state-of-the-art performance when compared to the methods identified in the literature. The best results were obtained for small datasets, where the proposed data augmentation strategy has greater impact.

Keywords

Deep learning; Convolutional Neural Network; Recurrent neural network; Long Short-Term Memory; Human activity recognition; Hand gesture recognition

1. Introduction

Vision-based human action recognition concerns the task of automatically interpreting an image sequence to decide what action or activity is being performed by the subjects in the scene. It is a relevant topic in computer vision, with practical applications such as video surveillance, human-computer interaction, gaming, sports arbitration, sports training, smart homes, and life-care systems, among many others [1], [2]. Because of this wide range of practical applications, human activity recognition problems have received the attention of researchers in the fields of computer vision, artificial intelligence and machine learning. Researchers in the field organize contests such as the ChaLearn Looking at People challenge [3] and provide large datasets such as NTU RGB+D [4]. As a consequence, the literature contains a significant number of related works describing an extensive variety of methods and strategies to deal with this problem. In particular, in recent years, deep neural networks have been successfully applied to human action recognition problems as a suitable approach when relatively large datasets are available.

The toolkits of many affordable RGBD devices allow the acquisition of 3D data at interactive framerates. These devices can be used to capture human movements or hand poses, offering 3D coordinates of the joints as skeletons [5]. These skeletons can capture the evolution of the pose of a human body or hand and, therefore, they can be used to classify the activities or gestures performed by subjects in the area.
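To make the data layout concrete, the following is a minimal sketch of how such a skeleton sequence might be held in memory: a list of time steps, each holding one (x, y, z) triple per joint. The names `SkeletonSequence`, `shape`, and `is_consistent`, as well as the toy values, are hypothetical and chosen only for illustration; they do not come from the paper or from any device toolkit.

```python
from dataclasses import dataclass
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one skeleton joint
Frame = List[Joint]                  # all joints captured at one time step


@dataclass
class SkeletonSequence:
    """Hypothetical container for one recorded activity or gesture."""
    frames: List[Frame]
    label: str

    def shape(self) -> Tuple[int, int, int]:
        """Return (time steps, joints per frame, coordinates per joint)."""
        joints = len(self.frames[0]) if self.frames else 0
        return (len(self.frames), joints, 3)

    def is_consistent(self) -> bool:
        """Check that every frame reports the same number of 3D joints."""
        _, joints, _ = self.shape()
        return all(
            len(frame) == joints and all(len(j) == 3 for j in frame)
            for frame in self.frames
        )


# Toy example (made-up values): 4 time steps of a 3-joint skeleton
# drifting along the x axis.
toy = SkeletonSequence(
    frames=[[(0.1 * t, 0.0, 1.0), (0.1 * t, 0.5, 1.0), (0.1 * t, 1.0, 1.0)]
            for t in range(4)],
    label="walk",
)
```

A classifier for activities or gestures would then consume `toy.frames` one time step at a time and predict `toy.label`.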

In this paper, we propose the combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) recurrent network for handling time series of 3D coordinates of skeleton keypoints. We have tested our proposal on six publicly available datasets.

Fig. 1 summarizes the proposed system, in which the input data at each time step is presented to the CNN + LSTM network. The CNN is mainly responsible for capturing relevant features from the 3D data input on every time step, while the LSTM takes into account the time evolution of the 3D data series. Finally, the CNN + LSTM model generates a classification result for the presented model sequence.

Fig. 1. UML Activity Diagram for the proposed system. The diagram shows two process flows, the upper one for the training process and the lower one for the testing process.
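The division of labor described above (a CNN extracting features from each time step, an LSTM tracking their evolution, and a final classification) can be illustrated with a small forward-pass sketch in plain Python. This is not the authors' architecture: the kernel shape, the max pooling, the layer sizes, the random untrained weights, and every function name below are assumptions made for illustration only, and training is omitted entirely.

```python
import math
import random


def conv_features(frame, kernels):
    """Per-frame CNN stage (assumed form): slide each kernel along the
    joint axis, apply ReLU, and keep the maximum response per kernel."""
    features = []
    for kernel in kernels:
        k = len(kernel)
        responses = []
        for start in range(len(frame) - k + 1):
            s = sum(kernel[i][c] * frame[start + i][c]
                    for i in range(k) for c in range(3))
            responses.append(max(0.0, s))
        features.append(max(responses))
    return features


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h, c, weights):
    """One standard LSTM step on the concatenation of input x and state h."""
    z = x + h

    def gate(name, act):
        rows, bias = weights[name]
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(rows, bias)]

    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("cell", math.tanh)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def classify(sequence, params):
    """Run the CNN on every frame, feed the features to the LSTM in time
    order, and turn the final hidden state into class probabilities."""
    h = [0.0] * params["hidden"]
    c = [0.0] * params["hidden"]
    for frame in sequence:
        h, c = lstm_step(conv_features(frame, params["kernels"]), h, c,
                         params["lstm"])
    logits = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(params["out_w"], params["out_b"])]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    return [e / sum(exps) for e in exps]


def random_params(n_kernels=4, kernel_size=3, hidden=5, n_classes=3, seed=0):
    """Untrained, randomly initialized weights (illustrative sizes)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "kernels": [matrix(kernel_size, 3) for _ in range(n_kernels)],
        "lstm": {name: (matrix(hidden, n_kernels + hidden), [0.0] * hidden)
                 for name in ("input", "forget", "output", "cell")},
        "out_w": matrix(n_classes, hidden),
        "out_b": [0.0] * n_classes,
        "hidden": hidden,
    }


# Made-up input: 6 time steps of a 20-joint skeleton with random coordinates.
rng = random.Random(1)
sequence = [[(rng.random(), rng.random(), rng.random()) for _ in range(20)]
            for _ in range(6)]
probabilities = classify(sequence, random_params())
```

Because the same convolution weights are applied to every frame and only the LSTM state carries information across time, the sketch accepts sequences of any length; `probabilities` holds one value per class and sums to 1.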

An important contribution of this paper is that the proposed network architecture does not need to be adapted to the type of activity or gesture to be recognized, nor to the geometry of the input 3D time-series data. Nonetheless, it obtains results that are competitive with previous works that need to make assumptions on
