Author Affiliations: Cent S Univ, Sch Informat Sci & Engn, Changsha, Hunan, Peoples R China; Cent S Univ, Xiangya Hosp, China Mobile Joint Lab Mobile Hlth, Minist Educ, Changsha, Hunan, Peoples R China
Publication: JOURNAL OF ENGINEERING-JOE
Year/Volume/Issue: 2018, Vol. 2018, Issue 16
Pages: 1515-1520
Subject Classification: 12 [Management]; 1201 [Management - Management Science and Engineering (degrees awardable in Management or Engineering)]; 08 [Engineering]
Keywords: human-computer interaction; feature extraction; gesture recognition; learning (artificial intelligence); neural nets; computer vision; image segmentation; image colour analysis; convolutional neural network; depth camera; RGB-D camera; depth information; robust gesture recognition system; RGB-D static gesture recognition method; gesture segmentation; depth images; American Sign Language Recognition dataset; RGB input; traditional machine learning methods; ASL recognition dataset; CNN algorithms; RGB-input-only method; fine-tuning Inception V3 CNN structure
Abstract: In the area of human-computer interaction (HCI) and computer vision, gesture recognition has long been a research hotspot. With the advent of depth cameras, gesture recognition using RGB-D cameras has gradually become mainstream in this field. However, how to effectively use depth information to construct a robust gesture recognition system remains an open problem. In this paper, an RGB-D static gesture recognition method based on fine-tuning Inception V3 is proposed, which eliminates the gesture segmentation and feature extraction steps required by traditional algorithms. In contrast to general CNN approaches, the authors adopt a two-stage training strategy to fine-tune the model. The method adds a feature-concatenation layer for RGB and depth images to the CNN structure, using depth information to improve gesture recognition performance. Finally, on the American Sign Language (ASL) Recognition dataset, the authors compared their method with traditional machine learning methods, CNN algorithms, and an RGB-input-only method. Across the three groups of comparative experiments, the authors' method achieved the highest accuracy, 91.35%, the current state of the art on the ASL dataset.
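
To make the architecture described in the abstract concrete, the sketch below shows one plausible way to realize a two-stream Inception V3 with an RGB/depth feature-concatenation layer and a two-stage training schedule in PyTorch. It is an illustration only, not the authors' published implementation: the class count (assumed 24 ASL letters), the classifier head, the depth encoding (depth maps replicated to three channels and resized to 299x299), and the learning rates are all assumptions.

import torch
import torch.nn as nn
from torchvision import models

class RGBDInception(nn.Module):
    def __init__(self, num_classes=24):
        super().__init__()
        # Two ImageNet-pretrained Inception V3 backbones, one per modality;
        # the auxiliary classifier branch is dropped.
        self.rgb_net = models.inception_v3(
            weights=models.Inception_V3_Weights.DEFAULT, aux_logits=False)
        self.depth_net = models.inception_v3(
            weights=models.Inception_V3_Weights.DEFAULT, aux_logits=False)
        feat_dim = self.rgb_net.fc.in_features      # 2048 features per stream
        self.rgb_net.fc = nn.Identity()             # keep pooled features only
        self.depth_net.fc = nn.Identity()
        # Feature-concatenation layer followed by the gesture classifier (assumed head).
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, rgb, depth):
        # rgb and depth: (N, 3, 299, 299); depth is assumed to be a single-channel
        # map replicated to three channels before being passed in.
        f_rgb = self.rgb_net(rgb)
        f_depth = self.depth_net(depth)
        return self.classifier(torch.cat([f_rgb, f_depth], dim=1))

model = RGBDInception()

# Stage 1: freeze both backbones and train only the concatenation/classifier head.
for p in list(model.rgb_net.parameters()) + list(model.depth_net.parameters()):
    p.requires_grad = False
stage1_opt = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)

# Stage 2: unfreeze everything and fine-tune the whole network at a lower rate.
for p in model.parameters():
    p.requires_grad = True
stage2_opt = torch.optim.Adam(model.parameters(), lr=1e-4)

The two optimizers correspond to the two-stage strategy mentioned in the abstract: the randomly initialized head is fitted first while the pretrained features stay fixed, and only then are both Inception V3 streams fine-tuned end to end.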