Author affiliation: Department of Electronics and Communication Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur 603203, Tamil Nadu, India
Publication: Neural Computing and Applications (Neural Comput. Appl.)
Year/Volume/Issue: 2025, Vol. 37, No. 17
Pages: 10835-10850
Subject classification: 08 [Engineering] 0810 [Engineering - Information and Communication Engineering] 070207 [Science - Optics] 080103 [Engineering - Fluid Mechanics] 0816 [Engineering - Surveying and Mapping Science and Technology] 0813 [Engineering - Architecture] 0835 [Engineering - Software Engineering] 0814 [Engineering - Civil Engineering] 0803 [Engineering - Optical Engineering] 0701 [Science - Mathematics] 0812 [Engineering - Computer Science and Technology (degrees awarded in Engineering or Science)] 0801 [Engineering - Mechanics (degrees awarded in Engineering or Science)] 0702 [Science - Physics]
Abstract: Human action recognition is a vital aspect of computer vision, with applications ranging from security systems to interactive technology. Our study presents a comprehensive methodology that employs multiple feature-extraction and optimization techniques to enhance the accuracy and efficiency of human action identification. The video input was divided into four distinct elements: RGB images, optical flow information, spatial saliency maps, and temporal saliency maps, and each component was analyzed independently with a dedicated computer vision algorithm. The Farneback algorithm was employed to compute the optical flow, Canny edge detection was used to assess spatial saliency, and frame differencing identified motion-based saliency. Together, these processed elements provide a comprehensive representation of both spatial and temporal information. The extracted data were then input into distinct pretrained deep learning models: Inception V3 for the RGB frames and optical flow, ResNetV2 for the spatial saliency maps, and DenseNet-121 for the motion saliency maps. Each network processed its input separately, extracting features suited to its modality; this ensures the comprehensive capture of both static and dynamic elements in the video data. Subsequently, sequence modeling and classification were performed using a gated recurrent unit (GRU) incorporating an attention mechanism, which dynamically highlights the most significant temporal segments and improves the model's capacity to comprehend intricate human actions within video sequences. To enhance the efficiency of the model, we implemented the Grasshopper optimization algorithm to optimize the feature selection and classification stages, thus maximizing the u
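To make the decomposition step concrete, the sketch below derives the four per-frame streams with OpenCV: dense Farneback optical flow, a Canny edge map standing in for spatial saliency, and frame differencing for motion saliency. This is a minimal reconstruction from the abstract, not the authors' implementation; the function name, thresholds, and Farneback parameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the four-stream decomposition
# described in the abstract, using OpenCV. Parameter values are assumptions.
import cv2

def decompose_frame(prev_gray, frame):
    """Split one video frame into the four described inputs: the RGB frame,
    Farneback optical flow, a Canny edge map (spatial saliency proxy),
    and a frame difference (motion saliency proxy)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Dense optical flow (Farneback): a two-channel (dx, dy) field per pixel.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # Spatial saliency approximated by Canny edge detection.
    spatial_saliency = cv2.Canny(gray, 100, 200)

    # Motion saliency from simple frame-to-frame differencing.
    motion_saliency = cv2.absdiff(gray, prev_gray)

    return frame, flow, spatial_saliency, motion_saliency
```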
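The per-stream feature extractors could be assembled from standard pretrained backbones, as in the hypothetical torchvision sketch below. Two assumptions go beyond the abstract: torchvision ships no ResNet-V2, so resnet50 stands in for it here, and the two-channel optical flow would need to be adapted (e.g., stacked with its magnitude) to fit three-channel inputs.

```python
import torch.nn as nn
from torchvision import models

# One pretrained backbone per stream, each with its classification head
# replaced by nn.Identity so it emits a pooled feature vector per frame.
rgb_net = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
rgb_net.fc = nn.Identity()             # 2048-d features for RGB frames

flow_net = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
flow_net.fc = nn.Identity()            # 2048-d features for optical flow
                                       # (flow stacked to 3 channels: an assumption)

spatial_net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
spatial_net.fc = nn.Identity()         # stand-in for the paper's ResNet-V2

motion_net = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
motion_net.classifier = nn.Identity()  # 1024-d features for motion saliency maps

for net in (rgb_net, flow_net, spatial_net, motion_net):
    net.eval()                         # feature extraction only, no training here
```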
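The sequence-modeling stage can be sketched as a GRU whose per-timestep outputs are pooled by a learned attention weighting before classification. The hidden size, additive-attention form, and feature dimensions below are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GRUAttentionClassifier(nn.Module):
    """GRU over per-frame feature vectors, with additive temporal attention
    that up-weights the most informative segments before classification."""

    def __init__(self, feat_dim, hidden_dim=256, num_classes=101):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)        # scores each timestep
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                           # x: (batch, T, feat_dim)
        h, _ = self.gru(x)                          # (batch, T, hidden_dim)
        scores = self.attn(torch.tanh(h))           # (batch, T, 1)
        weights = torch.softmax(scores, dim=1)      # attention over time
        context = (weights * h).sum(dim=1)          # weighted sum: (batch, hidden_dim)
        return self.fc(context)

# Hypothetical usage: 8 clips of 30 timesteps, 4096-d fused features per frame.
model = GRUAttentionClassifier(feat_dim=4096, num_classes=101)
logits = model(torch.randn(8, 30, 4096))
```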