Search Results - Inner Mongolia University Library

Workout Action Recognition in Video Streams Using an Attention Driven Residual DC-GRU Network

Computers, Materials & Continua, 2024, Vol. 79, No. 5, pp. 3067-3087

Authors: Arnab Dey, Samit Biswas, Dac-Nhuong Le. Affiliations: Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India; Faculty of Information Technology, Haiphong University, Haiphong 180000, Vietnam

Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, YouTube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd datas

Keywords: Workout action recognition; video stream; action recognition; residual network; GRU; attention

Enhance the Performance of Directional Feature-based Palmprint Recognition by Directional Response Stability Measurement

Machine Intelligence Research, 2024, Vol. 21, No. 3, pp. 597-614

Authors: Haitao Wang, Wei Jia. Affiliation: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China

Palmprint recognition is an emerging biometrics technology that has attracted increasing attention in recent years. Many palmprint recognition methods have been proposed, including traditional methods and deep learning-based methods. Among the traditional methods, the methods based on directional features are mainstream because they have high recognition rates and are robust to illumination changes and small amounts of noise. However, to date, in these methods, the stability of the palmprint directional response has not been deeply studied. In this paper, we analyse the problem of directional response instability in palmprint recognition methods based on directional features. We then propose a novel palmprint directional response stability measurement (DRSM) to judge the stability of the directional feature of each pixel. After filtering the palmprint image with the filter bank, we design DRSM according to the relationship between the maximum response value and other response values for each pixel. Using DRSM, we can identify those pixels with unstable directional response and use a specially designed encoding mode related to a specific method. We insert the DRSM mechanism into seven classical methods based on directional features, and conduct many experiments on six public palmprint databases. The experimental results show that the DRSM mechanism can effectively improve the performance of these methods. In the field of palmprint recognition, this work is the first in-depth study on the stability of the palmprint directional response, so this paper has strong reference value for research on palmprint recognition methods based on directional features.

Keywords: Biometrics; palmprint recognition; directional response stability; directional coding-based methods; directional feature

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding

Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 65-78

Authors: Hao FENG, Qi LIU, Hao LIU, Jingqun TANG, Wengang ZHOU, Houqiang LI, Can HUANG. Affiliations: Department of Electronic Engineering and Information Science, University of Science and Technology of China; ByteDance Inc.

In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2560 × 2560 resolution. Unlike existing studies that either struggle with high-resolution documents or give up the large language model, leaving their vision or language ability constrained, our DocPedia directly processes visual input in the frequency domain rather than the pixel space. This unique characteristic enables DocPedia to capture a greater amount of visual and textual information using a limited number of visual tokens. To consistently enhance both the perception and comprehension abilities of our DocPedia, we develop a dual-stage training strategy and enrich instructions/annotations of all training tasks covering multiple document types. Extensive quantitative and qualitative experiments are conducted on various publicly available benchmarks and the results confirm the mutual benefits of jointly learning perception and comprehension tasks. The results provide further evidence of the effectiveness and superior performance of our DocPedia over other methods.

Keywords: document understanding; large multimodal model; OCR-free; high-resolution; frequency

GDMNet: A Unified Multi-Task Network for Panoptic Driving Perception

Computers, Materials & Continua, 2024, Vol. 80, No. 8, pp. 2963-2978

Authors: Yunxiang Liu, Haili Ma, Jianlin Zhu, Qiangbo Zhang. Affiliation: School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China

To enhance the efficiency and accuracy of environmental perception for autonomous vehicles, we propose GDMNet, a unified multi-task perception network for autonomous driving, capable of performing drivable area segmentation, lane detection, and traffic object ***, in the encoding stage, features are extracted, and Generalized Efficient Layer Aggregation Network (GELAN) is utilized to enhance feature extraction and gradient ***, in the decoding stage, specialized detection heads are designed; the drivable area segmentation head employs DySample to expand feature maps, the lane detection head merges early-stage features and processes the output through the Focal Modulation Network (FMN). Lastly, the Minimum Point Distance IoU (MPDIoU) loss function is employed to compute the matching degree between traffic object detection boxes and predicted boxes, facilitating model training *** results on the BDD100K dataset demonstrate that the proposed network achieves a drivable area segmentation mean intersection over union (mIoU) of 92.2%, lane detection accuracy and intersection over union (IoU) of 75.3% and 26.4%, respectively, and traffic object detection recall and mAP of 89.7% and 78.2%, *** detection performance surpasses that of other single-task or multi-task algorithm models.

Keywords: Autonomous driving; multitask learning; drivable area segmentation; lane detection; vehicle detection

Automatic Transportation Mode Classification Using a Deep Reinforcement Learning Approach With Smartphone Sensors

IEEE Access, 2024, Vol. 12, pp. 514-533

Authors: Taherinavid, Siavash; Moravvej, Seyed Vahid; Chen, Yen-Lin; Yang, Jing; Ku, Chin Soon; Yee, Por Lip. Affiliations: Iran University of Science and Technology, School of Civil Engineering, Tehran 13114-16846, Iran; Isfahan University of Technology, Department of Electrical and Computer Engineering, Isfahan 84156-83111, Iran; National Taipei University of Technology, Department of Computer Science and Information Engineering, Taipei 106344, Taiwan; Universiti Malaya, Faculty of Computer Science and Information Technology, Department of Computer System and Technology, Kuala Lumpur 50603, Malaysia; Universiti Tunku Abdul Rahman, Department of Computer Science, Kampar 31900, Malaysia

The increasing dependence on smartphones with advanced sensors has highlighted the imperative of precise transportation mode classification, pivotal for domains like health monitoring and urban planning. This research is motivated by the pressing demand to enhance transportation mode classification, leveraging the potential of smartphone sensors, notably the accelerometer, magnetometer, and gyroscope. In response to this challenge, we present a novel automated classification model rooted in deep reinforcement learning. Our model stands out for its innovative approach of harnessing enhanced features through artificial neural networks (ANNs) and visualizing the classification task as a structured series of decision-making events. Our model adopts an improved differential evolution (DE) algorithm for initializing weights, coupled with a specialized agent-environment relationship. Every correct classification earns the agent a reward, with additional emphasis on the accurate categorization of less frequent modes through a distinct reward strategy. The Upper Confidence Bound (UCB) technique is used for action selection, promoting deep-seated knowledge, and minimizing reliance on chance. A notable innovation in our work is the introduction of a cluster-centric mutation operation within the DE algorithm. This operation strategically identifies optimal clusters in the current DE population and forges potential solutions using a pioneering update mechanism. When assessed on the extensive HTC dataset, which includes 8311 hours of data gathered from 224 participants over two years, the model achieves an accuracy of 0.88±0.03 and an F-measure of 0.87±0.02, underscoring the efficacy of our approach for large-scale transportation mode classification tasks. This work introduces an innovative strategy in the realm of transportation mode classification, emphasizing both precision and reliability, addressing the pressing need for enhanced classification mechanisms in an eve

Keywords: Smartphones

Biometrics 2.0 for the Security of Smart Cities

Machine Intelligence Research, 2024, Vol. 21, No. 6, pp. 1121-1144

Authors: Wei Jia, Zhecheng Zhang, Yang Zhao, Hai Min, Shujie Li. Affiliation: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China

In modern society, an increasing number of occasions need to effectively verify people's *** is the most effective technology for personal *** research on automated biometrics recognition mainly started in the 1960s and *** the following 50 years, the research and application of biometrics have achieved fruitful *** 2014-2015, with the successful applications of some emerging information technologies and tools, such as deep learning, cloud computing, big data, mobile communication, smartphones, location-based services, blockchain, new sensing technology, the Internet of Things and federated learning, biometric technology entered a new development ***, taking 2014-2015 as the time boundary, the development of biometric technology can be divided into two *** addition, according to our knowledge and understanding of biometrics, we further divide the development of biometric technology into three phases, i.e., biometrics 1.0, 2.0 and *** 1.0 is the primary development phase, or the traditional development *** 2.0 is an explosive development phase due to the breakthroughs caused by some emerging information *** present, we are in the development phase of biometrics *** 3.0 is the future development phase of *** the biometrics 3.0 phase, biometric technology will be fully mature and can meet the needs of various *** 1.0 is the initial phase of the development of biometric technology, while biometrics 2.0 is the advanced *** this paper, we provide a brief review of biometrics ***, the concept of biometrics 2.0 is defined, and the architecture of biometrics 2.0 is *** particular, the application architecture of biometrics 2.0 in smart cities is *** challenges and perspectives of biometrics 2.0 are also discussed.

Keywords: Biometrics; smart city; security; application architecture

Deploying Hybrid Ensemble Machine Learning Techniques for Effective Cross-Site Scripting (XSS) Attack Detection

Computers, Materials & Continua, 2024, Vol. 81, No. 10, pp. 707-748

Authors: Noor Ullah Bacha, Songfeng Lu, Attiq Ur Rehman, Muhammad Idrees, Yazeed Yasin Ghadi, Tahani Jaser Alahmadi. Affiliations: School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430073, China; Department of Computer Science and Engineering, University of Engineering and Technology, Lahore 54000, Pakistan; Department of Computer Science and Software Engineering, Al Ain University, Al Ain 12555, Abu Dhabi; Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 84428, Saudi Arabia

Cross-Site Scripting (XSS) remains a significant threat to web application security, exploiting vulnerabilities to hijack user sessions and steal sensitive *** detection methods often fail to keep pace with the evolving sophistication of cyber *** paper introduces a novel hybrid ensemble learning framework that leverages a combination of advanced machine learning algorithms: Logistic Regression (LR), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Deep Neural Networks (DNN). Utilizing the XSS-Attacks-2021 dataset, which comprises 460 instances across various real-world traffic-related scenarios, this framework significantly enhances XSS attack *** approach, which includes rigorous feature engineering and model tuning, not only optimizes accuracy but also effectively minimizes false positives (FP) (0.13%) and false negatives (FN) (0.19%). This comprehensive methodology has been rigorously validated, achieving an unprecedented accuracy of 99.87%. The proposed system is scalable and efficient, capable of adapting to the increasing number of web applications and user demands without a decline in *** demonstrates exceptional real-time capabilities, with the ability to detect XSS attacks dynamically, maintaining high accuracy and low latency even under significant ***, despite the computational complexity introduced by the hybrid ensemble approach, strategic use of parallel processing and algorithm tuning ensures that the system remains scalable and performs robustly in real-time *** for easy integration with existing web security systems, our framework supports adaptable Application Programming Interfaces (APIs) and a modular design, facilitating seamless augmentation of current *** innovation represents a significant advancement in cybersecurity, offering a scalable and effective solution for securing modern web applications against evolving threats.

Keywords: Cross-site scripting; machine learning; XSS detection; stacking ensemble learning; hybrid learning

CVTD: A Robust Car-Mounted Video Text Detector

Computers, Materials & Continua, 2024, Vol. 78, No. 2, pp. 1821-1842

Authors: Di Zhou, Jianxun Zhang, Chao Li, Yifan Guo, Bowen Li. Affiliations: Department of Computer Science and Engineering, Chongqing University of Technology, Chongqing, China; College of Information and Engineering, Jingdezhen Ceramic University, Jingdezhen, China

Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for building intelligent systems for driver assistance or autonomous *** information in car-mounted videos can assist drivers in making ***, Car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time *** proposed a robust Car-mounted Video Text Detector (CVTD). It is a lightweight text detection model based on ResNet18 for feature extraction, capable of detecting text in arbitrary *** model efficiently extracted global text positions through the Coordinate Attention Threshold Activation (CATA) and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules (FPEFM), strengthening feature representation, and integrating text local features and global position information, reinforcing the representation capability of the CVTD *** enhanced feature maps, when acted upon by Text Activation Maps (TAM), effectively distinguished text foreground from non-text ***, we collected and annotated a dataset containing 2200 images of Car-mounted Video Text (CVT) under various road conditions for training and evaluating our model's *** further tested our model on four other challenging public natural scene text detection benchmark datasets, demonstrating its strong generalization ability and real-time detection *** model holds potential for practical applications in real-world scenarios.

Keywords: Deep learning; text detection; Car-mounted video text detector; intelligent driving assistance; arbitrary shape text detector

A learning automata based edge resource allocation approach for IoT-enabled smart cities

Digital Communications and Networks, 2024, Vol. 10, No. 5, pp. 1258-1266

Authors: Sampa Sahoo, Kshira Sagar Sahoo, Bibhudatta Sahoo, Amir H. Gandomi. Affiliations: Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, India; Department of Computer Science and Engineering, SRM University, Amaravati, India; Department of Computing Science, Umea University, Umea 90187, Sweden; Department of Computer Science and Engineering, NIT Rourkela, India; Faculty of Engineering and Information Technology, University of Technology Sydney, Australia; University Research and Innovation Center (EKIK), Obuda University, 1034 Budapest, Hungary

The development of the Internet of Things (IoT) technology is leading to a new era of smart applications such as smart transportation, buildings, and smart ***, these applications act as the building blocks of IoT-enabled smart *** high volume and high velocity of data generated by various smart city applications are sent to flexible and efficient cloud computing resources for ***, there is a high computation latency due to the presence of a remote cloud *** computing, which brings the computation close to the data source, is introduced to overcome this *** an IoT-enabled smart city environment, one of the main concerns is to consume the least amount of energy while executing tasks that satisfy the delay *** efficient resource allocation at the edge is helpful to address this *** this paper, an energy and delay minimization problem in a smart city environment is formulated as a bi-objective edge resource allocation ***, we presented a three-layer network architecture for IoT-enabled smart ***, we designed a learning automata-based edge resource allocation approach considering the three-layer network architecture to solve the said bi-objective minimization *** Automata (LA) is a reinforcement-based adaptive decision-maker that helps to find the best task and edge resource *** extensive set of simulations is performed to demonstrate the applicability and effectiveness of the LA-based approach in the IoT-enabled smart city environment.

Keywords: Edge computing; IoT; Learning automata; Resource allocation; Smart city

Early detection of stroke disease using patients previous medical data instil with deep learning

Multimedia Tools and Applications, 2025, Vol. 84, No. 16, pp. 16853-16881

Authors: Diwan, Tausif; Gajbhiye, Saurav M.; Goydani, Purva R.; Gannarpwar, Vedant R.; Khandait, Harshal R.; Tembhurne, Jitendra V.; Sahare, Parul. Affiliations: Department of Computer Science and Engineering, Indian Institute of Information Technology, Maharashtra, Nagpur 441108, India; Department of Electronics and Communication Engineering, Indian Institute of Information Technology, Maharashtra, Nagpur 441108, India

Early detection of any disease and starting its treatment at this early stage are the most important steps in the case of any life-threatening disease. Stroke, one of the leading causes of death and disability worldwide, is no exception in this regard. We develop a simple but efficient deep neural network for stroke prediction that accurately evaluates the probability of occurrence of stroke disease by treating this as a binary classification problem on one of the standard datasets, the Stroke Prediction Dataset available on Kaggle. With the help of effective pre-processing techniques such as SMOTE and FastICA on the noisy and imbalanced dataset, we could achieve improved performance for the binary classification of stroke prediction by employing a deep dense neural network. The architectural hyper-parameters are critically designed with the help of Keras Tuner, and the deep dense neural network consists of just ten layers. With the help of this lightweight deep dense model, we report an accuracy of 95%, AUC Score of 100% and F1-Score of 95% on a testing set. Moreover, we also present a comparative illustration of various machine learning models for the aforesaid task, along with a comparison of various state-of-the-art methods with our proposed model for stroke prediction. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Forecasting
