Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, Youtube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd datas
Palmprint recognition is an emerging biometrics technology that has attracted increasing attention in recent years. Many palmprint recognition methods have been proposed, including traditional methods and deep learning-based methods. Among the traditional methods, those based on directional features are mainstream because they achieve high recognition rates and are robust to illumination changes and small amounts of noise. However, to date, the stability of the palmprint directional response in these methods has not been studied in depth. In this paper, we analyse the problem of directional response instability in palmprint recognition methods based on directional features. We then propose a novel palmprint directional response stability measurement (DRSM) to judge the stability of the directional feature of each pixel. After filtering the palmprint image with the filter bank, we design DRSM according to the relationship between the maximum response value and the other response values for each pixel. Using DRSM, we can identify pixels with an unstable directional response and encode them with a specially designed encoding mode tailored to the specific method. We insert the DRSM mechanism into seven classical methods based on directional features and conduct extensive experiments on six public palmprint databases. The experimental results show that the DRSM mechanism can effectively improve the performance of these methods. In the field of palmprint recognition, this work is the first in-depth study on the stability of the palmprint directional response, so this paper has strong reference value for research on palmprint recognition methods based on directional features.
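The abstract describes DRSM only as a per-pixel relationship between the maximum filter response and the other responses, without giving a formula. The following is a minimal sketch of that general idea, not the paper's definition: the function names, the relative-gap score, and the threshold value are all assumptions made for illustration.

```python
def stability_score(responses):
    """Hypothetical stability score for one pixel (NOT the paper's DRSM formula).

    `responses` holds the filter-bank responses of a single pixel, one value
    per direction. The score is the relative gap between the largest and the
    second-largest response: near 0 means two directions compete (unstable),
    near 1 means one direction clearly dominates (stable).
    """
    ordered = sorted(responses, reverse=True)
    top, second = ordered[0], ordered[1]
    if top <= 0:
        # No positive dominant response; treat the pixel as unstable.
        return 0.0
    return (top - second) / top


def is_stable(responses, threshold=0.1):
    """Flag a pixel as stable when its score reaches an assumed threshold."""
    return stability_score(responses) >= threshold
```

Under this sketch, a pixel with responses `[10, 5, 1]` scores 0.5 and counts as stable, while `[10, 9.9, 1]` falls below the assumed threshold and would be routed to a separate encoding mode.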
In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2560 × 2560 resolution. Unlike existing studies that either struggle with high-resolution documents or abandon the large language model, which constrains their vision or language ability, DocPedia directly processes visual input in the frequency domain rather than the pixel space. This characteristic enables DocPedia to capture a greater amount of visual and textual information using a limited number of visual tokens. To consistently enhance both the perception and comprehension abilities of DocPedia, we develop a dual-stage training strategy and enrich the instructions/annotations of all training tasks, covering multiple document types. Extensive quantitative and qualitative experiments are conducted on various publicly available benchmarks, and the results confirm the mutual benefits of jointly learning perception and comprehension tasks. The results provide further evidence of the effectiveness and superior performance of DocPedia over other methods.
To enhance the efficiency and accuracy of environmental perception for autonomous vehicles, we propose GDMNet, a unified multi-task perception network for autonomous driving, capable of performing drivable area segmentation, lane detection, and traffic object ***, in the encoding stage, features are extracted, and Generalized Efficient Layer Aggregation Network (GELAN) is utilized to enhance feature extraction and gradient ***, in the decoding stage, specialized detection heads are designed; the drivable area segmentation head employs DySample to expand feature maps, the lane detection head merges early-stage features and processes the output through the Focal Modulation Network (FMN). Lastly, the Minimum Point Distance IoU (MPDIoU) loss function is employed to compute the matching degree between traffic object detection boxes and predicted boxes, facilitating model training *** results on the BDD100K dataset demonstrate that the proposed network achieves a drivable area segmentation mean intersection over union (mIoU) of 92.2%, lane detection accuracy and intersection over union (IoU) of 75.3% and 26.4%, respectively, and traffic object detection recall and mAP of 89.7% and 78.2%, *** detection performance surpasses that of other single-task or multi-task algorithm models.
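The abstract names the MPDIoU loss but does not spell it out. As a rough illustration, the sketch below follows the commonly cited form of MPDIoU: plain IoU minus the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalised by the squared image diagonal. The `(x1, y1, x2, y2)` box layout and the function names are assumptions, and GDMNet's actual implementation may differ.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Standard IoU from intersection and union areas.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners

    # Normalise by the squared diagonal of the input image.
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Loss form: 0 for identical boxes, larger as the boxes diverge."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)
```

Unlike plain IoU, the corner-distance terms keep the loss informative even when the two boxes do not overlap at all.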
The increasing dependence on smartphones with advanced sensors has highlighted the need for precise transportation mode classification, which is pivotal for domains like health monitoring and urban planning. This research is motivated by the pressing demand to enhance transportation mode classification, leveraging the potential of smartphone sensors, notably the accelerometer, magnetometer, and gyroscope. In response to this challenge, we present a novel automated classification model rooted in deep reinforcement learning. Our model stands out for its approach of harnessing enhanced features through artificial neural networks (ANNs) and framing the classification task as a structured series of decision-making events. Our model adopts an improved differential evolution (DE) algorithm for initializing weights, coupled with a specialized agent-environment relationship. Every correct classification earns the agent a reward, with additional emphasis on the accurate categorization of less frequent modes through a distinct reward strategy. The Upper Confidence Bound (UCB) technique is used for action selection, promoting deep-seated knowledge and minimizing reliance on chance. A notable innovation in our work is the introduction of a cluster-centric mutation operation within the DE algorithm. This operation strategically identifies optimal clusters in the current DE population and forges potential solutions using a pioneering update mechanism. When assessed on the extensive HTC dataset, which includes 8311 hours of data gathered from 224 participants over two years, the model achieves an accuracy of 0.88±0.03 and an F-measure of 0.87±0.02, underscoring the efficacy of our approach for large-scale transportation mode classification tasks. This work introduces an innovative strategy in the realm of transportation mode classification, emphasizing both precision and reliability, addressing the pressing need for enhanced classification mechanisms in an eve
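To make the UCB action-selection step concrete, here is a minimal sketch of the textbook UCB1 rule: each action is scored by its mean reward plus an exploration bonus that shrinks as the action is tried more often. The abstract does not say which UCB variant the paper uses or how it is wired into the agent, so the function name, the `c` parameter, and the list-based bookkeeping are assumptions.

```python
import math


def ucb_select(counts, values, c=2.0):
    """Pick an action index with the UCB1 rule.

    counts[a] is how many times action a has been tried so far and
    values[a] is its running mean reward. `c` scales the exploration
    bonus (c=2.0 gives the classic UCB1 bonus sqrt(2 * ln t / n)).
    """
    # Try every action once before comparing confidence bounds.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    total = sum(counts)
    scores = [
        values[a] + math.sqrt(c * math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    # Return the index of the highest upper confidence bound.
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal counts the rule simply prefers the higher mean; a rarely tried action receives a large bonus and is selected even when its current mean is lower, which is how the rule reduces reliance on chance.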
In modern society, an increasing number of occasions need to effectively verify people's *** is the most effective technology for personal *** research on automated biometrics recognition mainly started in the 1960s and *** the following 50 years, the research and application of biometrics have achieved fruitful *** 2014-2015, with the successful applications of some emerging information technologies and tools, such as deep learning, cloud computing, big data, mobile communication, smartphones, location-based services, blockchain, new sensing technology, the Internet of Things and federated learning, biometric technology entered a new development ***, taking 2014-2015 as the time boundary, the development of biometric technology can be divided into two *** addition, according to our knowledge and understanding of biometrics, we further divide the development of biometric technology into three phases, i.e., biometrics 1.0, 2.0 and *** 1.0 is the primary development phase, or the traditional development *** 2.0 is an explosive development phase due to the breakthroughs caused by some emerging information *** present, we are in the development phase of biometrics *** 3.0 is the future development phase of *** the biometrics 3.0 phase, biometric technology will be fully mature and can meet the needs of various *** 1.0 is the initial phase of the development of biometric technology, while biometrics 2.0 is the advanced *** this paper, we provide a brief review of biometrics ***, the concept of biometrics 2.0 is defined, and the architecture of biometrics 2.0 is *** particular, the application architecture of biometrics 2.0 in smart cities is *** challenges and perspectives of biometrics 2.0 are also discussed.
Cross-Site Scripting (XSS) remains a significant threat to web application security, exploiting vulnerabilities to hijack user sessions and steal sensitive *** detection methods often fail to keep pace with the evolving sophistication of cyber *** paper introduces a novel hybrid ensemble learning framework that leverages a combination of advanced machine learning algorithms: Logistic Regression (LR), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Deep Neural Networks (DNN). Utilizing the XSS-Attacks-2021 dataset, which comprises 460 instances across various real-world traffic-related scenarios, this framework significantly enhances XSS attack *** approach, which includes rigorous feature engineering and model tuning, not only optimizes accuracy but also effectively minimizes false positives (FP) (0.13%) and false negatives (FN) (0.19%). This comprehensive methodology has been rigorously validated, achieving an unprecedented accuracy of 99.87%. The proposed system is scalable and efficient, capable of adapting to the increasing number of web applications and user demands without a decline in *** demonstrates exceptional real-time capabilities, with the ability to detect XSS attacks dynamically, maintaining high accuracy and low latency even under significant ***, despite the computational complexity introduced by the hybrid ensemble approach, strategic use of parallel processing and algorithm tuning ensures that the system remains scalable and performs robustly in real-time *** for easy integration with existing web security systems, our framework supports adaptable Application Programming Interfaces (APIs) and a modular design, facilitating seamless augmentation of current *** innovation represents a significant advancement in cybersecurity, offering a scalable and effective solution for securing modern web applications against evolving threats.
Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for building intelligent systems for driver assistance or autonomous *** information in car-mounted videos can assist drivers in making ***, car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time *** proposed a robust Car-mounted Video Text Detector (CVTD). It is a lightweight text detection model based on ResNet18 for feature extraction, capable of detecting text in arbitrary *** model efficiently extracted global text positions through the Coordinate Attention Threshold Activation (CATA) and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules (FPEFM), strengthening feature representation, and integrating text local features and global position information, reinforcing the representation capability of the CVTD *** enhanced feature maps, when acted upon by Text Activation Maps (TAM), effectively distinguished text foreground from non-text ***, we collected and annotated a dataset containing 2200 images of Car-mounted Video Text (CVT) under various road conditions for training and evaluating our model’s *** further tested our model on four other challenging public natural scene text detection benchmark datasets, demonstrating its strong generalization ability and real-time detection *** model holds potential for practical applications in real-world scenarios.
The development of the Internet of Things (IoT) technology is leading to a new era of smart applications such as smart transportation, buildings, and smart ***, these applications act as the building blocks of IoT-enabled smart *** high volume and high velocity of data generated by various smart city applications are sent to flexible and efficient cloud computing resources for ***, there is a high computation latency due to the presence of a remote cloud *** computing, which brings the computation close to the data source, is introduced to overcome this *** an IoT-enabled smart city environment, one of the main concerns is to consume the least amount of energy while executing tasks that satisfy the delay *** efficient resource allocation at the edge is helpful to address this *** this paper, an energy and delay minimization problem in a smart city environment is formulated as a bi-objective edge resource allocation ***, we presented a three-layer network architecture for IoT-enabled smart ***, we designed a learning automata-based edge resource allocation approach considering the three-layer network architecture to solve the said bi-objective minimization *** Automata (LA) is a reinforcement-based adaptive decision-maker that helps to find the best task and edge resource *** extensive set of simulations is performed to demonstrate the applicability and effectiveness of the LA-based approach in the IoT-enabled smart city environment.
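As an illustration of how a learning automaton adapts its choices, the sketch below implements the classic linear reward-inaction (L_RI) update over a probability vector, where each entry could stand for one candidate edge resource. The abstract does not state which LA scheme, reward signal, or learning rate the paper uses, so the scheme, the function name, and the `rate` value are assumptions.

```python
def lri_update(probs, chosen, rewarded, rate=0.1):
    """Linear reward-inaction update of an action-probability vector.

    probs[i] is the probability of picking action i (for example, assigning
    a task to edge resource i). If the chosen action was rewarded, its
    probability moves towards 1 and the others shrink proportionally, so
    the vector still sums to 1. On a penalty nothing changes ("inaction").
    """
    if not rewarded:
        return list(probs)
    return [
        p + rate * (1.0 - p) if i == chosen else (1.0 - rate) * p
        for i, p in enumerate(probs)
    ]
```

Repeated rewards for the same action drive its probability towards 1, so the automaton gradually settles on the assignment that the environment favours, for instance one that meets the delay constraint at low energy cost.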
Early detection of any disease and starting its treatment in this early stage are the most important steps in case of any life-threatening disease. Stroke is not an exception in this regard which is one of the leading...