Preservation of crops depends on early and accurate detection of pests, as they cause several diseases that decrease crop production and quality. Several deep-learning techniques have been applied to overcome the issue of pest detection on crops. We have developed the YOLOCSP-PEST model for pest localization and classification. With the Cross Stage Partial Network (CSPNet) backbone, the proposed model is a modified version of You Only Look Once version 7 (YOLOv7) intended primarily for pest localization and classification. Our proposed model gives exceptionally good results under conditions that are very challenging for comparable models, especially where the luminance or the orientation of the images is an issue. It helps farmers working on crops in distant areas to determine any infestation quickly and accurately, which improves the quality and quantity of the production yield. The model has been trained and tested on two datasets, namely the IP102 dataset and a local crop dataset, and has shown exceptional results on both. It achieved a mean average precision (mAP) of 88.40%, a precision of 85.55%, and a recall of 84.25% on the IP102 dataset, and a mAP of 97.18%, a precision of 97.50%, and a recall of 94.88% on the local dataset. These findings demonstrate that the proposed model is very effective in real-life detection scenarios and can help improve crop yield quality and quantity.
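The CSPNet backbone mentioned above rests on a simple split-transform-merge idea: only part of the feature channels pass through the expensive transform stage, and the untouched part is concatenated back at the end. A hypothetical minimal sketch (toy scalar "channels", not the model's real convolutional blocks):

```python
def csp_block(channels, transform):
    """Cross Stage Partial idea (illustrative sketch only):
    split the feature channels into two halves, run just one half
    through the (expensive) transform stage, then merge both halves.
    This reduces computation and duplicated gradient paths."""
    mid = len(channels) // 2
    part1, part2 = channels[:mid], channels[mid:]
    processed = [transform(c) for c in part2]
    return part1 + processed  # concatenation of untouched and processed halves

# toy usage: "channels" are scalars, "transform" doubles them
out = csp_block([1, 2, 3, 4], lambda c: c * 2)  # → [1, 2, 6, 8]
```

In the real backbone the two partial paths are tensors and the transform is a stack of convolutions, but the partitioning logic is the same.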
Web Navigation Prediction (WNP) has been popularly used for finding future probable web pages. Obtaining relevant information from a large web is challenging, as its size is growing with every second. Web data may con...
Facial expression recognition is a challenging task when neural network is applied to pattern recognition. Most of the current recognition research is based on single source facial data, which generally has the disadv...
Efficient botnet detection is of great security importance and has been the focus of researchers in recent years. Botnet detection is also a difficult task due to the difficulty in distinguishing it from normal traffi...
Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention-driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, YouTube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd datas...
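The GRU at the heart of the ResDC-GRU framework carries spatio-temporal state across video frames via update and reset gates. A minimal scalar sketch of one standard GRU step (hypothetical scalar weights; the real model uses weight matrices over per-frame feature vectors):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One scalar GRU update (the recurrent unit inside ResDC-GRU).
    p holds hypothetical scalar weights; real layers use matrices."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)               # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)               # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand                    # blended new state

params = {"wz": 1.0, "uz": 0.0, "wr": 1.0, "ur": 0.0, "wh": 1.0, "uh": 1.0}
h = 0.0
for frame in [0.5, 0.1, -0.3]:  # a toy sequence of per-frame features
    h = gru_step(frame, h, params)
```

Because `tanh` bounds the candidate state and the update gate only blends, the hidden state stays in (-1, 1) regardless of sequence length, which is what makes the unit stable over long video streams.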
Nowadays, social media applications and websites have become a crucial part of people’s lives: for sharing their moments, contacting their families and friends, or even for their jobs. However, the fact that these val...
Palmprint recognition is an emerging biometrics technology that has attracted increasing attention in recent years. Many palmprint recognition methods have been proposed, including traditional methods and deep learning-based methods. Among the traditional methods, those based on directional features are mainstream because they have high recognition rates and are robust to illumination changes and small noise. However, to date, the stability of the palmprint directional response has not been deeply studied in these methods. In this paper, we analyse the problem of directional response instability in palmprint recognition methods based on directional features. We then propose a novel palmprint directional response stability measurement (DRSM) to judge the stability of the directional feature of each pixel. After filtering the palmprint image with a filter bank, we design DRSM according to the relationship between the maximum response value and the other response values at each pixel. Using DRSM, we can identify pixels with unstable directional responses and encode them with a specially designed encoding mode related to the specific method. We insert the DRSM mechanism into seven classical methods based on directional features and conduct extensive experiments on six public palmprint databases. The experimental results show that the DRSM mechanism can effectively improve the performance of these methods. In the field of palmprint recognition, this work is the first in-depth study of the stability of the palmprint directional response, so this paper has strong reference value for research on palmprint recognition methods based on directional features.
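The abstract describes DRSM as a function of the relationship between a pixel's maximum filter-bank response and its other responses. One plausible way to realize that relationship, assuming a relative-margin criterion and a threshold `tau` that are illustrative inventions, not the paper's exact formulation:

```python
def drsm(responses, tau=0.1):
    """Hypothetical directional-response stability check for one pixel.
    `responses` are the filter-bank outputs over all sampled directions.
    The dominant direction is treated as stable when the top response
    exceeds the runner-up by a clear relative margin (tau is an assumed
    parameter, not the paper's exact definition)."""
    ordered = sorted(responses, reverse=True)
    top, second = ordered[0], ordered[1]
    margin = (top - second) / (abs(top) + 1e-9)  # relative gap to runner-up
    return margin, margin >= tau

# a peaked (stable) direction profile vs. a nearly flat (unstable) one
stable_margin, stable = drsm([0.9, 0.2, 0.1, 0.15])
flat_margin, flat = drsm([0.31, 0.30, 0.29, 0.30])
```

Pixels flagged unstable would then be routed to the method-specific encoding mode the paper describes, rather than being encoded with their (unreliable) dominant direction.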
In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images at up to 2560 × 2560 resolution. Unlike existing studies that either struggle with high-resolution documents or give up the large language model and are thus constrained in vision or language ability, DocPedia directly processes visual input in the frequency domain rather than the pixel space. This unique characteristic enables DocPedia to capture a greater amount of visual and textual information using a limited number of visual tokens. To consistently enhance both the perception and comprehension abilities of DocPedia, we develop a dual-stage training strategy and enrich the instructions/annotations of all training tasks, covering multiple document types. Extensive quantitative and qualitative experiments are conducted on various publicly available benchmarks, and the results confirm the mutual benefits of jointly learning perception and comprehension tasks. The results provide further evidence of the effectiveness and superior performance of DocPedia over other methods.
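The frequency-domain intuition is that most of a document image's energy concentrates in a few low-frequency coefficients, so a small coefficient budget can summarize a large pixel block. A toy stand-in (pure-Python 2-D DCT-II on a small block; DocPedia's actual transform and tokenizer may differ):

```python
import math

def dct2_lowfreq(block, keep):
    """2-D DCT-II of a small square image block, keeping only the
    top-left `keep` x `keep` low-frequency coefficients -- a toy
    illustration of representing many pixels with few frequency
    'tokens'. Illustrative only, not DocPedia's actual pipeline."""
    n = len(block)
    def dct1(v):  # unnormalized 1-D DCT-II
        return [sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n)
                    for i in range(n)) for k in range(n)]
    rows = [dct1(r) for r in block]                   # DCT along rows
    cols = list(zip(*rows))
    full = list(zip(*[dct1(list(c)) for c in cols]))  # then along columns
    return [[full[r][c] for c in range(keep)] for r in range(keep)]

# a flat 4x4 block: all energy collapses into the single DC coefficient
coeffs = dct2_lowfreq([[1.0] * 4 for _ in range(4)], keep=2)
```

For the flat block, only the DC term is nonzero, so 16 pixels compress into one meaningful coefficient; textured regions spread energy across more coefficients, but low frequencies still dominate for typical document content.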
To enhance the efficiency and accuracy of environmental perception for autonomous vehicles, we propose GDMNet, a unified multi-task perception network for autonomous driving, capable of performing drivable area segmentation, lane detection, and traffic object detection. First, in the encoding stage, features are extracted, and the Generalized Efficient Layer Aggregation Network (GELAN) is utilized to enhance feature extraction and gradient flow. Next, in the decoding stage, specialized detection heads are designed: the drivable area segmentation head employs DySample to expand feature maps, and the lane detection head merges early-stage features and processes the output through the Focal Modulation Network (FMN). Lastly, the Minimum Point Distance IoU (MPDIoU) loss function is employed to compute the matching degree between traffic object detection boxes and predicted boxes, facilitating model training. Experimental results on the BDD100K dataset demonstrate that the proposed network achieves a drivable area segmentation mean intersection over union (mIoU) of 92.2%, lane detection accuracy and intersection over union (IoU) of 75.3% and 26.4%, respectively, and traffic object detection recall and mAP of 89.7% and 78.2%, respectively. The detection performance surpasses that of other single-task or multi-task algorithm models.
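MPDIoU scores box agreement as plain IoU minus penalties for the squared distances between the two boxes' top-left corners and bottom-right corners, normalized by the image size; the training loss is then 1 minus this score. A sketch of that formulation (verify details against the MPDIoU paper):

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """Minimum Point Distance IoU between two (x1, y1, x2, y2) boxes.
    IoU is penalized by the squared top-left and bottom-right corner
    distances, normalized by the squared image diagonal (a sketch of
    the MPDIoU formulation, not GDMNet's exact implementation)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)
    d1 = (box_a[0] - box_b[0]) ** 2 + (box_a[1] - box_b[1]) ** 2  # top-left
    d2 = (box_a[2] - box_b[2]) ** 2 + (box_a[3] - box_b[3]) ** 2  # bottom-right
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm

# identical boxes: IoU = 1 and both corner distances are zero
score = mpdiou((10, 10, 50, 50), (10, 10, 50, 50), 640, 640)  # → 1.0
```

Unlike plain IoU, the corner-distance terms still give a useful gradient when predicted and ground-truth boxes do not overlap at all.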
The increasing dependence on smartphones with advanced sensors has highlighted the imperative of precise transportation mode classification, pivotal for domains like health monitoring and urban planning. This research is motivated by the pressing demand to enhance transportation mode classification, leveraging the potential of smartphone sensors, notably the accelerometer, magnetometer, and gyroscope. In response to this challenge, we present a novel automated classification model rooted in deep reinforcement learning. Our model stands out for its innovative approach of harnessing enhanced features through artificial neural networks (ANNs) and framing the classification task as a structured series of decision-making events. Our model adopts an improved differential evolution (DE) algorithm for initializing weights, coupled with a specialized agent-environment relationship. Every correct classification earns the agent a reward, with additional emphasis on the accurate categorization of less frequent modes through a distinct reward strategy. The Upper Confidence Bound (UCB) technique is used for action selection, promoting deep-seated knowledge and minimizing reliance on chance. A notable innovation in our work is the introduction of a cluster-centric mutation operation within the DE algorithm. This operation strategically identifies optimal clusters in the current DE population and forges potential solutions using a pioneering update mechanism. When assessed on the extensive HTC dataset, which includes 8311 hours of data gathered from 224 participants over two years, the model achieved an accuracy of 0.88±0.03 and an F-measure of 0.87±0.02, underscoring the efficacy of our approach for large-scale transportation mode classification tasks. This work introduces an innovative strategy in the realm of transportation mode classification, emphasizing both precision and reliability, addressing the pressing need for enhanced classification mechanisms in an eve...
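The UCB action selection mentioned above balances exploiting the best-scoring class against exploring rarely tried ones by adding a count-based bonus to each action's mean reward. A minimal UCB1-style sketch (the paper's exact agent design may differ):

```python
import math

def ucb_select(values, counts, t, c=2.0):
    """UCB action selection: pick the action (transport mode) that
    maximizes mean reward plus an exploration bonus shrinking with
    visit count. Untried actions are chosen first. Illustrative
    sketch; parameters here are assumptions, not the paper's."""
    best, best_score = None, float("-inf")
    for a, (q, n) in enumerate(zip(values, counts)):
        if n == 0:
            return a  # explore untried actions first
        score = q + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = a, score
    return best

# action 1 has the highest mean, but rarely tried action 2 earns a
# large exploration bonus and gets picked
choice = ucb_select([0.5, 0.8, 0.6], [100, 100, 2], t=202)  # → 2
```

This is what the abstract means by "minimizing reliance on chance": exploration is driven deterministically by visit counts rather than by random action sampling.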