ISBN (Print): 9781665448994
Sign languages are visual languages produced by the movement of the hands, face, and body. In this paper, we evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-preserving, low-dimensional representations. Basically, skeletal representations generalize over an individual's appearance and background, allowing us to focus on the recognition of motion. But how much information is lost by the skeletal representation? We perform two independent studies using two state-of-the-art pose estimation systems. We analyze the applicability of the pose estimation systems to sign language recognition by evaluating the failure cases of the recognition models. Importantly, this allows us to characterize the current limitations of skeletal pose estimation approaches in sign language recognition.
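To make the skeletal representation concrete, the sketch below shows one way a video frame can be reduced to a low-dimensional, person-independent keypoint vector. The two pose estimation systems evaluated in the paper are not named in this excerpt; MediaPipe Holistic and all names below are illustrative stand-ins, not the authors' setup.

```python
# Illustrative sketch (not the authors' setup): reducing one video frame to a
# flat, person-independent skeletal feature vector using MediaPipe Holistic
# as a stand-in pose estimation system.
import cv2
import mediapipe as mp
import numpy as np

holistic = mp.solutions.holistic.Holistic(static_image_mode=True)

def frame_to_pose_vector(bgr_frame):
    """Return (x, y, z) keypoints for body and hands, or None if no body is found."""
    rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
    result = holistic.process(rgb)
    if result.pose_landmarks is None:
        return None
    coords = [c for lm in result.pose_landmarks.landmark for c in (lm.x, lm.y, lm.z)]
    for hand in (result.left_hand_landmarks, result.right_hand_landmarks):
        if hand is None:                       # occluded hands yield no landmarks
            coords.extend([0.0] * 21 * 3)      # pad so the vector length stays fixed
        else:
            coords.extend(c for lm in hand.landmark for c in (lm.x, lm.y, lm.z))
    return np.asarray(coords, dtype=np.float32)
```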
ISBN (Print): 9798350362770; 9798350362763
The existence of pests and diseases threatens agricultural production and causes serious economic losses to countries whose economies are based on agriculture. Traditional agricultural production methods require a great deal of manpower to identify and detect pests and diseases, which increases the cost of production. With the development of deep learning, computer vision-based approaches can help reduce costs and improve efficiency: CNN-based deep learning models can accurately detect and recognize insects and pests. This paper proposes an improved recognition model, MFnet, based on the lightweight deep neural network MobileNetV3. A new activation function, MLU6, is introduced in the proposed MFnet model, and the PolyLoss function replaces the cross-entropy loss when training the model. We also use transfer learning to improve the efficiency of network optimization, and prevent overfitting through data augmentation and regularization. Ablation and comparative experiments are carried out on the IP102 pest dataset. The experimental results show that MFnet outperforms the original MobileNetV3 and other classic models in classification accuracy, at the cost of an increased number of parameters compared to the original MobileNetV3. However, the parameter count, training time, and classification time of the proposed model are still much lower than those of the other classic models.
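As a point of reference for the loss substitution described above, the following is a minimal sketch of the Poly-1 form of PolyLoss in PyTorch. The epsilon value follows the PolyLoss paper and the 102-class example follows the IP102 label count; both are assumptions rather than MFnet's actual training configuration, and the MLU6 activation is not reproduced because its definition is not given in the abstract.

```python
# Minimal sketch (assumption: PyTorch) of the Poly-1 variant of PolyLoss,
# which the abstract says replaces plain cross-entropy during training.
# epsilon = 1.0 is the default suggested in the PolyLoss paper, not MFnet's value.
import torch
import torch.nn.functional as F

def poly1_cross_entropy(logits, targets, epsilon=1.0):
    """Cross-entropy plus epsilon * (1 - p_t), averaged over the batch."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    probs = F.softmax(logits, dim=-1)
    pt = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # probability of the true class
    return (ce + epsilon * (1.0 - pt)).mean()

# Example: a batch of 8 samples over 102 pest classes, as in IP102.
logits = torch.randn(8, 102)
targets = torch.randint(0, 102, (8,))
loss = poly1_cross_entropy(logits, targets)
```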
To address the difficulty of fully and effectively utilizing features to represent target information when tracking in complex scenes, this paper proposes an ECO-HC target tra...
Text recognition in natural images remains a challenging yet essential task, with broad applications spanning computer vision and natural language processing. This paper introduces a novel end-to-end framework that co...
Human activity recognition (HAR) interprets data from healthcare, security surveillance, human-computer interaction (HCI), and sensor systems to recognize and categorize human actions. The rapid development of artificial int...
ISBN (Print): 9798331539856
Nowadays, camera traps are widely employed in monitoring biodiversity and assessing the population density of animal species. A challenge in animal recognition in camera trap images is the detection of small animals in complex environments and the identification of heavily obscured animals. This paper presents two novel methods that leverage sequentially captured images to improve animal recognition accuracy: one utilizing optical flow information and the other a motion-based algorithm based on the principle of median filtering. In experiments, we used two new real-world sequence-based camera trap image datasets to evaluate these methods. Our findings indicate that optical flow information effectively reduces false positive cases, while the motion-based algorithm significantly improves the accuracy of detecting animal presence and counting by substantially reducing false negative cases. Specifically, using the MegaDetector with a confidence threshold of 0.5 as the baseline, the motion-based method reduced false negative cases by over 70% while only slightly increasing false positive cases, and improved animal counting accuracy by more than 25%.
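The abstract does not spell out the motion-based algorithm, but the median-filtering principle it rests on can be sketched as follows: the per-pixel median over a sequence of images approximates the static background, and pixels that deviate strongly from it in a given frame are flagged as motion. The threshold and file names below are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of the median-filter principle behind the motion-based
# method: a per-pixel median over a camera trap sequence approximates the
# static background; large deviations from it flag a moving animal.
import cv2
import numpy as np

def motion_mask(sequence_paths, frame_index, threshold=25):
    frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in sequence_paths]
    stack = np.stack(frames, axis=0).astype(np.float32)
    background = np.median(stack, axis=0)            # per-pixel median background
    diff = np.abs(stack[frame_index] - background)
    return (diff > threshold).astype(np.uint8)       # 1 where motion is detected

# Hypothetical three-image sequence from one camera trap trigger.
mask = motion_mask(["seq_001.jpg", "seq_002.jpg", "seq_003.jpg"], frame_index=1)
print("moving pixels:", int(mask.sum()))
```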
Image-based insect species identification is a comprehensive application of computer vision, image processing, and pattern recognition technologies. It is of...
While machine learning powers impressive computer vision systems, these systems lack the human advantage of general world knowledge. This means they struggle to interpret visual data with the same richness of understan...
ISBN (Print): 9783031245374; 9783031245381
The current rate of decline in biodiversity calls urgently for ecological conservation. In response, camera traps are increasingly being deployed for the monitoring of wildlife. The analysis of camera trap data can aid in curbing species extinction. However, a substantial amount of time is lost in manual review, curtailing the usage of camera traps for prompt decision-making. The difficult visual conditions and the proneness of camera traps to record empty frames (frames that are natural backdrops with no wildlife presence) make wildlife detection and species recognition a demanding and taxing task. Thus, we propose a pipeline for wildlife detection and species recognition to expedite the processing of camera trap sequences. The proposed pipeline consists of three stages: (i) empty frame removal, (ii) wildlife detection, and (iii) species recognition and classification. We leverage the vision transformer (ViT), DEtection TRansformer (DETR), vision and detection transformer (ViDT), faster region-based convolutional neural network (Faster R-CNN), Inception v3, and ResNet-50 for these stages. We examine how well the leveraged algorithms perform at new and unseen locations, i.e., under the challenges of domain generalisation. We demonstrate the effectiveness of the proposed pipeline using the Caltech camera trap (CCT) dataset.
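A structural sketch of such a three-stage pipeline is given below. It is not the authors' code: off-the-shelf torchvision models stand in for the ViT, DETR, ViDT, Faster R-CNN, Inception v3, and ResNet-50 variants compared in the paper, and the empty-frame decision is approximated here by detector confidence rather than a dedicated classifier.

```python
# Structural sketch (not the authors' implementation) of a three-stage camera
# trap pipeline: (i) empty-frame removal, (ii) wildlife detection,
# (iii) species classification on the detected crops.
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
classifier = torchvision.models.resnet50(weights="DEFAULT").eval()

def process_frame(frame, empty_threshold=0.5):
    """frame: float tensor of shape (3, H, W) with values in [0, 1]."""
    with torch.no_grad():
        detections = detector([frame])[0]
        keep = detections["scores"] >= empty_threshold
        if keep.sum() == 0:
            return "empty frame", []                  # stage (i): discard empty frames
        species = []
        for box in detections["boxes"][keep]:         # stage (ii): detected wildlife
            x0, y0, x1, y1 = box.int().tolist()
            if x1 <= x0 or y1 <= y0:
                continue                              # skip degenerate boxes
            crop = frame[:, y0:y1, x0:x1].unsqueeze(0)
            crop = torch.nn.functional.interpolate(crop, size=(224, 224))
            species.append(classifier(crop).argmax(dim=1).item())  # stage (iii)
    return "wildlife", species

label, species_ids = process_frame(torch.rand(3, 480, 640))
```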
ISBN (Print): 9781665445092
Learning low-dimensional latent state space dynamics models has proven powerful for enabling vision-based planning and learning for control. We introduce a latent dynamics learning framework that is uniquely designed to induce proportional controllability in the latent space, thus enabling the use of simple and well-known PID controllers. We show that our learned dynamics model enables proportional control from pixels, dramatically simplifies and accelerates behavioural cloning of vision-based controllers, and provides interpretable goal discovery when applied to imitation learning of switching controllers from demonstration. Notably, such proportional controllability also allows for robust path following from visual demonstrations using Dynamic Movement Primitives in the learned latent space.
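To illustrate what proportional controllability in a latent space buys, the sketch below applies a textbook PID law to the error between a current and a goal latent state. The encoder, latent dimension, and gains are placeholders, not the learned dynamics model from the paper.

```python
# Minimal sketch of PID control in a learned latent space: images are encoded
# to latent states and a standard PID law acts on the latent error.
import numpy as np

class LatentPID:
    def __init__(self, kp=1.0, ki=0.0, kd=0.1, dim=8):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = np.zeros(dim)
        self.prev_error = np.zeros(dim)

    def act(self, z_current, z_goal, dt=0.05):
        error = z_goal - z_current
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Usage with a hypothetical encoder mapping a 64x64 image to an 8-D latent state.
encode = lambda image: image.reshape(-1)[:8] / 255.0    # placeholder, not a learned model
controller = LatentPID()
u = controller.act(encode(np.zeros((64, 64, 3))), z_goal=np.ones(8))
```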