Thanks to the emergence and continued development of machine learning, particularly deep learning, research on visual question answering (VQA) has advanced dramatically, with great theoretical rese...
This study presents an innovative approach to animal classification and recognition utilizing machine learning and deep learning methodologies. Leveraging advanced algorithms, the proposed system achieves remarkable a...
Automatic detection of pineapples in complex agricultural environments poses several challenges. During harvesting, pineapples that are suitable for collection exhibit intricate scaly surface textures and a wide range of colors. Moreover, occlusion by leaves and fluctuating lighting conditions further complicate detection. In this paper, we propose a high-precision lightweight detection network based on an improved You Only Look Once version 7-tiny (Pineapple-YOLO) for the robot vision system, enabling real-time and accurate detection of pineapples. The Convolutional Block Attention Module (CBAM) is embedded into the backbone network to enhance feature extraction capability, and Content-Aware Reassembly of Features (CARAFE) is introduced to perform up-sampling and expand the receptive field. The Scylla Intersection over Union (SIoU) loss function replaces the Complete Intersection over Union (CIoU) loss function, taking vector angles into account and redefining the penalty criteria. Finally, the K-means++ clustering algorithm is used to re-cluster the labels of the pineapple dataset and update the anchor sizes. Experimental results show that Pineapple-YOLO achieves an mAP@0.5 of 89.7%, a 6.15% improvement over the original YOLOv7-tiny, demonstrating its superiority over other mainstream object detection models. Furthermore, in the diverse natural environments where the agricultural robot operates, Pineapple-YOLO sustains a 92% success rate in fruit picking with an average picking time of 12 s, demonstrating the efficiency of the vision module in practical engineering applications.
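The anchor update step lends itself to a short illustration. The minimal sketch below, which is not the authors' code, re-clusters label box sizes with scikit-learn's K-means++ initialization and sorts the resulting anchors by area; the `box_wh` array, the choice of 9 anchors, and the synthetic boxes are assumptions made for illustration.

```python
# Sketch: re-clustering bounding-box sizes with K-means++ to obtain anchors.
# Assumes `box_wh` is an (N, 2) array of label widths/heights; 9 anchors
# matches the YOLOv7-tiny default but is an assumption here.
import numpy as np
from sklearn.cluster import KMeans

def recluster_anchors(box_wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """Return anchor (w, h) pairs sorted by area, smallest first."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0)
    km.fit(box_wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]

# Example with synthetic box sizes standing in for the pineapple labels.
rng = np.random.default_rng(0)
boxes = rng.uniform(20, 300, size=(500, 2))
print(recluster_anchors(boxes).round(1))
```

In practice, the same procedure would be run on the dataset's annotation widths and heights and the resulting anchors substituted into the YOLOv7-tiny configuration.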
Image compression constitutes a significant challenge in the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The proposed model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with human-interpretable concepts. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.
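To make the band-separation idea concrete, the sketch below performs a fixed Laplacian-pyramid-style split into low, mid, and high frequency bands; the paper's transform is learned end-to-end, so this is only an assumed illustration of how frequency components can be isolated and then recombined without loss.

```python
# Illustrative sketch only: a fixed split of an image into low/mid/high
# frequency bands whose sum reconstructs the input. The learned
# frequency-oriented transform in the paper is not reproduced here.
import numpy as np
from scipy.ndimage import gaussian_filter

def split_frequency_bands(img: np.ndarray, sigmas=(1.0, 4.0)):
    """Return (high, mid, low) bands whose sum reconstructs `img`."""
    blur1 = gaussian_filter(img, sigma=sigmas[0])   # removes fine detail
    blur2 = gaussian_filter(img, sigma=sigmas[1])   # removes mid detail too
    high = img - blur1       # fine textures / edges
    mid = blur1 - blur2      # mid-frequency structure
    low = blur2              # coarse luminance / color layout
    return high, mid, low

img = np.random.rand(64, 64).astype(np.float32)     # stand-in grayscale image
high, mid, low = split_frequency_bands(img)
assert np.allclose(high + mid + low, img, atol=1e-5)
```

Selective transmission of bands, as described in the abstract, then amounts to dropping or coarsely quantizing some of these components before reconstruction.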
ISBN (digital): 9798331506520
ISBN (print): 9798331506537
Object detection based on event vision has been a dynamically growing field in computer vision for the last 16 years. In this work, we create multiple channels from a single event camera and propose an event fusion method (EFM) to enhance object detection in event-based vision systems. Each channel uses a different accumulation buffer to collect events from the event camera. We implement YOLOv7 for object detection, followed by a fusion algorithm. Our multichannel approach outperforms single-channel object detection by 0.7% in mean Average Precision (mAP) for detections overlapping the ground truth at IoU = 0.5.
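A minimal sketch of the multichannel construction is shown below, assuming an (x, y, t, p) event format, a 304x240 sensor, and three accumulation windows; none of these values are taken from the paper.

```python
# Sketch of building multi-channel frames from a single event stream by using
# accumulation buffers of different durations. Event format and window
# lengths are assumptions for illustration.
import numpy as np

def accumulate(events: np.ndarray, t_end: float, window: float, shape=(240, 304)):
    """Sum event polarities that fall inside [t_end - window, t_end]."""
    frame = np.zeros(shape, dtype=np.float32)
    x, y = events[:, 0].astype(int), events[:, 1].astype(int)
    t, p = events[:, 2], events[:, 3]
    mask = (t >= t_end - window) & (t <= t_end)
    np.add.at(frame, (y[mask], x[mask]), np.where(p[mask] > 0, 1.0, -1.0))
    return frame

def multichannel_frame(events, t_end, windows=(0.01, 0.05, 0.1)):
    """Stack one accumulation buffer per window into an (H, W, C) tensor."""
    return np.stack([accumulate(events, t_end, w) for w in windows], axis=-1)

# Synthetic events: columns are (x, y, timestamp_s, polarity).
ev = np.column_stack([np.random.randint(0, 304, 1000),
                      np.random.randint(0, 240, 1000),
                      np.sort(np.random.rand(1000)),
                      np.random.choice([-1, 1], 1000)])
print(multichannel_frame(ev, t_end=1.0).shape)   # (240, 304, 3)
```

Each channel of the stacked tensor could then be fed to the detector, with detections from the channels merged by a fusion step of the kind the abstract describes.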
Deep learning advancements have significantly enhanced computer vision applications in precision agriculture. While RGB cameras operating in visible light are affordable, they provide limited information compared to m...
Insect image recognition (IIR) is a specialized field in machine learning (ML) and computer vision that aims to automatically recognise and detect insect species using visual data obtained from images. Leve...
Cataracts are a clouding of the lens in the eye, leading to loss of vision that can progress to blindness if not treated. This paper proposes a new method for automatic cataract detection using color fundus images and d...
ISBN (print): 9781450395670
With the increased use of closed-circuit television (CCTV) footage for security and surveillance purposes as well as for object or person recognition and efficiency monitoring, high-quality CCTV videos are necessary. In this paper, we propose Corgi Eye, a combined moving-object-removal and super-resolution framework for enhancing CCTV footage by removing the ghosting artifacts caused by performing multiframe super-resolution (MISR) on moving objects. Our method extends the framework of Eagle Eye, an existing MISR framework tailored for mobile devices. Our results demonstrate that the system can completely remove ghosting effects caused by moving objects while performing MISR on CCTV footage. The proposed method achieves competitive performance compared to Eagle Eye, with a 16% increase in PSNR. Additionally, it produces clear images on par with deep learning approaches such as ESPCN and SOF-VSR.
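As a rough illustration of combining moving-object suppression with multi-frame fusion (not the Corgi Eye pipeline itself), the sketch below masks pixels that deviate from a reference frame before averaging the static content; the difference threshold and grayscale input are assumptions.

```python
# Illustrative sketch: suppress moving pixels via frame differencing against a
# reference, then fuse only the static content across frames so that a later
# multi-frame super-resolution step does not ghost. Threshold is an assumption.
import numpy as np

def fuse_static_content(frames: np.ndarray, thresh: float = 0.05) -> np.ndarray:
    """frames: (T, H, W) grayscale stack aligned to a common reference."""
    ref = frames[0]
    # Per-frame mask: True where the pixel barely differs from the reference,
    # i.e. where no moving object is present and fusion will not ghost.
    static = np.abs(frames - ref) < thresh
    weights = static.astype(np.float32)
    weights[0] = 1.0                       # the reference always contributes
    fused = (frames * weights).sum(axis=0) / weights.sum(axis=0)
    return fused

stack = np.random.rand(5, 120, 160).astype(np.float32)  # stand-in CCTV frames
print(fuse_static_content(stack).shape)                  # (120, 160)
```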