ISBN (Print): 9798350349405; 9798350349399
The 1-ms visual feedback system is critical for seamless actuation in robotics, as any delay degrades its performance in dynamic situations. Specular reflections cause problems in many visual technologies, making specular detection crucial in 1-ms visual feedback systems. However, existing real-time methods, which target the von Neumann architecture, fail to achieve the 1-ms delay because their extensive frame-based processing forces data through spatial memory paths. This research aims to develop a 1-ms specular detection system from both the algorithm and architecture perspectives, proposing 1) a temporal-clustering and temporal-reference-based specular detection method, which leverages temporal-domain information to address the requirements of frame-based processing; and 2) a global-local integrated specular detection architecture, which enables local and global processing to coexist within a 1-ms stream-based architecture. The proposed methods are implemented on an FPGA. The evaluation shows that the proposed system supports sensing and processing a 1000-fps sequence with a delay of 0.941 ms/frame.
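The abstract only states that the method relies on a temporal reference rather than spatial (frame-wide) context, so the following is purely an illustrative Python/NumPy sketch of that idea: each pixel is compared against a per-pixel temporal reference built from recent frames, and pixels that greatly exceed it are flagged as specular. The history length, ratio threshold, and intensity floor are invented parameters, and the paper's FPGA stream-based design is not reflected here.

```python
import numpy as np
from collections import deque

class TemporalSpecularDetector:
    """Illustrative per-pixel specular detection against a temporal reference."""

    def __init__(self, history: int = 8, ratio: float = 1.6, floor: float = 30.0):
        self.history = deque(maxlen=history)  # recent grayscale frames
        self.ratio = ratio                    # brightness ratio over the reference
        self.floor = floor                    # absolute intensity floor (ignore dark noise)

    def __call__(self, frame_gray: np.ndarray) -> np.ndarray:
        frame = frame_gray.astype(np.float32)
        if not self.history:
            self.history.append(frame)
            return np.zeros(frame.shape, dtype=bool)
        # Temporal reference: per-pixel minimum over the recent history,
        # which transient specular highlights tend to exceed sharply.
        reference = np.minimum.reduce(list(self.history))
        mask = (frame > self.ratio * (reference + 1.0)) & (frame > self.floor)
        self.history.append(frame)
        return mask
```

In use, consecutive 1000-fps grayscale frames would be fed to the detector one by one, each call returning a boolean specular mask for the current frame.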
Autonomous vehicles require real-time image processing to improve their capabilities by allowing them to understand and respond appropriately to their environment. This paper examines the present state of real-time im...
ISBN (Print): 9798350349405; 9798350349399
In this paper, we propose an improved model of Shallow-UWnet for underwater image enhancement. In the proposed method, we enhance the learning process and solve the vanishing-gradient problem with a skip connection that concatenates the raw underwater image and the impulse response of a low-pass filter (LPF) into Shallow-UWnet. Additionally, we integrate the simple, parameter-free attention module (SimAM) into each convolution block to enhance the visual quality of images. Performance evaluations against state-of-the-art methods show that the proposed method achieves comparable results on the EUVP-Dark, UFO-120, and UIEB datasets. Moreover, the proposed model has fewer trainable parameters, and its faster testing time makes it suitable for real-time underwater image enhancement, particularly on resource-constrained underwater robots.
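The two modifications named in the abstract, SimAM inside each convolution block and a skip connection that feeds the raw image plus an LPF-derived input into the network, can be illustrated with a minimal PyTorch sketch. The class names, channel counts, and the interpretation of the LPF input as a low-pass-filtered copy of the image are assumptions for illustration, not the authors' exact implementation; the SimAM formula follows the published parameter-free energy form.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: weights each activation by a
    per-neuron energy term, with no learnable parameters."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)       # (x - mu)^2
        v = d.sum(dim=(2, 3), keepdim=True) / n                  # channel variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5              # inverse energy
        return x * torch.sigmoid(e_inv)

class ConvBlockWithSimAM(nn.Module):
    """One convolution block that also receives the raw image and its
    low-pass-filtered copy via concatenation (the skip connection
    described in the abstract), followed by SimAM."""
    def __init__(self, in_ch: int, out_ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch + 3 + 3, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            SimAM(),
        )

    def forward(self, feats, raw_img, lpf_img):
        # feats: features from the previous block; raw_img / lpf_img: 3-channel inputs
        return self.body(torch.cat([feats, raw_img, lpf_img], dim=1))
```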
With the rise of the Internet of Things (IoT) and edge computing technologies, traditional cloud-dependent convolutional neural network (CNN) image processing methods are facing the challenges of latency and bandwidth...
The detection of road potholes plays a crucial role in ensuring passenger comfort and the structural safety of vehicles. To address the challenges of pothole detection in complex road environments, this paper proposes a model focused on shape features (Pothole Detection You Only Look Once, PD-YOLO). The model constructs a feature extraction module that better adapts to variations in pothole shape, overcoming the limitations in multi-scale feature learning caused by the fixed convolutional kernels of the baseline model. Subsequently, a cross-stage partial network is designed using a one-time aggregation method, simplifying the model while enabling the network to fuse information between feature maps at different stages. Additionally, a dynamic sparse attention mechanism is introduced to select relevant features, reducing redundancy and suppressing background noise. Experiments on the VOC2007 and GRDDC2020_Pothole datasets show that, compared to the baseline YOLOv8 model, PD-YOLO improves mean average precision by 3.9% and 2.8%, respectively, while running at approximately 290 frames per second, effectively meeting the accuracy and real-time requirements of pothole detection. The code and dataset for this paper are located at: .
ISBN (Print): 9789811611025
The proceedings contain 134 papers. The special focus in this conference is on Computer Vision and Image Processing. The topics include: Age and Gender Prediction Using Deep CNNs and Transfer Learning; Text Line Segmentation: A FCN Based Approach; Precise Recognition of Vision Based Multi-hand Signs Using Deep Single Stage Convolutional Neural Network; Human Gait Abnormality Detection Using Low Cost Sensor Technology; Bengali Place Name Recognition - Comparative Analysis Using Different CNN Architectures; Action Recognition in Haze Using an Efficient Fusion of Spatial and Temporal Features; Face Verification Using Single Sample in Adolescence; Evaluation of Deep Learning Networks for Keratoconus Detection Using Corneal Topographic Images; Deep Facial Emotion Recognition System Under Facial Mask Occlusion; Domain Adaptation Based Technique for Image Emotion Recognition Using Image Captions; Gesture Recognition in Sign Language Videos by Tracking the Position and Medial Representation of the Hand Shapes; DeepDoT: Deep Framework for Detection of Tables in Document Images; Correcting Low Illumination Images Using PSO-Based Gamma Correction and Image Classifying Method; DeblurRL: Image Deblurring with Deep Reinforcement Learning; FGrade: A Large Volume Dataset for Grading Tomato Freshness Quality; Enhancement of Region of Interest from a Single Backlit Image with Multiple Features; Human Action Recognition from 3D Landmark Points of the Performer; Real-time Sign Language Interpreter on Embedded Platform; Complex Gradient Function Based Descriptor for Iris Biometrics and Action Recognition; On-Device Language Identification of Text in Images Using Diacritic Characters; A Pre-processing Assisted Neural Network for Dynamic Bad Pixel Detection in Bayer Images; Preface; Dynamic User Interface Composition.
ISBN (Print): 9789811610851
The proceedings contain 134 papers. The special focus in this conference is on Computer Vision and Image Processing. The topics include: Age and Gender Prediction Using Deep CNNs and Transfer Learning; Text Line Segmentation: A FCN Based Approach; Precise Recognition of Vision Based Multi-hand Signs Using Deep Single Stage Convolutional Neural Network; Human Gait Abnormality Detection Using Low Cost Sensor Technology; Bengali Place Name Recognition - Comparative Analysis Using Different CNN Architectures; Action Recognition in Haze Using an Efficient Fusion of Spatial and Temporal Features; Face Verification Using Single Sample in Adolescence; Evaluation of Deep Learning Networks for Keratoconus Detection Using Corneal Topographic Images; Deep Facial Emotion Recognition System Under Facial Mask Occlusion; Domain Adaptation Based Technique for Image Emotion Recognition Using Image Captions; Gesture Recognition in Sign Language Videos by Tracking the Position and Medial Representation of the Hand Shapes; DeepDoT: Deep Framework for Detection of Tables in Document Images; Correcting Low Illumination Images Using PSO-Based Gamma Correction and Image Classifying Method; DeblurRL: Image Deblurring with Deep Reinforcement Learning; FGrade: A Large Volume Dataset for Grading Tomato Freshness Quality; Enhancement of Region of Interest from a Single Backlit Image with Multiple Features; Human Action Recognition from 3D Landmark Points of the Performer; Real-time Sign Language Interpreter on Embedded Platform; Complex Gradient Function Based Descriptor for Iris Biometrics and Action Recognition; On-Device Language Identification of Text in Images Using Diacritic Characters; A Pre-processing Assisted Neural Network for Dynamic Bad Pixel Detection in Bayer Images; Preface; Dynamic User Interface Composition.
ISBN (Print): 9789811610912
The proceedings contain 134 papers. The special focus in this conference is on Computer Vision and Image Processing. The topics include: Age and Gender Prediction Using Deep CNNs and Transfer Learning; Text Line Segmentation: A FCN Based Approach; Precise Recognition of Vision Based Multi-hand Signs Using Deep Single Stage Convolutional Neural Network; Human Gait Abnormality Detection Using Low Cost Sensor Technology; Bengali Place Name Recognition - Comparative Analysis Using Different CNN Architectures; Action Recognition in Haze Using an Efficient Fusion of Spatial and Temporal Features; Face Verification Using Single Sample in Adolescence; Evaluation of Deep Learning Networks for Keratoconus Detection Using Corneal Topographic Images; Deep Facial Emotion Recognition System Under Facial Mask Occlusion; Domain Adaptation Based Technique for Image Emotion Recognition Using Image Captions; Gesture Recognition in Sign Language Videos by Tracking the Position and Medial Representation of the Hand Shapes; DeepDoT: Deep Framework for Detection of Tables in Document Images; Correcting Low Illumination Images Using PSO-Based Gamma Correction and Image Classifying Method; DeblurRL: Image Deblurring with Deep Reinforcement Learning; FGrade: A Large Volume Dataset for Grading Tomato Freshness Quality; Enhancement of Region of Interest from a Single Backlit Image with Multiple Features; Human Action Recognition from 3D Landmark Points of the Performer; Real-time Sign Language Interpreter on Embedded Platform; Complex Gradient Function Based Descriptor for Iris Biometrics and Action Recognition; On-Device Language Identification of Text in Images Using Diacritic Characters; A Pre-processing Assisted Neural Network for Dynamic Bad Pixel Detection in Bayer Images; Preface; Dynamic User Interface Composition.
Panoramic or stitched image processing has wide applications in areas such as medical imaging, topographical mapping, and deep space exploration. Rapid development of high-speed communication and artificial intelligen...
ISBN (Print): 9798350349405; 9798350349399
Real-time near-infrared (NIR) face alignment holds significant importance across various domains, such as security, healthcare, and augmented reality. However, existing face alignment techniques tailored to visible-light (VIS) images suffer a decline in accuracy when applied in NIR settings. This decline stems from the domain discrepancy between the VIS and NIR facial domains and the absence of carefully annotated NIR facial data. To address this issue, we introduce a system and strategy for gathering paired VIS-NIR facial images and annotating precise landmarks. Our system streamlines dataset preparation by automatically transferring annotations from VIS images to their corresponding NIR counterparts. Following this approach, we constructed a first-of-its-kind dataset of high-frame-rate paired VIS-NIR facial images with landmark annotations. Additionally, to increase the diversity of the facial data, we augment our dataset through VIS-NIR image-to-image (img2img) translation using publicly available facial landmark datasets. Retraining face alignment models and evaluating them on our dataset demonstrates a noteworthy improvement in face alignment accuracy under NIR conditions. Furthermore, the augmented dataset yields further accuracy gains, particularly across different individuals' facial features.
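The abstract does not describe how annotations are transferred between the paired cameras, so the following is only a hypothetical Python/OpenCV illustration of one common way such a transfer could work: if the two views are related by a planar homography (estimated once from matched calibration points), VIS landmarks can be warped into NIR coordinates. The function name, its calibration-point inputs, and the homography assumption are all invented for illustration; the paper's actual capture rig and transfer procedure may differ.

```python
import cv2
import numpy as np

def transfer_landmarks_vis_to_nir(vis_calib_pts, nir_calib_pts, vis_landmarks):
    """Warp facial landmarks annotated on a VIS frame into the paired NIR frame.

    vis_calib_pts, nir_calib_pts: matched (N, 2) calibration points in each view.
    vis_landmarks: (K, 2) landmark coordinates annotated in the VIS image.
    """
    # Estimate a homography relating the two camera views (RANSAC for robustness).
    H, _ = cv2.findHomography(np.float32(vis_calib_pts),
                              np.float32(nir_calib_pts),
                              cv2.RANSAC, 3.0)
    # Apply the same projective mapping to the annotated landmarks.
    pts = np.float32(vis_landmarks).reshape(-1, 1, 2)
    nir_landmarks = cv2.perspectiveTransform(pts, H).reshape(-1, 2)
    return nir_landmarks
```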