We take on a rarely explored task: thermal infrared video denoising. Perception in the thermal infrared significantly enhances the capabilities of machine vision. Nonetheless, noise in imaging systems is one of the factors that hampers the large-scale application of such equipment. Existing thermal infrared denoising methods focus primarily on the image level; they make inadequate use of time-domain information and do not sufficiently investigate system-level mixed noise, so they perform poorly on recorded video. Video denoising methods, meanwhile, are commonly applied to RGB cameras and are of uncertain effectiveness here, owing to substantial differences in noise models and modalities between RGB and thermal infrared images. In light of this, we first revisit the imaging mechanism and introduce a physics-inspired noise generator based on the sources and characteristics of system noise. We then construct a thermal infrared video denoising dataset consisting of 518 real-world videos. Lastly, we propose a denoising model, the multi-domain infrared video denoising network, capable of concentrating features from the time, space, and frequency domains to restore high-fidelity videos. Extensive experiments demonstrate that the proposed method achieves state-of-the-art denoising quality and can be successfully applied to commercial cameras and downstream vision tasks, providing a new avenue for clear videography in the thermal infrared world. The dataset and code will be made available.
ISBN: (print) 9783031510229; 9783031510236
Face recognition is used in numerous authentication applications; unfortunately, such systems are susceptible to spoofing attacks such as paper and screen attacks. In this paper, we propose a method that recognises whether a face detected in a video is not real and, if so, the type of attack performed. We propose to learn temporal features with a 3D convolutional network, which is better suited to temporal information. Besides summarizing temporal information, the 3D ConvNet allows us to build a real-time method, since analysing clips is much more efficient than analysing single frames. The learned features are classified either by a binary classifier, which distinguishes whether the person in the video clip is real (i.e. live) or not, or by a multi-class classifier, which recognises whether the person is real or identifies the type of attack (screen, paper, etc.). We tested our method on five public datasets: Replay Attack, Replay Mobile, MSU-MSFD, Rose-Youtu, RECOD-MPAD.
ISBN: (print) 9798350350227; 9798350350210
Systems and methods of multimedia video processing are presented that employ eye-contact realization techniques for improved real-time multimedia experiences. These systems and methods employ multi-view video processing techniques, extracting eye contacts from input multi-view videos and composing the input multi-view videos into a single video with free-viewpoint eye contacts. Once the input multi-view videos have been processed in this way, a high-quality video with free-viewpoint eye contacts is delivered to the end user. By utilizing these novel systems and methods, real-time multimedia experiences are significantly improved.
Ground penetrating radar (GPR) is an effective tool for detecting internal defects in asphalt roads due to its non-destructive nature and high resolution. However, detecting defects in GPR images remains challenging, as existing models lack sufficient accuracy and are often complex and redundant. To address these issues, a lightweight real-time detection method based on ground-penetrating radar is proposed in this study. First, a field GPR image dataset of asphalt roads was collected and constructed. To address the limited defect sample data acquired by GPR, an efficient copy-and-paste augmentation method was employed. This method involved copying and pasting defect samples in GPR images while incorporating random scale jitter and position migration operations to generate a sufficient number of real defect samples. Second, the C2f-DSConv module and the SE attention mechanism were designed and introduced based on the YOLOv8 network to improve detection accuracy in the complex background environment of GPR images. Finally, a channel pruning strategy was used to prune the improved YOLOv8 network, reducing model complexity while maintaining detection accuracy. The final model achieves an average detection accuracy of 90.9% and a detection speed of 140.9 FPS. The results show that the proposed method combines both detection accuracy and real-time performance, further advancing the engineering application of internal defect detection in asphalt roads.
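The copy-and-paste augmentation described in the abstract above can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption: the function names `resize_nearest` and `copy_paste`, the `(top, left, height, width)` box layout, the scale range, and the nearest-neighbour resizing are all hypothetical. Images are plain nested lists to keep the sketch dependency-free; a practical version would more likely use NumPy or OpenCV.

```python
import random


def resize_nearest(patch, scale):
    """Resize a 2D patch (list of rows) by `scale` using nearest-neighbour sampling."""
    h, w = len(patch), len(patch[0])
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    return [
        [patch[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def copy_paste(image, box, rng, scale_range=(0.8, 1.2)):
    """Copy the defect region `box` and paste it at a random position.

    `box` is (top, left, height, width). The patch is rescaled by a random
    factor (scale jitter) and pasted at a random location (position shift).
    Returns the augmented image and the bounding box of the pasted sample.
    The input image is left unmodified.
    """
    top, left, bh, bw = box
    patch = [row[left:left + bw] for row in image[top:top + bh]]

    # Random scale jitter (assumed range, not taken from the paper).
    patch = resize_nearest(patch, rng.uniform(*scale_range))
    ph, pw = len(patch), len(patch[0])

    img_h, img_w = len(image), len(image[0])
    if ph > img_h or pw > img_w:
        raise ValueError("scaled patch does not fit inside the image")

    # Random position so that the patch stays fully inside the image.
    new_top = rng.randint(0, img_h - ph)
    new_left = rng.randint(0, img_w - pw)

    out = [row[:] for row in image]
    for r in range(ph):
        out[new_top + r][new_left:new_left + pw] = patch[r]
    return out, (new_top, new_left, ph, pw)


if __name__ == "__main__":
    # Toy 20x20 "GPR image" with one 4x4 defect of ones at (2, 3).
    img = [[0] * 20 for _ in range(20)]
    for r in range(2, 6):
        for c in range(3, 7):
            img[r][c] = 1
    augmented, new_box = copy_paste(img, (2, 3, 4, 4), random.Random(0))
    print("pasted defect box:", new_box)
```

Returning the new bounding box alongside the image keeps the detection labels in sync with the augmented sample, which a detector such as YOLOv8 would need during training.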
Automatic crowd counting using density estimation has gained significant attention in computer vision research. As a result, a large number of crowd counting and density estimation models using convolutional neural networks (CNNs) have been published in the last few years. These models have achieved good accuracy on benchmark datasets. However, attempts to improve accuracy often lead to higher model complexity. In real-time video surveillance applications using drones with limited computing resources, deep models incur intolerably high inference delay. In this paper, we propose (i) a Lightweight Crowd Density estimation model (LCDnet) for real-time video surveillance, and (ii) an improved training method using curriculum learning (CL). LCDnet is trained using CL and evaluated on two benchmark datasets, DroneRGBT and CARPK. Results are compared with existing crowd models. Our evaluation shows that LCDnet achieves reasonably good accuracy while significantly reducing inference time and memory requirements, and thus can be deployed on edge devices with very limited computing resources.
This paper presents a new Edge-AI algorithm for real-time, multi-feature measurement (social distancing, mask detection, and facial temperature) to minimize the spread of COVID-19 among individuals. COVID-19 has accentuated the need for an intelligent video surveillance system that can simultaneously monitor social distancing, detect masks, and measure facial temperature using deep learning (DL) models. In this research, we fuse three different YOLOv4-tiny object detectors, one for each task of the integrated system. This DL model is used for object detection and is targeted at real-time applications. The proposed models have been trained on different data sets, covering people detection, mask detection, and face detection for temperature measurement, and evaluated on these existing data sets. Thermal and visible cameras have been used for the proposed approach. The thermal camera is used for social distancing and facial temperature measurement, while a visible camera is used for mask detection. The proposed method has been executed on NVIDIA platforms to assess algorithmic performance. To evaluate the trained models, accuracy, recall, and precision have been measured. We obtained promising results for real-time detection for human recognition. Different pairs of thermal and visible cameras and different NVIDIA edge platforms have been adopted to explore solutions with different trade-offs between cost and performance. The multi-feature algorithm is designed to monitor individuals continuously in the targeted environments, thus reducing the impact of COVID-19 spread.
Video object detection is a challenging task in computer vision since it needs to handle the object appearance degradation problem, which seldom occurs in the image domain. Off-the-shelf video object detection methods typically aggregate multi-frame features at one stroke to alleviate appearance degradation. However, these existing methods do not take supervision knowledge into consideration and thus still suffer from insufficient feature aggregation, resulting in false detections. In this paper, we take a different perspective on feature aggregation and propose a dynamic graph contrastive network (DGC-Net) for video object detection, with three improvements over existing methods. First, we design a frame-level graph contrastive module to aggregate frame features, enabling DGC-Net to fully exploit discriminative contextual feature representations to facilitate video object detection. Second, we develop a proposal-level graph contrastive module to aggregate proposal features, so that DGC-Net sufficiently learns discriminative semantic feature representations. Third, we present a graph transformer that dynamically adjusts the graph structure by pruning useless nodes and edges, which improves both accuracy and efficiency because it eliminates geometric-semantic ambiguity and reduces the graph scale. Furthermore, building on the DGC-Net framework, we develop DGC-Net Lite to perform real-time video object detection with a much faster inference speed. Extensive experiments conducted on the ImageNet VID dataset demonstrate that DGC-Net outperforms current state-of-the-art methods. Notably, DGC-Net obtains 86.3%/87.3% mAP when using ResNet-101/ResNeXt-101.
With the development of science and technology and the renewal of media means in the era of cultural industry, electronic media develops rapidly, and video image technology with digital realization as the carrier develops...
This paper proposes an automated inspection approach for printed circuit boards (PCBs) that can accurately locate defects, addressing the issues of low precision, complex equipment, and high cost. The method uses digital image processing techniques, including filtering, image segmentation, feature extraction, alignment, and mathematical morphology processing. To overcome the high computational cost and poor real-time performance of the Otsu thresholding segmentation algorithm, an optimized particle swarm approach is used to increase image segmentation efficiency. Meanwhile, image feature points are matched on the basis of the SURF algorithm, combining the benefits of the FLANN algorithm and the SURF method. This improves feature point matching performance and reduces the alignment error between images. According to experimental results, the improved PCB defect detection algorithm achieves 98.9% accuracy with high efficiency and can satisfy PCB defect detection requirements.
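For context on the Otsu step mentioned in the abstract above, the following is a minimal sketch of the standard exhaustive Otsu search over a grey-level histogram. It is not the authors' particle-swarm variant: it only shows the between-class variance objective that such an optimizer would try to maximise more cheaply. The function name `otsu_threshold` and the convention that pixels with value `<= t` form the first class are assumptions made for illustration.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximises the between-class variance.

    `hist[i]` is the number of pixels with grey level i. Pixels with level
    <= t form class 0, the rest form class 1. Every candidate t is tried,
    which is the baseline cost a swarm-based search aims to reduce.
    """
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    w0 = 0          # pixel count of class 0
    sum0 = 0        # sum of grey levels in class 0
    best_t, best_var = 0, -1.0

    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (weighted_total - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


if __name__ == "__main__":
    # Toy 8-level histogram with a dark mode (levels 1-2) and a bright mode (5-6).
    toy_hist = [0, 50, 50, 0, 0, 50, 50, 0]
    print("Otsu threshold:", otsu_threshold(toy_hist))
```

For multi-level thresholding the search space grows combinatorially with the number of thresholds, which is the usual motivation for replacing the exhaustive loop with a metaheuristic.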
Coastal monitoring is vital in environmental management, disaster mitigation, and addressing climate change impacts. Traditional methods are time-consuming and error-prone, prompting the need for innovative systems. This study introduces the Coastal Video Monitoring System (CoViMos), a novel framework for real-time shoreline detection in tropical regions, specifically at Kedonganan Beach, Bali. The CoViMos framework utilizes advanced video monitoring and optimized morphological operations to address challenges such as environmental noise and dynamic shoreline behavior. Key innovations include Kapur's entropy thresholding enhanced with the Grasshopper Optimization Algorithm (GOA) and structuring elements tailored to the beach's unique features. Sensitivity analysis reveals that a structuring element size of five pixels offers optimal performance, balancing efficiency and image fidelity. This configuration achieves peak values in quality metrics such as the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Complex Wavelet SSIM (CWSSIM), and Feature Similarity Index (FSIM) while minimizing Mean Squared Error (MSE) and reducing processing time. The results demonstrate significant improvements in shoreline detection accuracy, with PSNR increasing by 9.3%, SSIM by 1.4%, CWSSIM by 1.7%, and FSIM by 1.6%. Processing time decreased by 1.3%, emphasizing the system's computational efficiency. These enhancements ensure more precise shoreline mapping, even in noisy and dynamic environments.
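Kapur's entropy thresholding, named in the abstract above, picks the threshold that maximises the sum of the Shannon entropies of the two resulting classes. The sketch below is a plain exhaustive search over a single threshold, assuming a grey-level histogram as input. It does not reproduce the GOA-enhanced search used in CoViMos, and the function name `kapur_threshold` is hypothetical.

```python
import math


def kapur_threshold(hist):
    """Return (t, score) maximising Kapur's entropy criterion.

    `hist[i]` is the pixel count at grey level i. Levels <= t form class 0
    and levels > t form class 1. The score is H0 + H1, where each H is the
    Shannon entropy of the class-normalised distribution.
    """
    total = sum(hist)
    probs = [count / total for count in hist]

    best_t, best_score = None, -math.inf
    cumulative = 0.0
    for t in range(len(probs) - 1):
        cumulative += probs[t]
        w0, w1 = cumulative, 1.0 - cumulative
        if w0 <= 0.0 or w1 <= 0.0:
            continue  # skip thresholds that leave one class empty

        # Entropy of each class, skipping empty bins (0 * log 0 is taken as 0).
        h0 = -sum((p / w0) * math.log(p / w0) for p in probs[:t + 1] if p > 0)
        h1 = -sum((p / w1) * math.log(p / w1) for p in probs[t + 1:] if p > 0)

        if h0 + h1 > best_score:
            best_t, best_score = t, h0 + h1
    return best_t, best_score


if __name__ == "__main__":
    # Toy 8-level histogram with two separated modes.
    toy_hist = [0, 50, 50, 0, 0, 50, 50, 0]
    t, score = kapur_threshold(toy_hist)
    print("threshold:", t, "entropy sum:", round(score, 4))
```

A metaheuristic such as GOA would replace the `for t in range(...)` loop, evaluating the same entropy score only at candidate thresholds proposed by the optimizer.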