The rapid adoption of Advanced Driver Assistance Systems (ADAS) in modern vehicles, aiming to elevate driving safety and experience, necessitates the real-time processing of high-definition video data. This requirement brings considerable computational complexity and memory demands, highlighting a critical research gap for a design that integrates high FPS throughput with optimal Mean Average Precision (mAP) and Mean Intersection over Union (mIoU). Performance improvement at lower cost, multi-tasking ability on a single hardware platform, and seamless integration into memory-constrained devices are also essential for boosting ADAS performance. Addressing these challenges, this study proposes an ADAS multi-task learning hardware-software co-design approach built on the Kria KV260 Multi-Processor System-on-Chip Field Programmable Gate Array (MPSoC-FPGA) platform. The approach enables efficient real-time execution of deep learning algorithms specific to ADAS applications. Using the BDD100K+Waymo, KITTI, and CityScapes datasets, our ADAS multi-task learning system endeavours to provide accurate and efficient multi-object detection, segmentation, and lane and drivable area detection in road images. The system deploys a segmentation-based object detection strategy, using a ResNet-18 backbone encoder and a Single Shot Detector architecture, coupled with quantization-aware training to improve inference performance without compromising accuracy. The ADAS multi-task learning offers customization options for various ADAS applications and can be further optimized for increased precision and reduced memory usage. Experimental results showcase the system's capability to perform real-time multi-class object detection, segmentation, lane detection, and drivable area detection on road images at approximately 25.4 FPS using a 1920 x 1080 Full HD camera. Impressively, the quantized model demonstrates a 51% mAP for object detection and a 56.62% mIoU for image segmentation...
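As a concrete illustration of the quantization-aware training step, the sketch below shows the generic PyTorch eager-mode QAT pattern (fake-quantize, fine-tune, convert) on a toy module. It is a minimal stand-in, not the paper's pipeline: deployment to the Kria KV260 typically goes through AMD/Xilinx's Vitis AI quantizer, and the real network is a ResNet-18 encoder with an SSD head rather than the single fused block used here.

import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Toy stand-in for the quantized backbone; names and sizes are illustrative.
class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> int8 boundary
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # int8 -> float boundary

    def forward(self, x):
        return self.dequant(self.relu(self.bn(self.conv(self.quant(x)))))

model = TinyBackbone().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.fuse_modules_qat(model, [["conv", "bn", "relu"]], inplace=True)
tq.prepare_qat(model, inplace=True)

# ... the normal training loop runs here with fake-quant observers active ...

model.eval()
int8_model = tq.convert(model)  # materialize int8 weights for inference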
This paper describes a low-cost computer vision system able to obtain traffic metrics at urban intersections. The proposed system is based on a Bayesian-network-based reasoning model. It employs data extracted from background subtraction and contrast analysis techniques, applied to predefined regions of interest in the video sequences, to evaluate different traffic metrics. The system has been designed to work with already-installed urban cameras in order to reduce installation costs. It can therefore be configured to work with different image sizes and video frame rates, as well as to process images taken from different distances and perspectives. The validity of the proposed system has been demonstrated on a Raspberry Pi platform, tested with two real surveillance video cameras managed by the local authority of Cartagena (Spain) under different environmental lighting conditions. Using this hardware, the system is able to process VGA grayscale images at a rate of 8 frames per second.
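The background-subtraction-plus-ROI idea in this abstract can be sketched in a few lines of OpenCV. This is a hedged illustration only: the file name and ROI coordinates are hypothetical, a fixed occupancy threshold stands in for the paper's Bayesian-network reasoning, and MOG2 is one common subtractor choice, not necessarily the one the authors used.

import cv2

ROIS = [(100, 200, 80, 60), (300, 220, 80, 60)]  # hypothetical (x, y, w, h) lanes

cap = cv2.VideoCapture("intersection.mp4")       # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = subtractor.apply(gray)                # per-frame foreground mask
    for i, (x, y, w, h) in enumerate(ROIS):
        roi = mask[y:y + h, x:x + w]
        occupancy = cv2.countNonZero(roi) / float(w * h)
        # a fixed threshold replaces the Bayesian-network inference here
        print(f"ROI {i}: occupancy={occupancy:.2f} occupied={occupancy > 0.2}")
cap.release()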
ISBN: (Print) 9798400705250
Misfocus is ubiquitous for almost all video producers, degrading video quality and often causing expensive delays and reshoots. Current autofocus (AF) systems are vulnerable to sudden disturbances, such as subject movement or lighting changes, that are common in real-world and on-set conditions. Single-image defocus deblurring methods are temporally unstable when applied to videos and cannot recover details obscured by temporally varying defocus blur. In this paper, we present an end-to-end solution that allows users to correct misfocus during post-processing. Our method generates and parameterizes defocused videos into sharp layered neural atlases and propagates consistent focus tracking back to the video frames. We introduce a novel differentiable disk blur layer for more accurate point spread function (PSF) simulation, coupled with a circle of confusion (COC) map estimation module with knowledge transferred from current single-image defocus deblurring (SIDD) networks. Our pipeline offers consistent, sharp video reconstruction and effective subject-focus correction and tracking directly on the generated atlases. Furthermore, our approach achieves results comparable to the state-of-the-art optical flow estimation approach on defocused videos.
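For reference, the circle-of-confusion map estimated by such a module is grounded in thin-lens geometry. The formula below is the standard textbook expression, not necessarily the paper's exact parameterization: for a lens of focal length f with aperture diameter A = f/N focused at distance s, an object at distance d projects a blur disk of diameter

c(d) = A \,\frac{\lvert d - s \rvert}{d} \cdot \frac{f}{s - f}

so c(d) vanishes at the focus plane d = s and grows as the object moves away from it, which is exactly the behaviour a differentiable disk blur layer must reproduce.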
The rising demand for high-quality displays has spurred active research in high dynamic range (HDR) imaging, which has the potential to replace standard dynamic range imaging. This is due to HDR's features, such as accurate reproduction of a scene with its entire spectrum of visible lighting and color depth. But this capability comes with expensive capture, display, storage, and distribution resource requirements. Also, displaying HDR images/video content on an ordinary display device with limited dynamic range requires some form of adaptation. Many adaptation algorithms, widely known as tone mapping (TM) operators, have been studied and proposed in the last few decades. In this article, we present a comprehensive survey of 60 TM algorithms that have been implemented on hardware for acceleration and real-time performance. In this state-of-the-art survey, we discuss TM algorithms that have been implemented on GPU, FPGA, and ASIC in terms of their hardware specifications and performance. Output image quality is an important metric for TM algorithms. From our literature survey, we found that various objective quality metrics have been used to demonstrate the quality of those algorithms' hardware implementations. We compile the metrics used in these studies and analyze the relationship between hardware cost, image quality, and computational efficiency. Currently, machine learning-based (ML) algorithms have become an important tool for solving many image processing tasks, and this article concludes with a discussion of future research directions for realizing ML-based TM operators in hardware.
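As a concrete example of the kind of operator this survey covers, the snippet below sketches the classic Reinhard global tone mapping curve, one of the simplest and most frequently hardware-accelerated TM operators. It is a software reference sketch under stated assumptions (Rec. 709 luminance weights, a default key of 0.18), not any specific hardware implementation from the survey.

import numpy as np

def reinhard_global_tm(hdr_rgb, key=0.18, eps=1e-6):
    # luminance from linear RGB via Rec. 709 weights
    lum = (0.2126 * hdr_rgb[..., 0]
           + 0.7152 * hdr_rgb[..., 1]
           + 0.0722 * hdr_rgb[..., 2])
    log_avg = np.exp(np.mean(np.log(eps + lum)))  # geometric mean (scene key)
    scaled = key / log_avg * lum                  # normalize scene key to `key`
    mapped = scaled / (1.0 + scaled)              # compress luminance to [0, 1)
    ratio = mapped / np.maximum(lum, eps)         # per-pixel gain
    return np.clip(hdr_rgb * ratio[..., None], 0.0, 1.0)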
With the development of artificial intelligence technology, urban traffic management has become increasingly convenient, and the task of illegal parking detection has become a major research focus. Currently, most ill...
Security is a significant concern at all locations where CCTV cameras are installed. Security is a top priority; you must invest considerable time and effort to keep track of everything. Shortly, developments in comput...
This study aims to enhance the detection accuracy and efficiency of cotton bolls in complex natural environments. Addressing the limitations of traditional methods, we developed an automated detection system based on computer vision, designed to optimize performance under variable lighting and weather conditions. We introduce COTTON-YOLO, an improved model based on YOLOv8n, incorporating specific algorithmic optimizations and data augmentation techniques. Key innovations include the C2F-CBAM module to boost feature recognition capabilities, the Gold-YOLO neck structure for enhanced information flow and feature integration, and the WIoU loss function to improve bounding box precision. These advancements significantly enhance the model's environmental adaptability and detection precision. Comparative experiments with the baseline YOLOv8 model demonstrated substantial performance improvements with COTTON-YOLO, particularly a 10.3% increase in the AP50 metric, validating its superiority in accuracy. Additionally, COTTON-YOLO showed efficient real-time processing capabilities and a low false detection rate in field tests. The model's performance in static and dynamic counting scenarios was assessed, showing high accuracy in static cotton boll counting and effective tracking of cotton bolls in video sequences using the ByteTrack algorithm, maintaining low false detection and ID-switch rates even in complex backgrounds.
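The WIoU loss mentioned above extends a plain IoU loss with a distance-based focusing factor. Only the shared IoU core is sketched below as a hedged reference; the focusing term and the exact WIoU variant used by COTTON-YOLO are not reproduced here.

import torch

def iou_loss(pred, target, eps=1e-7):
    # boxes as (x1, y1, x2, y2); pred and target have matching shapes
    ix1 = torch.maximum(pred[..., 0], target[..., 0])
    iy1 = torch.maximum(pred[..., 1], target[..., 1])
    ix2 = torch.minimum(pred[..., 2], target[..., 2])
    iy2 = torch.minimum(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    return 1.0 - iou  # WIoU would multiply this by a focusing weight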
ISBN: (Print) 9798400716164
News broadcasters must produce engaging video clips quicker than ever to ensure their successful positioning in the market. This is due, in part, to the growing number of news sources and changes in media consumption among target audiences. This evolution has amplified the need to produce news clips quickly, a requirement that remains at odds with traditionally manual and time-consuming video editing processes. Despite advances in automating video news production, current systems have yet to meet the automation level and quality standards required for professional news broadcasting. Addressing this gap, we propose a novel transformer-based framework for automatically composing news clips to streamline the editing process. Our framework is predicated on a vision-language feature embedding mechanism and a cross-attention transformer architecture designed to generate multi-shot news clips that are semantically coherent with the editorial text and stylistically consistent with professional editing benchmarks. Our framework composes news clips 2 minutes in length from source material ranging from 20 minutes to 2 hours in less than 5 minutes on a single GPU. In our user study, target groups with different experience levels rated the generated videos on a 6-point Likert scale. Users rated the news clips generated by our framework with an average score of 4.13 and the manually edited news clips with an average score of 4.58.
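The cross-attention coupling of editorial text and candidate shots can be pictured with a minimal sketch: text tokens act as queries over shot embeddings, and the attention weights hint at which shots match the script. All dimensions, names, and the single-layer design below are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class TextToShotCrossAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_tokens, shot_embeddings):
        # queries from the editorial text; keys/values from the shots
        fused, weights = self.attn(query=text_tokens,
                                   key=shot_embeddings,
                                   value=shot_embeddings)
        return fused, weights

text = torch.randn(1, 64, 512)    # e.g. 64 text tokens from the script
shots = torch.randn(1, 200, 512)  # e.g. 200 candidate shot embeddings
fused, w = TextToShotCrossAttention()(text, shots)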
ISBN: (Print) 9798400701085
Low-Light Video Enhancement (LLVE) has received considerable attention in recent years. One of the critical requirements of LLVE is inter-frame brightness consistency, which is essential for maintaining the temporal coherence of the enhanced video. However, most existing single-image-based methods fail to address this issue, resulting in a flickering effect that degrades the overall quality after enhancement. Moreover, 3D Convolutional Neural Network (CNN)-based methods, which are designed for video to maintain inter-frame consistency, are computationally expensive, making them impractical for real-time applications. To address these issues, we propose an efficient pipeline named FastLLVE that leverages the Look-Up Table (LUT) technique to maintain inter-frame brightness consistency effectively. Specifically, we design a learnable Intensity-Aware LUT (IA-LUT) module for adaptive enhancement, which addresses the low-dynamic problem in low-light scenarios. This enables FastLLVE to perform low-latency and low-complexity enhancement operations while maintaining high-quality results. Experimental results on benchmark datasets demonstrate that our method achieves state-of-the-art (SOTA) performance in terms of both image quality and inter-frame brightness consistency. More importantly, FastLLVE can process 1080p videos at 50+ frames per second (FPS), which is 2x faster than SOTA CNN-based methods in inference time, making it a promising solution for real-time applications. The code is available at https://***/Wenhao-Li777/FastLLVE.
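A generic 3D LUT lookup, the primitive FastLLVE builds on, can be written with a single trilinear grid_sample call. This sketch applies a plain RGB-to-RGB LUT; the paper's IA-LUT additionally conditions the lookup on a learned intensity dimension, which is omitted here, and the blue-major LUT layout is an assumption of this sketch.

import torch
import torch.nn.functional as F

def apply_3d_lut(img, lut):
    # img: (B, 3, H, W) in [0, 1]; lut: (3, S, S, S) indexed lut[c, b, g, r]
    b, _, h, w = img.shape
    # grid_sample wants coords in [-1, 1], ordered (x, y, z) = (r, g, b)
    grid = img.permute(0, 2, 3, 1) * 2.0 - 1.0        # (B, H, W, 3)
    grid = grid.view(b, 1, h, w, 3)                   # (B, D=1, H, W, 3)
    lut = lut.unsqueeze(0).expand(b, -1, -1, -1, -1)  # (B, 3, S, S, S)
    out = F.grid_sample(lut, grid, mode="bilinear",   # trilinear on 5D input
                        padding_mode="border", align_corners=True)
    return out.reshape(b, 3, h, w)

img = torch.rand(1, 3, 64, 64)
identity_lut = torch.stack(torch.meshgrid(
    *(torch.linspace(0, 1, 17),) * 3, indexing="ij"), dim=0).flip(0)
out = apply_3d_lut(img, identity_lut)  # identity LUT: out ≈ img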
Using MRI to reliably diagnose brain tumors is important, but it is often time-consuming. The study uses an automated method for brain tumor detection and classification using image processing techniques and convention...