Video streaming is a subfield of signal processing that encompasses the pre-processing of video sequences, their contextual segmentation, application-specific feature extraction and selection, and the detection of dis...
In this paper, we introduce DLAPID, a novel decoupled parallel hardware-software co-design architecture for real-time video dehazing. From a software point of view, DLAPID isolates the atmospheric light operation from...
In recent years, deep learning (DL)-based automatic view classification of 2D transthoracic echocardiography (TTE) has demonstrated strong performance, but has not fully addressed key clinical requirements such as view coverage, classification accuracy, inference delay, and the need for thorough exploration of performance in real-world clinical settings. We proposed a clinical requirement-driven DL framework, TTESlowFast, for accurate and efficient video-level TTE view classification. This framework is based on the SlowFast architecture and incorporates both a sampling balance strategy and a data augmentation strategy to address class imbalance and the limited availability of labeled TTE videos, respectively. TTESlowFast achieved an overall accuracy of 0.9881, precision of 0.9870, recall of 0.9867, and F1 score of 0.9867 on the test set. After field deployment, the model's overall accuracy, precision, recall, and F1 score for view classification were 0.9607, 0.9586, 0.9499, and 0.9530, respectively. The inference time for processing a single TTE video was 105.0 +/- 50.1 ms on a desktop GPU (NVIDIA RTX 3060) and 186.0 +/- 5.2 ms on an edge computing device (Jetson Orin Nano), which basically meets the clinical demand for immediate processing following image acquisition. The TTESlowFast framework proposed in this study demonstrates effective performance in TTE view classification with low inference delay, making it well-suited for various medical scenarios and showing significant potential for practical application.
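The sampling balance strategy is not detailed in the abstract; one common way to counter class imbalance when drawing training clips is inverse-frequency weighting, sketched below. The function name and parameters are illustrative, not from the paper:

```python
import random
from collections import Counter

def balanced_sample(labels, k, seed=0):
    """Draw k training indices with per-class inverse-frequency
    weights, so rare view classes are sampled about as often as
    common ones (illustrative sketch, not the paper's exact scheme)."""
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=k)
```

With a 90/10 class split, the weighted draw yields roughly equal numbers of samples per class instead of a 9:1 ratio.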
ISBN:
(digital) 9781665496209
ISBN:
(print) 9781665496209
Learned image compression achieves state-of-the-art accuracy and compression ratios, but its relatively slow runtime performance limits its usage. While previous attempts at optimizing learned image codecs focused more on the neural model and entropy coding, we present an alternative method for improving the runtime performance of various learned image compression models. We introduce multi-threaded pipelining and an optimized memory model to enable asynchronous execution of GPU and CPU workloads, fully taking advantage of computational resources. Our architecture alone already produces excellent performance without any change to the neural model itself. We also demonstrate that combining our architecture with previous tweaks to the neural models can further improve runtime performance. We show that our implementations excel in throughput and latency compared to the baseline and demonstrate the performance of our implementations by creating a real-time video streaming encoder-decoder sample application, with the encoder running on an embedded device.
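The multi-threaded pipelining idea can be illustrated with a producer-consumer queue: one thread runs the first stage while the main thread runs the second, so the two workloads overlap instead of alternating serially. The sketch below uses plain Python threads as stand-ins for the CPU and GPU stages; all names are illustrative, not from the paper:

```python
import queue
import threading

def pipeline(frames, stage1, stage2, depth=4):
    """Two-stage pipeline: stage1 (e.g. CPU-side entropy coding) runs
    in a worker thread while stage2 (e.g. GPU inference) consumes its
    results, bounded by a queue of `depth` in-flight items."""
    q = queue.Queue(maxsize=depth)
    out = []

    def producer():
        for f in frames:
            q.put(stage1(f))
        q.put(None)  # sentinel: no more frames

    t = threading.Thread(target=producer)
    t.start()
    while (item := q.get()) is not None:
        out.append(stage2(item))
    t.join()
    return out
```

The bounded queue is the memory-model piece: it caps how many intermediate buffers exist at once while still letting both stages stay busy.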
ISBN:
(print) 9798400713880
As one of the most important data sources in machine vision systems, the camera faces increasingly strict requirements in video image acquisition, processing and transmission. An in-depth analysis of camera development trends makes clear that cameras mainly pursue three goals: improving video image processing performance, reducing costs and enhancing flexibility. However, most existing video processing platforms suffer from low data processing efficiency and a lack of real-time performance, which makes it difficult to meet the high performance requirements of modern cameras for video image processing. Because of their high real-time performance, powerful computing capability, high integration and flexibility, embedded systems have become a promising direction for realizing advanced video image processing algorithms. With an embedded platform, the camera system can achieve more efficient real-time data processing and higher quality video image output. In the current market, three high-performance embedded processors stand out: DSP, ASIC and FPGA. Among them, the FPGA has become the ideal hardware platform for camera video image processing thanks to its parallel computing capability, rich interface resources and field-programmable characteristics. Based on the Anlogic EG4S20 FPGA platform, this paper implements image processing functions, including Bayer-to-RGB format conversion, the gray world algorithm and the perfect reflection algorithm. This research makes full use of the powerful computing capability of FPGAs to demonstrate image processing and enhancement techniques that address key performance challenges in camera systems.
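Of the algorithms listed, the gray world algorithm is the simplest to sketch: it assumes the scene averages to gray and scales each color channel so its mean matches the global mean intensity. The snippet below is a software reference sketch only; the paper's implementation is in FPGA logic:

```python
import numpy as np

def gray_world(img):
    """Gray-world white balance. img: HxWx3 float array in [0, 1].
    Each channel is scaled so its mean equals the mean over all
    channels, removing a global color cast."""
    means = img.reshape(-1, 3).mean(axis=0)   # per-channel means
    gain = means.mean() / means               # per-channel gains
    return np.clip(img * gain, 0.0, 1.0)
```

For an image with channel means (0.2, 0.4, 0.6), the gains (2.0, 1.0, 0.67) pull every channel mean to 0.4.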
ISBN:
(print) 9781665474047
Image processing is essential for applications such as robot vision, remote sensing, computational photography, augmented reality, etc. In the design of dedicated hardware for such applications, IEEE Std 754 floating point (float) arithmetic units have been widely used. While float-based architectures have achieved favorable results, their hardware is complicated and requires a large silicon footprint. In this paper we propose a Posit-based image and video processor (PositIV), a completely pipelined, configurable image processor using posit arithmetic that guarantees lower power use and a smaller silicon footprint than floats. PositIV is able to effectively overlap computation with memory access and supports multidimensional addressing, virtual border handling, prefetching and buffering. It successfully integrates configurability, flexibility, and ease of development with real-time performance characteristics. The performance of PositIV is validated on several image processing algorithms for different configurations and compared against state-of-the-art implementations. Additionally, we empirically demonstrate the superiority of posits in processing images for several conventional algorithms, achieving at least a 35-40% improvement in image quality over standard floats.
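"Virtual border handling" generally means synthesizing out-of-frame pixels so a convolution kernel can be applied at the image boundary without special cases. Replicate-edge padding is one common instance, sketched below in software (this is an assumption about the general technique, not PositIV's hardware scheme):

```python
import numpy as np

def virtual_border(img, r):
    """Replicate-edge padding: extend the image by r pixels on each
    side by repeating the border values, so a (2r+1)x(2r+1) kernel
    can read 'virtual' pixels outside the original frame."""
    return np.pad(img, r, mode="edge")
```

A hardware pipeline achieves the same effect by clamping the address calculation at the frame edges instead of materializing the padded array.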
ISBN:
(print) 9798350358261; 9798350358278
IoT devices are enabled to capture and upload videos with increasing bitrates. Massive IIoT is eager for effective video processing techniques to satisfy the requirements of real-time video services. With the emergence of 5G-unlicensed (5G-U), ultra-low latency video applications become possible. However, existing encoding standards for video services in Web 2.0, such as H.265, are not naturally designed for IIoT video streaming, leading to bandwidth pressure where 5G-U coexists with various other wireless signals. To tackle this problem and to support low-latency video utilization by IIoT video sources, we propose an Adaptive Compression-Reconstruction framework named ACORN, which is based on compressed sensing and recent advances in deep learning. At end nodes, we compress multiple sequential video frames into a single frame to reduce video volume. We design a QoE-aware parameter selection mechanism to deal with volatile network environments during compression. With learnable gated convolution layers and channel-wise soft-thresholding operators, ACORN also builds a real-time reconstruction module. Experimental results reveal that video analytics can be conducted on compressed frames. The reconstruction algorithm in ACORN achieves 1-4 dB improvements. Moreover, both the encoding time cost and the encoded video volume are reduced by more than 4x under the ACORN framework.
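The channel-wise soft-thresholding operator mentioned above is the standard proximal operator of the L1 penalty used in compressed-sensing reconstruction: it shrinks each coefficient toward zero by a threshold tau. A minimal sketch (ACORN learns tau per channel; here it is a fixed scalar):

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft-thresholding: sign(x) * max(|x| - tau, 0).
    Coefficients with magnitude below tau are zeroed; the rest
    are shrunk by tau, promoting sparse reconstructions."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
```

Applied per channel of a feature map, this gives the denoising step of iterative-shrinkage reconstruction networks.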
Different from the recent popular super resolution system based on AI technology which needs normally massive training datasets, the micro-scanning super resolution system by integrating the high-precision mechanism a...
ISBN:
(print) 9788993215380
In this paper, we present a sequential training methodology aimed at improving the recognition of elevator buttons using the YOLOv5 object detection model. The methodology is structured into three distinct phases. In the first phase, we generate a synthetic dataset where elevator buttons, cropped from their original context, are placed on random image backgrounds. This phase is designed to help the model learn to identify buttons independently of their surroundings, ensuring a foundational understanding of button features without contextual distractions. In the second phase, we augment the cropped button dataset by applying various transformations such as random flips, rotations, and scaling. These augmentations increase the diversity and robustness of the training data, allowing the model to generalize better to variations in button appearances. The final phase involves training the model on images of full elevator panels. This step is crucial for helping the model understand the contextual placement and spatial relationships of the buttons within the panel, which is essential for accurate detection in real-world scenarios. Additionally, we enhance the real-time video input exposure to improve visibility under varying lighting conditions, addressing common challenges faced in practical applications. For post-processing, we integrate a Channel and Spatial Reliability Tracker (CSRT) to maintain button-tracking consistency in video sequences. This tracker helps ensure that once a button is detected, its position is reliably followed across frames, improving the overall accuracy and reliability of the system. This comprehensive approach, which combines the use of synthetic data, extensive data augmentation techniques, and contextual training on full panel images, aims to better simulate real-world scenarios. As a result, the proposed methodology significantly enhances the robustness and reliability of the YOLOv5 model in recognizing elevator buttons under diverse conditions.
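The second-phase geometric augmentations (random flips and rotations) can be sketched as below; the probabilities are illustrative and the paper also applies scaling, which is omitted here:

```python
import numpy as np

def augment(img, rng):
    """Apply a random horizontal flip (p=0.5) and a random
    quarter-turn rotation to one training image. Purely geometric:
    no pixel values are changed, only their positions."""
    if rng.random() < 0.5:
        img = np.fliplr(img)
    k = rng.integers(0, 4)  # 0-3 quarter turns
    return np.rot90(img, k)
```

Usage: pass a seeded `np.random.default_rng(...)` so augmentation is reproducible across training runs.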
ISBN:
(print) 9798350372557
This work is devoted to the development of a novel deep learning encoder-decoder algorithm for real-time noise and blur elimination in video frames received from a UAV. This work improves on existing algorithms by providing a more flexible blind deblurring solution than existing kernel-based methods. The proposed method can be applied both to improve the drone operator's capabilities and to improve the performance of autonomous image processing tasks, such as object identification and visual navigation systems. Different types of blur as well as possible types of noise are presented. A brief overview of existing methods is provided. The problem of frame alignment due to the object's movement and associated noise is considered. Existing deblurring and image restoration methods, including the state of the art, are reviewed and their limitations are highlighted. To address these limitations, a method based on a fully convolutional encoder-decoder network with residual connections is presented. Dataset generation and training procedures are discussed. The approach is then compared to existing state-of-the-art deep learning methods. The proposed method enables up to 9 times faster blind image restoration with quality comparable to existing state-of-the-art image restoration methods.