ISBN: 9798350386851; 9798350386844 (print)
Image deraining is an image processing technique that restores a rainy image to a rain-free one. The Iterative Soft Thresholding Algorithm (ISTA) is an iterative procedure for sparse estimation. ISTA carries a considerable computational burden in floating-point operations (FLOPs), which limits its use in real-time applications. The proposed Restored-Feature Iterative Soft Thresholding Algorithm (RISTA) provides a low-complexity approach that reduces FLOPs. Experimental results demonstrate a computation saving ratio of up to 66% while keeping the Peak Signal-to-Noise Ratio (PSNR) error as low as 6%.
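For readers unfamiliar with the baseline, the following is a minimal sketch of plain ISTA (not the RISTA variant, whose details the abstract does not give). It assumes the standard sparse least-squares objective 0.5*||Ax - y||^2 + lam*||x||_1; the names `soft_threshold`, `ista`, `lam`, and `step` are illustrative choices, not taken from the paper.

```python
# Minimal plain-ISTA sketch (assumption: the standard lasso objective
# 0.5*||A x - y||^2 + lam*||x||_1; this is not the paper's RISTA).

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, y, lam, step, iters):
    """Run `iters` ISTA iterations. A is a list of rows, y a list of floats.

    `step` should not exceed 1 / (largest eigenvalue of A^T A) for convergence.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - y
        r = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
        # Gradient of the quadratic term: g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        # Gradient step followed by element-wise soft thresholding
        x = [soft_threshold(xj - step * gj, step * lam) for xj, gj in zip(x, g)]
    return x

# Usage: with an identity A, ISTA reduces to soft-thresholding y itself,
# so the small middle entry is driven to exactly zero.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(ista(identity, [3.0, 0.05, -2.0], lam=0.1, step=1.0, iters=5))
```

Each iteration costs two matrix-vector products, which is where the FLOP burden mentioned in the abstract comes from when the operator is large.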
The emergence of 3D video-capable embedded mobile devices is expected, given the popularization of multimedia services and the demand for novel immersive video technologies. Such devices require efficient hardware-friendly heuristics to cope with strict processing requirements and a limited energy supply. To meet these requirements, this work presents a complete 3D-HEVC intra-frame prediction hardware design that supports a flexible coding order between texture and depth channels. The developed hardware employs hardware-friendly constraints and novel heuristics to exploit inter-channel redundancies, and it reduces the computational effort through the novel inter-channel directional structure detector heuristic. The designed 3D-HEVC intra-frame prediction system dissipates 384.6 mW while processing three HD 1080p views (texture + depth) at 30 frames per second in real time. To the best of our knowledge, this is the first work to propose a complete 3D-HEVC intra-frame prediction system with support for a flexible coding order. In addition, it is the only hardware design to process the luminance and chrominance texture channels together with the depth channel.
ISBN: 9798350353006 (print)
Learning-based underwater image enhancement (UIE) methods have made great progress. However, the lack of large-scale, high-quality paired training samples has become the main bottleneck hindering the development of UIE. The inter-frame information in underwater videos can accelerate or optimize the UIE process. We therefore constructed the first large-scale, high-resolution underwater video enhancement benchmark (UVEB) to promote the development of underwater vision. It contains 1,308 pairs of video sequences and more than 453,000 high-resolution frame pairs, 38% of which are Ultra-High-Definition (UHD) 4K. UVEB was collected in multiple countries and covers various scenes and video degradation types, so that it reflects diverse and complex underwater environments. We also propose the first supervised underwater video enhancement method, UVE-Net. UVE-Net converts the current frame information into convolutional kernels and passes them to adjacent frames for efficient inter-frame information exchange. By fully utilizing the redundant degraded information in underwater videos, UVE-Net achieves better video enhancement. Experiments demonstrate the effectiveness of the network design and the good performance of UVE-Net.
Locked-in Syndrome (LIS) poses significant challenges for individuals experiencing complete paralysis, who must rely solely on eye and head movements and hearing to communicate. The main issue is the urgent need for a customized communication solution, given the severe limitations LIS patients encounter when interacting with their surroundings. This difficulty extends beyond physical constraints and greatly affects their overall quality of life and psychological well-being. To address this complex challenge, we have developed an innovative approach that integrates advanced eye-tracking technology, Natural Language Processing (NLP), and Artificial Intelligence (AI). This holistic solution not only restores communication capabilities but also provides crucial support for the mental health and psychological well-being of LIS patients, offering a ray of hope for a better future. Beyond addressing communication challenges, our proposal also focuses on improving the mental health of LIS patients through interactive communication with either surrounding people or AI bots. Our solution, named "ParaEyes", uses webcam-detected eye movements to navigate the communication interface, employing real-time video image processing in Python. Additionally, users can engage with various AI bots, including ChatGPT, YouTube Bot, and ParaEyes Visual Bot, based on their preferences. ParaEyes achieves a 2x faster average setup time and, on average, 10% higher accuracy than the open-source alternatives. This approach enhances communication skills and mental health support, and it is inspired by ChatGPT's robust measures to safeguard user data and ensure user privacy through tailored interactions and responsive functionalities.
The real-time reconstruction of 3-D space and the use of holographic video entail the capture and processing of films that contain a three-dimensional shape. The manner of capturing the 3-dimensional shape incl...
ISBN: 9783031686382; 9783031686399 (print)
The present study showcases a novel deep learning-based vision application tasked with reducing the communication gap between sign language and non-sign language users. Speech and hearing impairments are a type of disability that restricts an individual's ability to communicate properly with others. Modern-day automation tools can be used to address this communication gap and allow people to communicate ubiquitously and in a variety of situations. The method defined in the paper involves loading a video file, extracting each frame, and detecting the hand landmarks in each frame using the MediaPipe library. The frame is then cropped, and the region of interest is pre-processed and stored in a new data directory for training purposes. The pre-processing involves Gaussian blur, edge detection, morphological transformations, and signal processing functions. Data augmentation is then performed, and the images are saved in a new directory. The images are then used to train a custom CNN model, which contains four convolutional layers along with two fully connected layers. The model is compiled using the categorical cross-entropy loss function, optimised using the RMSprop optimiser, and then evaluated using accuracy as the evaluation metric. The predicted sign language alphabet is displayed on the screen and is converted to speech using the Google Text-to-Speech library. The model achieves an overall accuracy of 93.96%. The findings indicate that the proposed approach can serve as a road map for developing a real-time system capable of sign language recognition and can direct future investigations in this domain.
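The loss and metric named above can be illustrated with a small, library-free sketch. It is only an assumption-laden illustration of categorical cross-entropy and accuracy on hand-written class probabilities; the function names and the three-class example are hypothetical and do not reproduce the paper's CNN or its training setup.

```python
import math

def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    """Mean cross-entropy between predicted class probabilities and one-hot labels."""
    total = 0.0
    for p_row, t_row in zip(probs, one_hot):
        # Only the true class (t == 1) contributes; eps guards against log(0)
        total -= sum(t * math.log(max(p, eps)) for p, t in zip(p_row, t_row))
    return total / len(probs)

def accuracy(probs, one_hot):
    """Fraction of samples whose highest-probability class matches the label."""
    hits = sum(
        p_row.index(max(p_row)) == t_row.index(1)
        for p_row, t_row in zip(probs, one_hot)
    )
    return hits / len(probs)

# Hypothetical predictions for three samples over three classes
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(categorical_cross_entropy(probs, labels), accuracy(probs, labels))
```

The third sample is misclassified, so it lowers the accuracy and also dominates the loss, which is the behaviour that makes cross-entropy a useful training signal alongside the accuracy metric.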
An algorithm for video-based early detection of outdoor light gray smoke has been developed using a complex set of features. This algorithm provides real-time processing for high-resolution video. For this purpose, prelimin...
Image detail enhancement is critical to the performance of short-wave infrared (SWIR) imaging systems. Recently, the demand for real-time processing of high-definition (HD) SWIR video has grown rapidly. Nevertheless, research on field programmable gate array (FPGA) implementations of HD SWIR streaming video processing architectures remains relatively scarce. This work proposes a real-time FPGA architecture for SWIR video enhancement that combines the difference of Gaussian filter with plateau equalization. To accelerate the algorithm and reduce memory bandwidth, two efficient key architectures, namely an edge information extraction architecture and an equalization and remapping architecture, are proposed to sharpen edges and improve dynamic range. The experimental results demonstrate that the proposed architecture achieves real-time processing of 1280 x 1024 @ 60 Hz with 2.7K lookup tables, 2.5K slice registers, and about 350 kb of block RAM, corresponding to utilizations of 12.5%, 19.2%, and 12.5%, respectively, on the XC7A200T FPGA board. Moreover, the proposed architecture is fully pipelined and synchronized to the pixel clock of the output video, meaning that it can be seamlessly integrated into diverse real-time video processing systems.
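As a rough software illustration of the equalization-and-remapping idea, the sketch below implements a generic plateau histogram equalization on a flat list of pixel values. It is an assumption-based toy, not the paper's FPGA architecture: the function name, the parameters `in_levels`, `out_levels`, and `plateau`, and the example values are all hypothetical, and the difference of Gaussian stage is omitted.

```python
def plateau_equalize(pixels, in_levels, out_levels, plateau):
    """Remap integer pixels in [0, in_levels) to [0, out_levels) via a clipped histogram."""
    hist = [0] * in_levels
    for p in pixels:
        hist[p] += 1
    # Clip every bin at the plateau so a large uniform background
    # cannot consume most of the output dynamic range
    clipped = [min(h, plateau) for h in hist]
    total = sum(clipped)
    if total == 0:
        return []
    # Build the lookup table from the cumulative clipped histogram
    lut = []
    cum = 0
    for c in clipped:
        cum += c
        lut.append((cum * (out_levels - 1)) // total)
    return [lut[p] for p in pixels]

# Usage: 100 background pixels plus three faint detail pixels.
frame = [0] * 100 + [1, 2, 3]
print(plateau_equalize(frame, in_levels=4, out_levels=256, plateau=2)[-3:])
print(plateau_equalize(frame, in_levels=4, out_levels=256, plateau=1000)[-3:])
```

With a low plateau the three detail levels are spread widely across the output range, whereas with an effectively unclipped histogram they are squeezed into the top few output levels. A hardware version would typically build the histogram on one frame and apply the lookup table to the next, but that is a common design pattern rather than something the abstract states.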
ISBN: 9798350366495; 9798350366488 (print)
In this article we present a new approach to scaling edge-server-based real-time video analytics by utilizing a novel flow control mechanism that we call 'Pace Steering' (PS). In contrast to server-side scheduling, flow control enables the server to control the frame rate of the connected streams, thus extending its control to the network traffic as well. By exploiting the fact that video analytics applications have a constant frame rate for each client and a predictable inference time for frame processing, Pace Steering is able to avoid server-side queueing and balance the network load in a shared wireless access link, leading to better utilization of both the compute and the network resources. We provide a mathematical analysis to show how to synchronize video streams with delays based on server state information. We then show how the queueing time of a request provides an ideal synchronization delay, and we extend this idea to consider batching for higher throughput. We evaluate our approach with benchmarks in a physical testbed using commodity Wi-Fi. The results show that PS enables up to 80 concurrent 10 frames per second (FPS) streams to be served without latency requirement violations for 95% of the sent frames, which is twice as many streams as without PS.
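The idea that a request's queueing time can serve as a synchronization delay can be illustrated with a toy simulation. The sketch below is a simplified model under stated assumptions (one server, a fixed inference time, identical frame periods, no network delay, no jitter, no batching); the function `simulate` and its parameters are hypothetical and are not taken from the paper's analysis.

```python
def simulate(num_clients, period, service, rounds, steer=True):
    """Return the total server-side queueing time observed in each frame round.

    Assumes num_clients * service <= period, so the server can keep up.
    """
    offsets = [0.0] * num_clients  # all clients start in phase (worst case)
    server_free = 0.0
    per_round_wait = []
    for k in range(rounds):
        # Frames of round k, ordered by arrival time at the server
        arrivals = sorted((offsets[i] + k * period, i) for i in range(num_clients))
        total_wait = 0.0
        for arrival, i in arrivals:
            start = max(arrival, server_free)
            wait = start - arrival
            server_free = start + service
            if steer:
                # Flow control: the client delays its future frames by the
                # queueing time that the server observed for this frame
                offsets[i] += wait
            total_wait += wait
        per_round_wait.append(total_wait)
    return per_round_wait

# Usage: four 10 FPS streams (100 ms period) with a 20 ms inference time
print(simulate(4, 100.0, 20.0, rounds=3, steer=True))
print(simulate(4, 100.0, 20.0, rounds=3, steer=False))
```

In this model, steering removes the queueing after the first round because each client's phase shifts to the slot in which the server is free, while the unsteered clients keep colliding every period. Real deployments add network delay and jitter, which this sketch deliberately ignores.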
This work presents a smart wheelchair system whereby users may operate it in an intuitive and hands-free manner by using a Convolutional Neural Network (CNN) for hand gesture recognition in real-time. With programmed ...