Video surveillance requires simultaneous monitoring of multiple areas. Consequently, real-time automatic change detection of the monitored areas becomes very important. In the context of wide field-of-view conditions,...
We apply reinforcement learning to video compressive sensing to adapt the compression ratio. Specifically, we consider video snapshot compressive imaging (SCI), which captures high-speed video using a low-speed camera and in which multiple (B) video frames can be reconstructed from a single snapshot measurement. One research gap in previous studies is how to adapt B in the video SCI system for different scenes. In this article, we fill this gap using reinforcement learning (RL). An RL model, as well as various convolutional neural networks for reconstruction, is learned to achieve adaptive sensing in video SCI systems. Furthermore, the performance of an object detection network that operates directly on the video SCI measurements, without reconstruction, is also used to perform RL-based adaptive video compressive sensing. Our proposed adaptive SCI method can thus be implemented at low cost and in real time. Our work takes the technology one step further towards real applications of video SCI.
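As a rough illustration of what the compression ratio B means, the sketch below implements the commonly used video SCI measurement model, in which each of the B frames is multiplied element-wise by its own mask and the results are summed into one snapshot. The random binary masks, the tiny frame size, and the function names are assumptions made for illustration; the abstract does not describe the masks, the reconstruction networks, or the RL policy, and none of those are reproduced here.

```python
import random


def sci_snapshot(frames, masks):
    """Compress B frames into one snapshot: Y = sum_b (mask_b * frame_b), element-wise."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                snapshot[i][j] += mask[i][j] * frame[i][j]
    return snapshot


def random_binary_masks(num_frames, height, width, seed=0):
    """Hypothetical modulation masks: one random 0/1 pattern per frame."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


# B = 4 frames of size 2x3; frame b is filled with the constant value b + 1.
B = 4
frames = [[[float(b + 1)] * 3 for _ in range(2)] for b in range(B)]
masks = random_binary_masks(B, 2, 3)
snapshot = sci_snapshot(frames, masks)  # one 2x3 measurement encodes all 4 frames
```

A larger B packs more frames into each snapshot, which raises the compression ratio but makes reconstruction harder; choosing B per scene is the decision the abstract assigns to the RL model.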
ISBN: (print) 9798350344868; 9798350344851
We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing, and addressing use cases such as voice anonymization in these scenarios. Our design leverages the architecture and training strategy of the SoundStream neural audio codec for lightweight high-quality speech synthesis. We demonstrate the feasibility of learning soft speech units causally, as well as the effectiveness of supplying whitened fundamental frequency information to improve pitch stability without leaking the source timbre information.
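One plausible reading of "whitened fundamental frequency" is a per-utterance normalization of the f0 contour to zero mean and unit variance, so that the pitch shape survives while the speaker's absolute pitch range, which is a timbre cue, is removed. The sketch below works under that assumption; the function name, the convention that unvoiced frames are marked with 0, and the use of whole-utterance statistics are illustrative choices, not details taken from the abstract.

```python
from statistics import mean, pstdev


def whiten_f0(f0_hz, eps=1e-8):
    """Normalize voiced f0 values to zero mean and unit variance.

    Unvoiced frames (assumed to be marked with 0) are left at 0.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return [0.0] * len(f0_hz)
    mu, sigma = mean(voiced), pstdev(voiced)
    return [(f - mu) / (sigma + eps) if f > 0 else 0.0 for f in f0_hz]


# Two hypothetical speakers producing the same contour one octave apart.
low_voice = [100.0, 110.0, 0.0, 120.0]
high_voice = [200.0, 220.0, 0.0, 240.0]
low_whitened = whiten_f0(low_voice)
high_whitened = whiten_f0(high_voice)
```

After whitening, the two contours are numerically almost identical, which is the sense in which the normalized signal carries pitch movement without revealing the source speaker's register. A streaming system would need causal running statistics instead of whole-utterance ones.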
In this paper, the 3D space imaging model of machine vision is constructed. Starting from the traditional machine vision image processing algorithm flow, the image denoising process and target tracking process are opt...
The highly dynamic nature of mobile networks makes it difficult to guarantee the real-time performance and stability of UAV video transmission, which greatly affects the user's Quality of Experience (QoE). Adaptive Bitrate (A...
ISBN: (print) 9783031585012; 9783031585029
Real-time video and image processing are used in various industrial, medical, consumer electronics and embedded device applications. These applications typically demonstrate an increasing demand for computing power and system complexity. Edge detection is the most common and widely used technique in image and video processing applications. Several traditional Canny edge detection methods use fixed thresholding techniques to compare the pixel values, which sacrifices edge detection performance and increases computational complexity. Hence, the Canny edge detection algorithm is preferred for enhancing image quality with reduced complexity: it adjusts the quality of the image by manipulating the sigma and threshold parameters and detects edges accurately by eliminating noise. The reconfigurable Canny edge detection algorithm presents a procedure for detecting edges without multipliers. The new algorithm uses a low-complexity, non-uniform histogram gradient to compute thresholds and variable sigma values, using add and shift operations in place of multipliers to reduce area. The simulation is done on the ModelSim platform using VHDL code, producing bit sequences as output. Comparing the reconfigurable Canny edge detection algorithm with the traditional algorithm shows improvements of around 21% in consumed power and 80% in delay.
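The multiplier-free idea can be sketched with the gradient stage of an edge detector. The example below computes a Sobel gradient on one 3x3 window, replacing each multiplication by 2 with a left shift and approximating the gradient magnitude as |gx| + |gy| so that no multiplier or square root is needed. The Sobel operator, the magnitude approximation, and the function name are assumptions chosen for illustration; the abstract does not specify the kernel, the non-uniform histogram thresholding is not reproduced, and the original design is in VHDL, not Python.

```python
def sobel_shift_add(window):
    """Sobel gradient of a 3x3 integer window using only adds, subtracts and shifts.

    Returns (gx, gy, magnitude), where magnitude is approximated as |gx| + |gy|.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    # Sobel weights are 1, 2, 1: the factor 2 becomes a one-bit left shift.
    gx = (c + (f << 1) + i) - (a + (d << 1) + g)
    gy = (g + (h << 1) + i) - (a + (b << 1) + c)
    return gx, gy, abs(gx) + abs(gy)


# A vertical edge: dark columns on the left, bright column on the right.
vertical_edge = [
    [0, 0, 10],
    [0, 0, 10],
    [0, 0, 10],
]
gx, gy, magnitude = sobel_shift_add(vertical_edge)  # (40, 0, 40)
```

In hardware, a shift by a constant is just wiring, so this stage needs only adders, which is one way a design can trade multipliers for lower area and delay.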
ISBN: (print) 9798400701788
Monocular depth estimation algorithms aim to explore the possible links between 2D and 3D data, but challenges remain for existing methods to predict consistent depth from a casual video. Because they rely on camera poses and optical flow in time-consuming test-time training phases, these methods fail in many scenarios and cannot be used for practical applications. In this work, we present a data-driven post-processing method to overcome these challenges and achieve online processing. Based on a deep recurrent network, our method takes the adjacent original and optimized depth maps as inputs to learn temporal consistency from the dataset, and it achieves higher depth accuracy. Our approach can be applied to multiple single-frame depth estimation models and used for various real-world scenes in real time. In addition, to tackle the lack of a temporally consistent video depth training dataset of dynamic scenes, we propose an approach that generates training video sequences from a single image by inferring a motion field. To the best of our knowledge, this is the first data-driven plug-and-play method to improve the temporal consistency of depth estimation for casual videos. Extensive experiments on three datasets and three depth estimation models show that our method outperforms the state-of-the-art methods.
ISBN: (digital) 9781510661714
ISBN: (print) 9781510661707; 9781510661714
The article proposes an algorithm for the parallel analysis of visual data obtained by a machine vision system: information recorded in the human-visible spectrum and information received by a range camera. An algorithm is proposed for forming stable features from elements of the human body, head and pupils, and for tracking their increments in parallel. Trend lines in element displacement are highlighted and the high-frequency component is eliminated based on a combined criterion. The image is preliminarily processed to reduce the effect of the noise component based on a multi-criteria objective function. The test data used to evaluate effectiveness comprise a video stream with a resolution of 1024x768 (8-bit, color image, visible range), 3D data, and expert evaluation data.
This research addresses urban parking challenges by allowing users to reserve parking spaces via a mobile app. The system integrates automated barriers and AI-powered cameras for accurate license plate recognition, en...
Ships and other maritime objects are often unable to endure the harsh and dynamic sea environment. Collecting real-time data and detecting these objects using various sensors such as RADARs, Synthetic Aperture RADARs, and mounted RADARs present significant challenges due to numerous influencing factors. To address this issue, our research aims to develop an Internet of Things (IoT)-based multi-scale and multi-scene ship identification system. This system leverages a multi-scale neural network integrated with a high-response convolutional neural network (CNN)-based Kalman filter architecture. To construct this model, we selected various ship categories and initially employed a base CNN model to develop a new model with different convolutional layers. Our approach utilizes mixed methods for tracking and detecting objects, with a focus on small ships. The dataset is processed through multiple neural network layers, and we implemented the Kalman filter to estimate and predict the ships' positions. Additionally, using the YOLOv3 model, we achieved improved accuracy and reduced error rates through mathematical optimization. Our method utilizes a dataset of 5,604 samples and incorporates a hybrid approach with YOLOv3. Our model demonstrates significant improvements for both medium-sized and small ships. The proposed work provides both qualitative and quantitative advancements. Our model exceeded the best results from parallel experiments by 3.9% and 1.2% in terms of Average Precision (AP). Furthermore, YOLOv3 achieved a performance score of 97.34% across various metric parameters, while our proposed approach attained the highest scores of 97.8% and 94.87%, respectively.
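The role of the Kalman filter in estimating and predicting ship positions can be sketched with a minimal constant-velocity filter on a single image axis, for example the x-coordinate of a detected bounding-box center. The constant-velocity motion model, the noise values, and the class name are assumptions made for illustration; the abstract does not state the state model, and the CNN-based Kalman architecture and YOLOv3 detector are not reproduced here.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, position=0.0, velocity=0.0, p0=100.0, q=1e-3, r=1.0):
        self.x = [position, velocity]       # state estimate
        self.P = [[p0, 0.0], [0.0, p0]]     # state covariance
        self.q = q                          # process noise (assumed, diagonal)
        self.r = r                          # measurement noise variance (assumed)

    def predict(self, dt=1.0):
        """Propagate the state with F = [[1, dt], [0, 1]]; returns predicted position."""
        p, v = self.x
        (a, b), (c, d) = self.P
        self.x = [p + dt * v, v]
        # P <- F P F^T + Q, written out for the 2x2 case.
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the state with a measured position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        innovation = z - self.x[0]
        s = a + self.r                      # innovation variance
        k0, k1 = a / s, c / s               # Kalman gain
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return self.x


# Hypothetical detections: a ship center moving 2 pixels per frame.
kf = ConstantVelocityKalman1D()
for frame_index in range(1, 21):
    kf.predict()
    kf.update(2.0 * frame_index)
next_position = kf.predict()  # forecast for frame 21
```

The predict step is what lets a tracker bridge frames where the detector misses a small ship: the filter keeps forecasting the position from the estimated velocity until a new detection arrives to correct it.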