In recent years, the global demand for high-resolution videos and the emergence of new multimedia applications have created the need for a new video coding standard. Therefore, in July 2020, the versatile video coding (VVC) standard was released, providing up to 50% bit-rate savings for the same video quality compared to its predecessor, high-efficiency video coding (HEVC). However, these bit-rate savings come at the cost of high computational complexity, particularly for live applications and on resource-constrained embedded devices. This paper evaluates two optimized VVC software decoders, OpenVVC and the Versatile video deCoder (VVdeC), designed for low-resource platforms. These decoders exploit optimization techniques such as data-level parallelism using single instruction, multiple data (SIMD) instructions and functional-level parallelism using frame-, tile-, and slice-based parallelism. Furthermore, a comparison of decoding runtime, energy, and memory consumption between the two decoders is presented on two different resource-constrained embedded devices. The results show that both decoders achieve real-time decoding of full high-definition (FHD) resolution on the first platform using 8 cores and real-time high-definition (HD) decoding on the second platform using only 4 cores, with comparable average energy consumption: around 26 J and 15 J for the 8-core and 4-core platforms, respectively. Furthermore, OpenVVC showed better memory usage, with a lower average peak memory consumption during runtime than VVdeC.
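The functional-level parallelism described above can be illustrated with a minimal sketch: tiles are self-contained coding regions, so each one can be decoded independently and the frame assembled afterwards. The `decode_tile` body below is a stand-in (it just sums coefficients), not the actual OpenVVC or VVdeC decoding logic.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_tile(tile):
    # Placeholder for entropy decoding + reconstruction of one tile;
    # here we simply "reconstruct" by summing the tile's coefficients.
    return sum(tile)

def decode_frame_parallel(tiles, workers=4):
    # Tiles have no cross-tile dependencies, so each can run on its own
    # core; the frame is assembled once every tile has finished.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_tile, tiles))

frame_tiles = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(decode_frame_parallel(frame_tiles))  # [3, 7, 11, 15]
```

The same pattern extends to frame- and slice-level parallelism, with the work unit and dependency tracking changing accordingly.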
Most of the conventional power system remote video monitoring systems are designed based on the SIP principle. In the actual monitoring operation process, there are problems such as poor real-time monitoring and high ...
Dance style recognition through video analysis during university training can significantly benefit both instructors and novice dancers. Employing video analysis in training offers substantial advantages, including the potential to train future dancers using innovative technologies. Over time, intricate dance gestures can be honed, reducing the burden on instructors who would otherwise need to provide repetitive demonstrations. Recognizing dancers' movements, evaluating and adjusting their gestures, and extracting cognitive functions for efficient evaluation and classification are pivotal aspects of our model. Deep learning currently stands as one of the most effective approaches for achieving these objectives, particularly with short video clips. However, limited research has focused on the automated analysis of dance videos for training purposes and assisting instructors. In addition, assessing the quality and accuracy of performance video recordings presents a complex challenge, especially when judges cannot fully focus on the on-stage performance. This paper proposes an alternative to manual evaluation through a video-based approach for dance assessment. By utilizing short video clips, we conduct dance analysis employing techniques such as fine-grained dance style classification in video frames, convolutional neural networks (CNNs) with channel attention mechanisms (CAMs), and autoencoders (AEs). These methods enable accurate evaluation and data gathering, leading to precise conclusions. Furthermore, utilizing cloud space for real-time processing of video frames is essential for timely analysis of dance styles, enhancing the efficiency of information processing. Experimental results demonstrate the effectiveness of our evaluation method in terms of accuracy and F1-score, with accuracy exceeding 97.24% and the F1-score reaching 97.30%. These findings corroborate the efficacy and precision of our approach in dance evaluation analysis.
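A channel attention mechanism of the kind the abstract mentions can be sketched in squeeze-and-excitation style: pool each channel to a scalar, pass the channel vector through a small bottleneck, and rescale channels by the resulting sigmoid gates. The random weights below stand in for learned parameters; this is an illustrative sketch, not the paper's architecture.

```python
import numpy as np

def channel_attention(feature_map, reduction=2):
    """Squeeze-and-excitation style channel attention on a (C, H, W) map."""
    c = feature_map.shape[0]
    # Squeeze: global average pooling per channel.
    z = feature_map.mean(axis=(1, 2))                        # (C,)
    # Excitation: tiny bottleneck MLP (random weights here; learned in practice).
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c))
    w2 = rng.standard_normal((c, c // reduction))
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0))))  # sigmoid gates in (0, 1)
    # Rescale each channel by its gate.
    return feature_map * s[:, None, None]

x = np.ones((4, 8, 8))
out = channel_attention(x)
print(out.shape)  # (4, 8, 8)
```

The gating lets the network emphasize channels that respond to discriminative motion cues while suppressing the rest.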
Video action recognition, as one of the fundamental tasks in video understanding, relies crucially on accurate temporal modeling. However, accurately modeling the temporal information of videos remains a challenging task. To address this problem, we design two new modules: the Spatial Motion Extraction (SME) module and the Spatio-temporal Motion Excitation (STME) module. The SME module features two branches for extracting motion and spatial features. The motion branch refines pixel differences between neighboring frames through a channel attention module, enhancing detailed motion features. These features are fused with spatial information to yield fine-grained local spatio-temporal features. The STME module, comprising the multi-motion excitation (MME), temporal excitation (TE), and spatio-temporal excitation (STE) sub-modules, efficiently captures long-range motion, temporal, and global spatio-temporal features. The MME introduces a bi-directional, multi-scale structure for effective long-range motion extraction, while the TE module employs a hierarchical pyramid with residual connectivity for fine-grained long-range temporal extraction. The STE module utilizes 3D convolutional layers for global spatio-temporal feature extraction. The seamless integration of these sub-modules within a standard ResNet network forms the Spatio-temporal Motion Excitation Network. Extensive evaluations on Something-Something V1 and V2 and HMDB51 datasets against state-of-the-art methods demonstrate the effectiveness of our approach in achieving accurate recognition of both simple and complex video actions.
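The raw signal the SME motion branch starts from is the pixel-wise difference between neighboring frames. A minimal sketch (omitting the channel-attention refinement the paper applies afterwards):

```python
import numpy as np

def motion_features(frames):
    """Pixel-wise differences between neighboring frames.

    frames: (T, H, W) array; returns a (T-1, H, W) array of inter-frame
    differences, the raw motion cue before any learned refinement.
    """
    return frames[1:] - frames[:-1]

# A clip whose brightness ramps by 1 per frame: every difference is 1.
clip = np.stack([np.full((2, 2), t, dtype=float) for t in range(4)])
diffs = motion_features(clip)
print(diffs.shape, diffs[0].tolist())  # (3, 2, 2) [[1.0, 1.0], [1.0, 1.0]]
```

Static regions cancel to zero in these differences, so subsequent layers can focus capacity on moving content.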
As violent criminals, such as child sex offenders, tend to have high recidivism rates in modern society, there is a need to prevent such offenders from approaching socially disadvantaged and crime-prone areas, such as schools or childcare centers. Accordingly, national governments and related institutions have installed surveillance cameras and provided additional personnel to manage and monitor them via video surveillance equipment. However, naked-eye monitoring by guards and manual image processing cannot properly evaluate the video captured by surveillance cameras. To address the various problems of conventional systems that simply store and retrieve image data, a system is needed that can actively classify captured images in real time, in addition to assisting surveillance personnel. Therefore, this paper proposes a video surveillance system based on a composable deep face recognition method. The proposed system detects the faces of criminals in real time from videos captured by a surveillance camera and notifies relevant institutions of the appearance of criminals. For real-time face detection, a down-sampled image forked from the original is used to localize unspecified faces. To improve accuracy and confidence in the recognition task, a scoring method based on face tracking is proposed. The final score combines the recognition confidence and the standard score to determine the embedding distance from the criminal face embedding data. The blind spots of surveillance personnel can be effectively addressed through early detection of criminals approaching crime-prone areas. The contributions of the paper are as follows. The proposed system can process images from surveillance cameras in real time by using down-sampling. It can effectively identify the identity of criminals by using a face tracking ID unit and minimizes prediction reversal by solving the congested embedding problem in the feature space that may occur when performing identification matching on a la
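The scoring idea, combining recognition confidence with a standard score of the embedding distance, can be sketched as follows. The function name, box of gallery embeddings, and the exact way the two terms are combined are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def identity_score(query, gallery, confidences):
    """Combine recognition confidence with a standard score of embedding distance.

    query: embedding of the tracked face; gallery: known criminal embeddings.
    The standard (z-) score normalizes each gallery distance against the
    distribution of all distances, so the final score is scale-free.
    """
    dists = np.linalg.norm(gallery - query, axis=1)
    z = (dists - dists.mean()) / dists.std()
    # Lower distance -> more negative z -> higher final score.
    return confidences - z

gallery = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
query = np.array([0.1, 0.0])
scores = identity_score(query, gallery, confidences=np.array([0.9, 0.8, 0.7]))
print(int(scores.argmax()))  # 0
```

Accumulating such scores per face-tracking ID over several frames is what suppresses the single-frame prediction reversals the abstract mentions.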
ISBN:
(Print) 9798350367164; 9798350367157
The detection of shot boundaries (hardcuts and short dissolves), sampling structure (progressive / interlaced / pulldown), and dynamic keyframes in a video are fundamental video analysis tasks that must be performed before any further high-level analysis. We present a novel algorithm that performs all of these analysis tasks in a unified way, utilizing a combination of inter-frame and intra-frame measures derived from the motion field and normalized cross correlation. The algorithm runs four times faster than real time due to sparse and selective calculation of these measures. An initial evaluation furthermore shows that the proposed algorithm is extremely robust even for challenging content showing large camera or object motion, flashlights, flicker, or low contrast / noise.
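One of the inter-frame measures named above, normalized cross correlation, already suffices for a toy hardcut detector: frames within a shot correlate strongly, while frames across a cut do not. The threshold value is an illustrative assumption.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation between two frames (flattened)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hardcuts(frames, threshold=0.5):
    """Flag a shot boundary wherever neighboring frames correlate poorly."""
    return [t + 1 for t in range(len(frames) - 1)
            if ncc(frames[t], frames[t + 1]) < threshold]

rng = np.random.default_rng(1)
base = rng.random((8, 8))
# Two nearly identical frames, then an inverted (anti-correlated) frame.
clip = [base, base + 0.01 * rng.random((8, 8)), 1.0 - base]
print(hardcuts(clip))  # [2]
```

The paper's sparse, selective evaluation of such measures (rather than dense per-pixel computation on every frame) is what yields the four-times-faster-than-real-time speed.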
Authors:
Wang, Xingwang; Shen, Muzi; Yang, Kun
Jilin Univ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
Jilin Univ, Software Coll, Changchun 130012, Peoples R China
Nanjing Univ, Sch Intelligent Software & Engn, Suzhou 210093, Jiangsu, Peoples R China
Univ Essex, Sch Comp Sci & Elect Engn, Colchester CO4 3SQ, England
Performing video analytics tasks based on deep neural networks (DNNs) on resource-constrained mobile devices is extremely challenging because of the huge volume of video data and the computationally intensive nature of DNNs. One promising solution is to offload tasks to edge servers for execution. However, due to explosive growth in the number of end devices, more and more mobile devices are connected to the edge servers. This makes it difficult for the edge server to meet the specific service level objective (SLO) of on-edge video analytics when facing concurrent computing requests, especially in real-time scenarios. To address this issue, this article presents EHCI, an on-edge high-throughput collaborative inference framework for real-time video analytics. On the mobile device, EHCI crops the key regions from the current video frame based on the local detection cache and offloads these regions to the edge server, which can significantly reduce bandwidth consumption and computation costs. Besides, considering concurrent DNN inference requests from multiple mobile devices, EHCI uses a key region patching method to achieve high-throughput DNN inference on the edge server, along with a scheduling algorithm to meet the SLO for each mobile device. Testing validates that EHCI outperforms the state-of-the-art technology by 159% in achieved throughput and reduces the average end-to-end delay by 36%, while the sacrifice in application accuracy remains within a reasonable range.
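The key-region cropping step can be sketched as follows: boxes from the cached detection result are padded (so a slightly moved object is still covered) and only those patches are offloaded. The function name, box format, and padding value are illustrative assumptions, not EHCI's actual interface.

```python
def crop_key_regions(frame, cached_detections, pad=4):
    """Crop patches around cached detections so only they are offloaded.

    frame: 2-D list of pixels; cached_detections: (x0, y0, x1, y1) boxes
    from the previous inference result.
    """
    h, w = len(frame), len(frame[0])
    crops = []
    for x0, y0, x1, y1 in cached_detections:
        # Pad the cached box to tolerate small object motion between frames.
        y0, y1 = max(0, y0 - pad), min(h, y1 + pad)
        x0, x1 = max(0, x0 - pad), min(w, x1 + pad)
        crops.append([row[x0:x1] for row in frame[y0:y1]])
    return crops

frame = [[0] * 100 for _ in range(100)]
crops = crop_key_regions(frame, [(10, 20, 30, 40)])
print(len(crops[0]), len(crops[0][0]))  # 28 28
```

Here a 28x28 patch replaces a 100x100 frame, illustrating where the bandwidth and edge-compute savings come from.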
ISBN:
(Print) 9798350349405; 9798350349399
Learned wavelet image and video coding approaches provide an explainable framework with a latent space corresponding to a wavelet decomposition. The wavelet image coder iWave++ achieves state-of-the-art performance and has been employed for various compression tasks, including lossy as well as lossless image, video, and medical data compression. However, the approaches suffer from slow decoding speed due to the autoregressive context model used in iWave++. In this paper, we show how a parallelized context model can be integrated into the iWave++ framework. Our experimental results demonstrate a speedup factor of over 350 and 240 for image and video compression, respectively. At the same time, the rate-distortion performance in terms of Bjontegaard delta bitrate is slightly worse by 1.5% for image coding and 1% for video coding. In addition, we analyze the learned wavelet decomposition by visualizing its subband impulse responses.
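The explainable latent space referred to above corresponds to a wavelet decomposition. A minimal one-level 2-D Haar decomposition (the classical, non-learned analogue of what iWave++ learns) shows the subband structure:

```python
import numpy as np

def haar_2d(img):
    """One level of a 2-D Haar decomposition into LL, LH, HL, HH subbands."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2   # horizontal average
    d = (img[:, 0::2] - img[:, 1::2]) / 2   # horizontal detail
    ll = (a[0::2] + a[1::2]) / 2            # low-low: coarse approximation
    lh = (a[0::2] - a[1::2]) / 2            # vertical detail
    hl = (d[0::2] + d[1::2]) / 2            # horizontal detail
    hh = (d[0::2] - d[1::2]) / 2            # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_2d(img)
print(ll.shape)  # (2, 2)
```

The autoregressive context model that slows decoding operates over such subband coefficients; parallelizing it, as the paper does, removes the coefficient-by-coefficient serial dependency.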
This effort aims to create a hardware resource-efficient real-time video processing system employing PolarFire FPGA technology. This paper presents the interface between two IMX 334 camera modules and a PolarFire FP...
ISBN:
(Print) 9798350355376; 9798350355369
Real-time teleoperation of robotic systems over the Internet is a desirable technology in many ways. Latency of the video feedback has been hampering its development. This paper takes the application of remote driving to introduce an unconventional codec that provides very low latency for Internet-based video streaming. The proposed method preserves just enough information in the video for the essential perception and decision-making of a remote driver. Thanks to a unique integration of several image processing and data streaming techniques, the proposed codec can achieve a glass-to-glass latency of around 90 ms. A series of tests conducted over the real consumer Internet analyzes the latency and verifies the effectiveness of remote driving with the proposed codec.
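Glass-to-glass latency is the sum of every stage from camera sensor to remote display. The per-stage numbers below are illustrative assumptions chosen to land near the ~90 ms figure, not measurements from the paper:

```python
def glass_to_glass_ms(stages):
    """Sum per-stage latencies (ms) along the capture-to-display pipeline."""
    return sum(stages.values())

# Hypothetical budget for a 60 fps capture/display pipeline.
budget = {
    "capture": 17,   # ~one frame interval at 60 fps
    "encode": 10,
    "network": 35,
    "decode": 8,
    "display": 17,   # ~one refresh interval at 60 Hz
}
print(glass_to_glass_ms(budget))  # 87
```

Such a budget makes clear why a codec with near-zero encode/decode delay matters: the network and the fixed frame intervals already consume most of the allowance.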