ISBN: (print) 9781665475921
Providing constant-quality streams can simultaneously guarantee user experience and avoid wasting bit-rate. In this paper, we propose a novel deep-learning-based two-pass encoder parameter prediction framework to decide the rate factor (RF), with which the encoder can output streams of constant quality. In the first pass, an RF is predicted from the spatial-temporal and pre-coding features of a video segment. The segment is then encoded with the predicted RF and its VMAF is measured. If the first-pass VMAF does not meet the target quality, a second-pass prediction is performed using another model, in which the results of the first pass are added to the features. Experiments show that the proposed method requires only 1.55 times the encoding complexity on average, while the accuracy, defined as the fraction of compressed videos whose actual VMAF is within ±1 of the target VMAF, reaches 98.88%. Compared with average rate mode, this method can both improve visual quality and save approximately 10% bit-rate, as shown in demos(1).
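The two-pass control flow described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `two_pass_encode`, the callables `predict_first`, `predict_second`, and `encode_and_measure`, the feature keys `first_rf` and `first_vmaf`, and the use of the ±1 VMAF window as the "meets target quality" test are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]


def two_pass_encode(
    features: Features,
    target_vmaf: float,
    predict_first: Callable[[Features, float], float],
    predict_second: Callable[[Features, float], float],
    encode_and_measure: Callable[[float], float],
    tolerance: float = 1.0,  # assumed acceptance window around the target VMAF
) -> Tuple[float, float, int]:
    """Return (rate_factor, measured_vmaf, passes_used)."""
    # First pass: predict an RF from the segment features, encode, measure VMAF.
    rf = predict_first(features, target_vmaf)
    vmaf = encode_and_measure(rf)
    if abs(vmaf - target_vmaf) <= tolerance:
        return rf, vmaf, 1

    # Second pass: a separate model also sees the first-pass RF and VMAF.
    second_features = dict(features, first_rf=rf, first_vmaf=vmaf)
    rf2 = predict_second(second_features, target_vmaf)
    vmaf2 = encode_and_measure(rf2)
    return rf2, vmaf2, 2
```

Because the second pass runs only when the first one misses the target window, the average cost stays well below two full encodes, which is consistent with the reported 1.55 times encoding complexity.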
ISBN: (print) 9798331529543; 9798331529550
As video dimensions - including resolution, frame rate, and bit depth - increase, a larger bitrate is required to maintain a higher Quality of Experience (QoE). While videos are often optimized for resolution and frame rate to improve compression and energy efficiency, the impact of color space is often overlooked. Larger color spaces are essential for avoiding color banding and delivering High Dynamic Range (HDR) content with richer, more accurate colors, although this comes at the cost of higher processing energy. This paper investigates the effects of bit depth and color subsampling on video compression efficiency and energy consumption. By analyzing different bit depths and subsampling schemes, we aim to determine optimized settings that balance compression efficiency with energy consumption, ultimately contributing to more sustainable and high-quality video delivery. We evaluate both encoding and decoding energy consumption and assess the quality of videos using various metrics including PSNR, VMAF, ColorvideoVDP, and CAMBI. Our findings offer valuable insights for video codec developers and content providers aiming to improve the performance and environmental footprint of their video streaming services.
ISBN: (print) 9781728180687
Emotion recognition is a crucial problem in affective computing. Most previous works use facial expressions from visible-spectrum data to solve the emotion recognition task. Thermal videos provide temperature measurements of the human body over time, which can be used to recognize affective states by learning their temporal patterns. In this paper, we conduct comparative experiments to study the effectiveness of existing deep neural networks when applied to emotion recognition from thermal video. We analyze the effect of various approaches to frame sampling, temporal aggregation between frames, and different convolutional neural network architectures. To the best of our knowledge, this is the first work to study emotion recognition from thermal video based on deep neural networks. Our work can serve as a preliminary study for designing new methods for emotion recognition in the thermal domain.
ISBN: (print) 0780367251
A key constraint in mobile communications is the reliance on a battery with a limited energy supply. Efficiently utilizing the available energy is therefore an important design consideration. In this paper we consider a situation where a video sequence is to be compressed and transmitted over a wireless channel. The goal is to limit the amount of distortion in the received video sequence while using the minimum required transmission energy. To accomplish this goal we consider error resilience and concealment techniques, at the source coding level, as well as the dynamic allocation of physical layer communication resources. We jointly consider these approaches in a novel framework. In this setting we formulate an optimization problem that corresponds to minimizing the energy required to transmit a video frame with an acceptable level of distortion. We present methods for solving this problem and other extensions.
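One generic way to write the optimization problem described in this abstract is given below. The notation is assumed for illustration and is not taken from the paper: \mathbf{s} stands for the source-coding choices (error resilience and concealment), \mathbf{p} for the physical-layer resource allocation, E_{\text{tot}} for the transmission energy of one frame, D for the distortion of the received frame, and D_{\max} for the acceptable distortion level. Taking the expectation of D over channel losses is also an assumption.

```latex
\min_{\mathbf{s},\,\mathbf{p}} \; E_{\text{tot}}(\mathbf{s}, \mathbf{p})
\quad \text{subject to} \quad
\mathbb{E}\!\left[ D(\mathbf{s}, \mathbf{p}) \right] \le D_{\max}
```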
ISBN: (print) 9798350343557
Automod is an artificial intelligence based content moderation system that detects similarities and inconsistencies in user-generated visual content (images and videos). With the similarity module installed, labor savings of 15% were achieved, and the nonconformity detection models reached F1 scores of 90% and higher. The system was load tested and can evaluate more than 100,000 images daily. Similarly, keyframes extracted from at least 65,000 videos that can be evaluated daily were passed through the nonconformity models under load testing.
ISBN: (print) 9781728180687
This paper presents an adaptive resolution change (ARC) method adopted in Versatile Video Coding (VVC) to adapt video bit-stream transmission to dynamic network environments. This approach enables resolution changes within a video sequence at any frame without the insertion of an instantaneous decoder refresh (IDR) or intra random access picture (IRAP). The underlying techniques include reference picture resampling and handling of interactions between the existing coding tools and the changes in resolution. In addition to the techniques adopted in VVC, this paper proposes two techniques, for temporal motion vector prediction and for the deblocking filter, to further improve both subjective and objective quality. The experimental results show that the combined ARC method avoids the bit-cost burden of inserting an intra frame during resolution changes. At the same time, BD-rate reductions of 18%, 21%, and 21% are achieved for the Y, Cb, and Cr components, respectively.
ISBN: (print) 9798331529543; 9798331529550
Reducing the huge computational complexity of intra mode decision is the key to real-time Versatile Video Coding (VVC). This paper proposes a fast intra mode decision scheme that uses lightweight machine learning (ML) models to classify intra modes into fifteen clusters. The cluster is further refined using one of three proposed strategies to select the best mode. Our experimental results with the fastest configuration of the practical uvg266 encoder show that the proposed methods yield a competitive rate-distortion-complexity trade-off over a conventional rough mode decision (RMD). To the best of our knowledge, this is the first work to successfully reduce the complexity of RMD in a practical VVC encoder with the use of ML techniques.
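The classify-then-refine idea in this abstract can be illustrated with a short sketch. The abstract does not describe the three refinement strategies, so the code below substitutes the simplest possible one, an exhaustive cost search inside the predicted cluster. The names `pick_intra_mode`, `classify_cluster`, `clusters`, and `cost` are hypothetical and do not correspond to the paper or to the uvg266 code base.

```python
from typing import Callable, Dict, List, Sequence


def pick_intra_mode(
    block_features: Sequence[float],
    classify_cluster: Callable[[Sequence[float]], int],
    clusters: Dict[int, List[int]],
    cost: Callable[[int], float],
) -> int:
    """Pick an intra mode by searching only the cluster chosen by the classifier."""
    # Step 1: a lightweight model maps the block features to a cluster index.
    cluster_id = classify_cluster(block_features)
    # Step 2 (assumed refinement): evaluate the cost of each mode in that
    # cluster and keep the cheapest, instead of testing every intra mode.
    candidates = clusters[cluster_id]
    return min(candidates, key=cost)
```

The saving comes from evaluating the cost function on one cluster's modes only, so the quality of the result depends on how often the classifier picks the cluster that contains the true best mode.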
ISBN: (print) 9781665475921
In this paper, we propose a no-reference stereoscopic video quality assessment (NR-SVQA) method based on the human visual system (HVS). Firstly, we build a frequency transform module (FTM), which maps the spatial domain to the frequency domain by the discrete cosine transform (DCT) and selects important frequency components through a channel attention mechanism. Secondly, we use dynamic convolution to regionally process the same input. Thirdly, we use convolutional long short-term memory (Conv-LSTM) to extract spatio-temporal information rather than just temporal information. Finally, in order to better simulate the visual characteristics of human eyes, we build an optic chiasm module. The experimental results show that our method outperforms other methods.
ISBN: (print) 9781728185514
High Efficiency Video Coding - Screen Content Coding (HEVC-SCC) follows the traditional angular intra prediction technique in HEVC. However, the Planar mode and the DC mode are somewhat repetitive for screen content video, which has features such as no sensor noise. Hence, this paper proposes a new intra prediction mode called linear regression (LR) mode, which combines the Planar mode and the DC mode into one mode. The LR mode improves the prediction accuracy of intra prediction for fading regions in screen content video. In addition, by optimizing the most probable mode (MPM) construction, the hit rate of the best mode in the MPM list is improved. The experimental results show that the proposed method achieves a 0.57% BD-BR reduction compared with HM 16.20+SCM8.8, while the coding time remains largely the same.
ISBN: (print) 9781728185514
Stereoscopic video quality assessment (SVQA) is of great importance in promoting the development of the stereoscopic video industry. In this paper, we propose a three-branch multi-level binocular fusion convolutional neural network (MBFNet) that is highly consistent with human visual perception. Our network includes three main innovative structures. Firstly, we construct a multi-scale cross-dimension attention module (MSCAM) on the left and right branches to capture more critical semantic information. Then, we design a multi-level binocular fusion unit (MBFU) to adaptively fuse the features from the left and right branches. In addition, a disparity compensation branch (DCB) containing an enhancement unit (EU) is added to provide disparity features. The experimental results show that the proposed method outperforms existing SVQA methods and achieves state-of-the-art performance.