One of the most important new tools of the Versatile Video Coding (VVC) standard is Affine Motion Estimation (AME). The AME contribution to coding efficiency comes with a high computational cost, especially fo...
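As background for the AME entry above: VVC's four-parameter affine model derives a motion vector at each sample position (x, y) of a block of width W from two control-point motion vectors, mv_0 (top-left) and mv_1 (top-right). The form below is the standard one from the VVC literature, added here for reference; it is not taken from the truncated abstract.

```latex
% Four-parameter affine motion model used by VVC's AME
% (W: block width; mv_0, mv_1: control-point motion vectors).
\begin{aligned}
mv_x(x,y) &= \frac{mv_{1x}-mv_{0x}}{W}\,x \;-\; \frac{mv_{1y}-mv_{0y}}{W}\,y \;+\; mv_{0x},\\
mv_y(x,y) &= \frac{mv_{1y}-mv_{0y}}{W}\,x \;+\; \frac{mv_{1x}-mv_{0x}}{W}\,y \;+\; mv_{0y}.
\end{aligned}
```

Evaluating this field per 4×4 subblock for every control-point candidate during the rate-distortion search is the main source of AME's computational cost.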
ISBN: 9798331522124 (digital); 9798331522131 (print)
This paper presents an evaluation of the Coarse-to-Fine Spatio-Temporal Information Fusion (CF-STIF) network for enhancing the quality of compressed videos across multiple codecs, including HEVC, VVC, VP9, and AV1. The CF-STIF network leverages spatio-temporal fusion and deep learning techniques to reduce compression artifacts and improve video quality. The evaluation extends existing methods by employing multiple quality metrics, namely PSNR, SSIM, and LPIPS. The CF-STIF network was integrated with the Spatio-Temporal Deformable Fusion (STDF) training scheme to train and run the model. Results demonstrate that CF-STIF achieves the highest quality improvements for HEVC-encoded videos, with an average PSNR increase of 0.813 dB and superior visual quality as measured by SSIM. However, performance drops significantly for the other codecs, particularly AV1, highlighting the need for future adaptations to optimize CF-STIF for diverse compression standards.
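A minimal sketch of the three-metric evaluation described above, assuming frames arrive as uint8 RGB NumPy arrays of identical shape. The helper name `evaluate_frame` and the use of the `lpips` package are illustrative choices, not the paper's actual tooling.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_model = lpips.LPIPS(net="alex")  # perceptual-metric network


def to_lpips_tensor(frame: np.ndarray) -> torch.Tensor:
    # lpips expects NCHW float tensors scaled to [-1, 1]
    t = torch.from_numpy(frame).permute(2, 0, 1).float() / 127.5 - 1.0
    return t.unsqueeze(0)


def evaluate_frame(reference: np.ndarray, enhanced: np.ndarray) -> dict:
    return {
        "psnr": peak_signal_noise_ratio(reference, enhanced, data_range=255),
        "ssim": structural_similarity(reference, enhanced,
                                      channel_axis=2, data_range=255),
        "lpips": float(lpips_model(to_lpips_tensor(reference),
                                   to_lpips_tensor(enhanced))),
    }
```

Higher PSNR/SSIM and lower LPIPS indicate better enhancement; averaging these per-frame scores over a sequence gives the numbers reported in the abstract.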
ISBN: 9798331522124 (digital); 9798331522131 (print)
The growing demand for high-definition online videos emphasizes the need for efficient video codecs like H.266/VVC, which offer significant compression potential. However, its implementation presents challenges, particularly in terms of computational cost, as is the case with the Multiple Transform Selection (MTS) tool. This study analyzes the performance of the MTS modes, showing that explicit MTS improves coding efficiency but increases encoding time, while implicit MTS offers modest efficiency gains at a lower computational cost. A machine learning-based approach using decision trees is proposed to accelerate the encoder decisions for both intra- and inter-predicted blocks in explicit MTS, reducing encoding time by an average of 7.98% with only a 0.89% increase in BD-rate. These results highlight the potential for optimizing explicit MTS in both intra and inter transforms.
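A minimal sketch of a decision-tree gate for the explicit-MTS decision, in the spirit of the approach above. The feature set, the synthetic training data, and the `should_try_explicit_mts` helper are illustrative assumptions; the abstract does not disclose the paper's actual features or datasets.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
# Hypothetical per-block features: width, height, QP, residual energy,
# and the RD cost of the default DCT-II transform.
X_train = rng.random((5000, 5))
# Placeholder labels: 1 = an explicit MTS candidate won, 0 = DCT-II won.
y_train = (X_train[:, 3] > 0.6).astype(int)

clf = DecisionTreeClassifier(max_depth=5)  # shallow tree: cheap at encode time
clf.fit(X_train, y_train)


def should_try_explicit_mts(block_features: np.ndarray) -> bool:
    # Encoder-side gate: test the extra MTS transforms only when the tree
    # predicts they are likely to beat the default DCT-II.
    return bool(clf.predict(block_features.reshape(1, -1))[0])


print(should_try_explicit_mts(rng.random(5)))
```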
ISBN: 9798331522124 (digital); 9798331522131 (print)
AV1 is a codec developed by major technology companies for current and future commercial video applications. It introduces and improves several tools from its predecessor VP9, designed for various video scenarios. One of the improved tools is Fractional Motion Estimation (FME), which generates sub-pixel predictors. AV1 employs four sets of interpolation filters, requiring significant computational effort during the Interpolation Filter Search (IFS) to identify the best filter to use. This work proposes FIFS, a machine learning method developed to reduce the processing time of IFS while keeping the impact on coding efficiency minimal. The method achieves a reduction of over 52% in IFS time, with only a slight BD-BR increase of 0.14%. To the best of the authors' knowledge, this is the first work in the literature to propose a machine learning-based approach for the AV1 IFS.
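One plausible formulation of an IFS shortcut such as FIFS: treat the choice among AV1's interpolation filter sets as a multi-class prediction problem and evaluate only the predicted set instead of searching all of them. The features, synthetic data, and `pick_filter_set` helper are assumptions; the abstract does not specify FIFS's model or inputs.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
# Hypothetical pre-IFS features: block area, MV fractional phase (x, y),
# SAD of the integer-pel match, and the neighbors' filter choices.
X_train = rng.random((5000, 5))
y_train = rng.integers(0, 4, 5000)  # placeholder: winning filter set (0..3)

model = DecisionTreeClassifier(max_depth=8).fit(X_train, y_train)


def pick_filter_set(features: np.ndarray) -> int:
    # Skip the exhaustive search: interpolate with one predicted set only.
    return int(model.predict(features.reshape(1, -1))[0])
```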
ISBN: 9798331522124 (digital); 9798331522131 (print)
As the demand for video transmission surges with remote work, education, and streaming services, the need for continuous advancements in video encoding technologies becomes increasingly evident. Adapting to the evolving requirements of efficient video delivery and consumption necessitates ongoing development and enhancement of video coding standards, with Versatile Video Coding (VVC) emerging as a notable example. This paper provides an overview of key algorithms within the inter-frame prediction of VVC, focusing mainly on the Test Zone Search (TZS) and Affine Motion Estimation (AME), two of the most computationally intensive tools in VVC. Furthermore, this paper introduces a fast TZS and AME approach using machine learning, specifically decision trees. The proposed approach achieved an average reduction of over 20% in total VVC encoding time while maintaining less than a 1% impact on BD-BR coding efficiency.
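Decision trees like the ones described above are attractive for encoder integration because they can be flattened into plain if/else rules and transcribed into the encoder's C++ sources (e.g., the VTM reference software), costing only a few comparisons per block. `export_text` is scikit-learn's real API; the feature names and synthetic labels here are placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X_train = rng.random((1000, 5))              # placeholder feature vectors
y_train = (X_train[:, 3] < 0.5).astype(int)  # placeholder "skip AME" labels

features = ["block_width", "block_height", "qp",
            "tzs_best_cost", "mv_magnitude"]
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# Human-readable rule dump, ready to transcribe into the encoder.
print(export_text(clf, feature_names=features))
```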
The popularization of mobile phones and other portable multimedia devices paved the way for the increase in video consumption worldwide. However, transmitting uncompressed video is impractical due to the high bandwidth required. To achieve significant compression rates, video codecs usually employ methods that degrade the visual quality perceived by the end user to a non-negligible degree. Different deep learning-based architectures have recently been proposed for Video Quality Enhancement (VQE). Still, most of them are trained and validated on videos generated by a single codec under fixed configurations. With the increasing number of video coding formats and standards on the market, VQE methods that apply to different contexts are desired. This paper proposes a new VQE model based on the Spatio-Temporal Deformable Fusion (STDF) architecture, providing quality gains for videos compressed according to different formats and standards, such as HEVC, VVC, VP9, and AV1. The results demonstrate that by considering different video coding standards and formats when building the STDF model, a significant increase in VQE is achieved, with an average PSNR increment of up to 0.382 dB.
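A minimal sketch of STDF-style alignment and fusion, assuming PyTorch and torchvision. The channel counts, the 3-frame window, and the module names are illustrative simplifications of the published STDF design, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class TinySTDF(nn.Module):
    def __init__(self, frames: int = 3, feat: int = 32, k: int = 3):
        super().__init__()
        # Offset prediction: looks at the whole temporal window at once.
        self.offset_net = nn.Sequential(
            nn.Conv2d(frames, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 2 * k * k * frames, 3, padding=1),
        )
        # Deformable convolution aligns and fuses the window (one offset
        # group per frame, as in STDF).
        self.fusion = DeformConv2d(frames, feat, k, padding=k // 2)
        # Quality-enhancement tail predicts a residual for the center frame.
        self.qe = nn.Sequential(
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, 3, padding=1),
        )

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (N, frames, H, W) stack of luma frames around the target.
        offsets = self.offset_net(window)
        fused = self.fusion(window, offsets)
        mid = window.shape[1] // 2
        center = window[:, mid : mid + 1]
        return center + self.qe(fused)  # enhanced center frame


# Usage: enhanced = TinySTDF()(torch.rand(1, 3, 64, 64))
```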
Processing and storing the 4D structure of light fields can be challenging and expensive due to the high dimensionality of the data and its unique characteristics. Plenty of works employ convolutional neural networks (CNNs) for light field prediction and encoding. Nonetheless, to the best of our knowledge, the literature lacks an efficiency evaluation of different CNN architectures, as well as of 4D neural networks, for these purposes. Therefore, this paper presents an experimental study that assesses the performance of pipeline and U-Net convolutional neural networks for light field block prediction in both the spatial and angular dimensions. Additionally, we compare these architectures with a novel 4D network that aims to exploit the light field data structure. The results of the study show that the U-Net and 4D networks outperform classical CNN architectures in terms of prediction accuracy and residue generation. Furthermore, prediction along the spatial dimension provides more valuable information for the networks to learn, improving their prediction by 5 dB.
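A minimal U-Net sketch for light field block prediction, assuming PyTorch. The depth, channel widths, and single-block I/O shape are illustrative; the abstract does not give the exact architectures compared in the study.

```python
import torch
import torch.nn as nn


def conv_block(cin: int, cout: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )


class TinyUNet(nn.Module):
    def __init__(self, cin: int = 1, feat: int = 16):
        super().__init__()
        self.enc1 = conv_block(cin, feat)
        self.enc2 = conv_block(feat, feat * 2)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(feat * 2, feat, 2, stride=2)
        self.dec1 = conv_block(feat * 2, feat)  # skip connection doubles channels
        self.out = nn.Conv2d(feat, cin, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)                 # fine detail from the input context
        e2 = self.enc2(self.pool(e1))     # coarser structure at half resolution
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.out(d1)               # predicted block


# Usage: pred = TinyUNet()(torch.rand(1, 1, 32, 32))
```

The skip connection is what distinguishes the U-Net from the "pipeline" CNNs mentioned above: it lets the decoder reuse fine detail that pooling would otherwise discard, which is consistent with the accuracy gap the study reports.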
ISBN: 9798331522124 (digital); 9798331522131 (print)
Lossy video compression introduces visual artifacts that degrade video quality, and deep neural networks (DNNs) are effective at enhancement. However, conventional DNN-based methods often focus on a single video compression standard, limiting their deployment across use cases. To overcome this issue, this study introduces a multi-domain video quality enhancement architecture based on the Spatio-Temporal Deformable Fusion (STDF) technique. This method enables the model to enhance videos compressed with multiple codecs, maintaining reliable performance across standards. After training, the proposed architecture was tested with videos compressed by the High Efficiency Video Coding (HEVC) encoder, the Versatile Video Coding (VVC) encoder, the VP9 codec, and the AOMedia Video 1 (AV1) codec. Results show an average Peak Signal-to-Noise Ratio (PSNR) improvement between 0.228 dB and 0.787 dB.
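A sketch of the multi-domain training idea: pool (compressed, pristine) clip pairs from every codec so the model learns the artifacts of all four standards. The directory layout, file naming, and class name are assumptions, not the paper's actual data pipeline.

```python
from pathlib import Path

from torch.utils.data import Dataset


class MultiCodecPairs(Dataset):
    """(compressed, pristine) clip paths pooled across codec domains."""

    def __init__(self, root: str, codecs=("hevc", "vvc", "vp9", "av1")):
        # Assumed layout: root/<codec>/ holds compressed clips and
        # root/raw/ the pristine originals under matching file names.
        self.pairs = [
            (clip, Path(root, "raw", clip.name))
            for codec in codecs
            for clip in sorted(Path(root, codec).glob("*.yuv"))
        ]

    def __len__(self) -> int:
        return len(self.pairs)

    def __getitem__(self, idx: int):
        compressed, pristine = self.pairs[idx]
        # A real loader would decode the YUV frames here; returning the
        # paths is enough for the sketch.
        return str(compressed), str(pristine)
```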