In this paper, we study coding artifacts in MPEG-compressed scalable audio. Specifically, we consider the MPEG advanced audio coder (AAC) using bit slice scalable arithmetic coding (BSAC) as implemented in the MPEG-4 reference software. First, we perform human subjective testing using the comparison category rating (CCR) approach, quantitatively comparing the performance of scalable BSAC with the nonscaled TwinVQ and AAC algorithms. This testing indicates that scalable BSAC performs very poorly relative to TwinVQ at the lowest bitrate considered (16 kb/s), largely because of an annoying and seemingly random mid-range tonal signal that is superimposed onto the desired output. To better understand and quantify the distortion introduced into compressed audio at low bitrates, we apply two analysis techniques: Reng bifrequency probing and time-frequency decomposition. Using Reng probing, we conclude that aliasing is most likely not the cause of the annoying tonal signal; instead, time-frequency (spectrogram) analysis indicates that its cause is most likely suboptimal bit allocation. Finally, we describe the energy equalization quality metric (EEQM) for predicting the relative perceptual performance of the different coding algorithms and compare its predictive ability with that of ITU Recommendation ITU-R BS.1387-1.
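CCR results are commonly summarized as a comparison mean opinion score (CMOS). The sketch below assumes the seven-point scale of ITU-T Rec. P.800 (-3 "much worse" to +3 "much better"), with each trial rating the second-played sample against the first. The helper name and trial data are hypothetical, and the abstract does not say exactly how its CCR scores were aggregated.

```python
from statistics import mean

# Seven-point CCR scale (as in ITU-T Rec. P.800): the listener rates the
# second sample of each pair relative to the first.
CCR_SCALE = {
    "much worse": -3,
    "worse": -2,
    "slightly worse": -1,
    "about the same": 0,
    "slightly better": 1,
    "better": 2,
    "much better": 3,
}


def cmos(ratings, processed_first):
    """Hypothetical helper: CMOS of a processed sample versus its reference.

    ratings: labels from CCR_SCALE, one per trial.
    processed_first: one bool per trial; True when the processed sample was
    played first, in which case the score is sign-flipped so that positive
    values always mean "processed sounds better than reference".
    """
    if len(ratings) != len(processed_first):
        raise ValueError("one presentation-order flag is needed per rating")
    scores = [
        -CCR_SCALE[label] if flipped else CCR_SCALE[label]
        for label, flipped in zip(ratings, processed_first)
    ]
    return mean(scores)


if __name__ == "__main__":
    trials = ["better", "slightly worse", "much worse"]
    order = [False, False, True]
    print(cmos(trials, order))
```

Randomizing the presentation order and flipping the sign afterwards keeps order bias from systematically favoring either sample.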
The paper addresses a bitstream-scalable coder based on the MPEG-4 scalable lossless (SLS) coding system in which, in contrast to SLS, the bitrate of the enhancement layer is not fixed; instead, the coder attempts to create a quality-fixed enhancement layer. Given PCM audio input, the proposed structure can produce a near-transparent-quality version of the audio on top of the existing low-quality version. In particular, the proposed fixed-quality enhancement process, with its checking procedures, provides the minimum amount of enhancement the low-quality version needs to reach near-transparent quality that is almost indistinguishable from CD quality. In addition, a bitrate estimation model is proposed. The model estimates the enhancement bitrate directly from two parameters extracted while encoding the low-quality version. Evaluation results indicate that a better-defined quality level is guaranteed compared with a fixed-bitrate setting and that, on average, a lower bitrate (by approximately 20%) is attained. It is also shown that the proposed estimation model accurately predicts the necessary enhancement bitrate while reducing complexity by around 17%.
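The abstract gives neither the form of the bitrate estimation model nor the names of its two parameters. Purely as an illustration, the sketch below assumes a linear relation between two hypothetical encoder-side parameters, p1 and p2, and the enhancement bitrate, and fits it by ordinary least squares using only the standard library.

```python
def _solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_bitrate_model(p1, p2, bitrate):
    """Fit bitrate ~ w0 + w1*p1 + w2*p2 (hypothetical model) via the normal equations."""
    rows = [(1.0, a, b) for a, b in zip(p1, p2)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, bitrate)) for i in range(3)]
    return _solve(ata, aty)


def predict_bitrate(weights, p1, p2):
    """Evaluate the fitted (hypothetical) linear model for one item."""
    w0, w1, w2 = weights
    return w0 + w1 * p1 + w2 * p2
```

Once fitted, predicting the bitrate is a single weighted sum, which is one reason a direct estimate can be cheaper than repeated encode-and-check passes.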
Video coding has traditionally been developed to support services such as video streaming, videoconferencing, and digital TV, with the main intent of enabling human viewing of the encoded content. However, with advances in deep neural networks (DNNs), encoded video is increasingly used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking, and counting would run continuously, while human viewing might be required only occasionally to review potential incidents. Supporting such applications requires a new video coding paradigm that enables efficient, scalable representation and compression of video for both machine and human use. In this manuscript, we introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing. The proposed system is built on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.
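The advantage of conditional coding over plain residual coding can be illustrated with entropies: coding X given the base-layer signal Y costs H(X|Y) = H(X - Y | Y) <= H(X - Y) bits per symbol, so conditioning is never worse than coding the residual. The toy joint distribution below is hypothetical and only stands in for the learned conditional models of the paper, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict
from math import log2


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(joint, index):
    """Marginal of a joint pmf over (x, y) pairs; index 0 -> X, 1 -> Y."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[index]] += p
    return out


def conditional_entropy(joint):
    """H(X|Y) = H(X, Y) - H(Y)."""
    return entropy(joint) - entropy(marginal(joint, 1))


def residual_entropy(joint):
    """H(X - Y): the cost of coding the plain residual without conditioning."""
    pr = defaultdict(float)
    for (x, y), p in joint.items():
        pr[x - y] += p
    return entropy(pr)


if __name__ == "__main__":
    # Toy example: Y uniform on {0, 1, 2, 3}, X = 2 * Y.
    joint = {(2 * y, y): 0.25 for y in range(4)}
    print(conditional_entropy(joint), residual_entropy(joint))  # prints 0.0 2.0
```

Here X is fully predictable from Y, yet the residual X - Y still carries 2 bits, because subtraction only captures a fixed additive relationship.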
There is a significant rise in demand for video transmission over 3G and 4G wireless networks due to the growing popularity of video streaming websites such as YouTube, and the market for video streaming over wireless networks is expected to increase sharply in the future. Without modification, neither of the two basic transport-layer protocols is suited to video transmission over wireless networks. UDP (user datagram protocol) is inherently unreliable, so the frequent packet corruption caused by characteristics of wireless networks such as noise and interference results in corrupted video. TCP (transmission control protocol), on the other hand, performs even worse than UDP (Thangaraj et al. in Telecommun Syst 45(4):303-312, 2010) because of frequently corrupted packets: owing to its reliable data transfer, TCP keeps retransmitting a corrupted packet until the receiver receives it successfully. This leads to jitter in video playback and a poor end-user quality of experience. Multiple TCP connections with appropriate optimization can use bandwidth more efficiently than single-connection TCP video transmission over wireless networks, and multiple TCP connections have been shown to improve video transmission and playback by providing reliable communication. The parallel TCP scheme proposed in this paper enhances the quality of video transmission and playback over MIMO wireless networks by combining multiple TCP connections with scalable video encoding based on hierarchical wavelet decomposition.
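The abstract does not describe how data are distributed over the parallel connections. A minimal sketch, assuming round-robin striping of sequence-numbered chunks of the layered wavelet bitstream: each TCP connection delivers its own chunks in order, but the interleaving across connections is lost, so the receiver restores playback order from the sequence numbers. Socket handling is omitted, and the function names are hypothetical.

```python
from itertools import cycle


def stripe(chunks, n_connections):
    """Assign (sequence number, chunk) pairs to connections round-robin."""
    if n_connections < 1:
        raise ValueError("need at least one connection")
    queues = [[] for _ in range(n_connections)]
    for item, queue in zip(enumerate(chunks), cycle(queues)):
        queue.append(item)
    return queues


def reassemble(per_connection):
    """Merge what each connection delivered back into playback order."""
    merged = [item for delivered in per_connection for item in delivered]
    merged.sort(key=lambda item: item[0])
    return [chunk for _, chunk in merged]


if __name__ == "__main__":
    stream = [b"base-0", b"base-1", b"enh-0", b"enh-1", b"enh-2"]
    queues = stripe(stream, 3)
    # Connections may finish in any order; sequence numbers restore it.
    assert reassemble(list(reversed(queues))) == stream
```

A real scheduler might instead send base-layer chunks first or weight connections by measured throughput; round-robin is simply the easiest choice to show.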
ISBN: 9781628412444 (print)
Lossless image and video compression is required in many professional applications. However, lossless coding results in a high data rate, which leads to a long wait for the user when the channel capacity is limited. Scalable lossless coding is an elegant solution to this problem. It provides a quickly accessible preview via a lossy compressed base layer, which can be refined to a lossless output once the enhancement layer is received. This paper therefore presents a lossy-to-lossless scalable coding system in which the enhancement layer is coded by means of intra prediction and entropy coding. Several algorithms are evaluated for the prediction step. Sample-based Weighted Prediction turned out to be a reasonable choice for typical consumer video sequences, whereas the Median Edge Detection algorithm is better suited to medical content from computed tomography. For both types of sequences, efficiency may be improved further by the considerably more complex Edge-Directed Prediction algorithm. In the best case, scalable coding requires only about 2.7% additional data rate in total compared with single-layer JPEG-LS compression for typical consumer video sequences. For the medical sequences, scalable coding is even more efficient than JPEG-LS compression for certain values of QP.
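The Median Edge Detection (MED) predictor is the one used in JPEG-LS (LOCO-I). For each pixel it picks among the left neighbor a, the upper neighbor b, and the planar estimate a + b - c, where c is the upper-left neighbor. The sketch below applies it to a 2-D list and shows that the residuals reconstruct the input exactly. Treating out-of-image neighbors as 0 is a simplification of the JPEG-LS boundary rules, and how the paper combines MED with the base layer is not shown.

```python
def med_predict(a, b, c):
    """JPEG-LS MED predictor: a = left, b = above, c = upper-left."""
    if c >= max(a, b):
        return min(a, b)  # likely edge: pick the smaller neighbor
    if c <= min(a, b):
        return max(a, b)  # likely edge: pick the larger neighbor
    return a + b - c      # smooth region: planar prediction


def _neighbors(img, y, x):
    # Simplification: neighbors outside the image are taken as 0.
    a = img[y][x - 1] if x > 0 else 0
    b = img[y - 1][x] if y > 0 else 0
    c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a, b, c


def med_residuals(img):
    """Prediction residuals that an entropy coder would then compress."""
    return [
        [img[y][x] - med_predict(*_neighbors(img, y, x)) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def med_reconstruct(res):
    """Decoder side: rebuild pixels in raster order from the residuals."""
    h, w = len(res), len(res[0])
    img = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            img[y][x] = res[y][x] + med_predict(*_neighbors(img, y, x))
    return img
```

The decoder can run the same predictor because a, b, and c all precede the current pixel in raster order, which is what makes the scheme lossless.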
ISBN: 9781728113319 (print)
The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, because the coding pipeline is designed around signal fidelity, existing image/video coding frameworks are limited in their ability to meet the needs of both machine and human vision. In this paper, we propose a novel image coding framework that leverages both compressive and generative models to support machine vision and human perception tasks jointly. Given an input image, feature analysis is applied first, and a generative model then reconstructs the image from the features and additional reference pixels; in this work, compact edge maps are extracted as the features that connect both kinds of vision in a scalable way. The compact edge map serves as the base layer for machine vision tasks, and the reference pixels act as an enhancement layer that guarantees signal fidelity for human vision. By introducing advanced generative models, we train a flexible network to reconstruct images from the compact feature representations and the reference pixels. Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection, which provides useful evidence for the emerging MPEG VCM (Video Coding for Machines) standardization efforts.
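The abstract does not specify how the edge maps are extracted or how the reference pixels are selected. Purely to illustrate the two-layer split, the sketch below assumes a thresholded forward-difference gradient as the edge extractor and a regular sampling grid for the reference pixels. All names and parameters (threshold, stride) are hypothetical, and the generative decoder is not shown.

```python
def edge_map(img, threshold):
    """Binary edge map from a forward-difference gradient (hypothetical extractor)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):  # last row/column have no forward neighbor; left as 0
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            edges[y][x] = 1 if abs(gx) + abs(gy) >= threshold else 0
    return edges


def reference_pixels(img, stride):
    """Sample pixels on a regular grid, keyed by (row, column)."""
    return {
        (y, x): img[y][x]
        for y in range(0, len(img), stride)
        for x in range(0, len(img[0]), stride)
    }


def encode_layers(img, threshold=20, stride=4):
    """Base layer for machines, enhancement layer for human viewing."""
    return {
        "base": edge_map(img, threshold),
        "enhancement": reference_pixels(img, stride),
    }
```

A machine-vision client would decode only the "base" entry, while a viewer would also fetch "enhancement" and pass both to the reconstruction network.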
ISBN: 9781479900152 (print)
A scalable audio coding method is proposed that uses Quantization Index Modulation, a technique borrowed from watermarking. Part of the information in each layer's output is embedded (watermarked) in the previous layer. This saves bitrate while leaving the distortion almost unchanged, making the scalable coding system more efficient in the rate-distortion sense. The results show that the proposed method outperforms scalable audio coding based on reconstruction-error quantization, which is used in practical systems such as MPEG-4 AAC.
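Quantization Index Modulation embeds a bit by quantizing the host value with one of two interleaved quantizers offset by half a step; the decoder recovers the bit by checking which quantizer's lattice lies closer. A minimal scalar sketch follows; the step size and values are illustrative, and the abstract does not describe how enhancement-layer information is mapped onto the embedded bits.

```python
def qim_embed(x, bit, step):
    """Quantize x with the quantizer for `bit` (lattice offset by bit * step / 2)."""
    offset = bit * step / 2
    return round((x - offset) / step) * step + offset


def qim_extract(y, step):
    """Return the bit whose quantizer lattice lies closest to y."""
    d0 = abs(y - qim_embed(y, 0, step))
    d1 = abs(y - qim_embed(y, 1, step))
    return 0 if d0 <= d1 else 1


if __name__ == "__main__":
    step = 1.0
    for x in (0.1, -2.7, 5.49):
        for bit in (0, 1):
            marked = qim_embed(x, bit, step)
            # Embedding moves x by at most step / 2; any perturbation smaller
            # than step / 4 still decodes to the right bit.
            assert abs(marked - x) <= step / 2
            assert qim_extract(marked + 0.2, step) == bit
            assert qim_extract(marked - 0.2, step) == bit
```

The embedded bits travel inside values the previous layer already transmits, which is presumably where the bitrate saving comes from.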
ISBN: 9781424442904 (print)
This paper proposes a novel fractional compensation (FC) approach for spatial scalable video coding. It simultaneously exploits inter-layer and intra-layer correlation through learning-based mapping. Instead of using the enhancement-layer reconstruction as a whole as the reference, a set of reference pairs is generated from the high-frequency components of both the base-layer and enhancement-layer reconstructions of the previous frame. The reference set, which consists of low-resolution and high-resolution patches, can be generated in both the encoder and the decoder by online learning. When the enhancement layer is encoded, a prediction is first obtained from the base layer, and low-resolution patches are extracted from it. These patches are then used as indices to find the matching high-resolution patches in the reference set. Finally, the prediction enhanced by the high-resolution patches is used for coding. The proposed approach does not need any motion bits. With the proposed FC approach, the performance of H.264 SVC in spatial scalable coding can be improved by up to 2.4 dB.
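The patch lookup step can be sketched as a nearest-neighbor search over the reference pairs. The sketch assumes flattened patches, an exhaustive sum-of-squared-differences (SSD) match, and additive enhancement of the prediction; the abstract does not state the matching criterion, and the function names are hypothetical. Because encoder and decoder build the same reference set from previous-frame reconstructions, they arrive at the same match, which is why no motion bits need to be sent.

```python
def ssd(p, q):
    """Sum of squared differences between two equal-length flattened patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_reference_pairs(lr_patches, hr_patches):
    """Pair high-frequency LR patches with their co-located HR patches."""
    if len(lr_patches) != len(hr_patches):
        raise ValueError("LR and HR patch lists must have equal length")
    return list(zip(lr_patches, hr_patches))


def lookup_hr(lr_query, pairs):
    """Return the HR patch whose LR partner best matches the query."""
    _, best_hr = min(pairs, key=lambda pair: ssd(pair[0], lr_query))
    return best_hr


def enhance(prediction_patch, lr_query, pairs):
    """Add the retrieved high-frequency detail to the base-layer prediction."""
    detail = lookup_hr(lr_query, pairs)
    return tuple(p + d for p, d in zip(prediction_patch, detail))
```

An exhaustive search is the simplest correct choice for a sketch; a practical encoder would likely use a faster approximate search over a larger reference set.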
ISBN: 9781728163956 (print)
In this paper, we consider a novel image coding paradigm, termed semantically scalable coding. In this paradigm, the coded bitstream serves multiple semantic analysis tasks, and different tasks require different semantic granularities of the image. The bitstream is therefore designed to be scalable in the sense that progressive decoding provides coarse-to-fine semantic granularities. As a concrete example, we consider coarse-grained and fine-grained image classification. We present a method to compress multiple deep feature maps, which are intermediate representations of an image passing through a trained deep network. The deep-layer feature maps can serve coarse-grained image classification, while the shallow-layer feature maps can serve fine-grained image classification. Experimental results demonstrate the feasibility of the proposed method as well as the advantage of the semantically scalable coding paradigm.
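Progressive decoding of a semantically layered bitstream can be sketched as a length-prefixed container: the deep-layer features (coarse classification) come first and the shallow-layer features (fine classification) follow, so a decoder that stops early still has the input for the coarse task. The layer order, the 4-byte length header, and the function names are assumptions for illustration; the payloads stand in for already-compressed feature maps, and the paper's actual feature compression is not reproduced.

```python
import struct

HEADER = struct.Struct(">I")  # 4-byte big-endian payload length (assumed format)


def pack_layers(layers):
    """Concatenate layer payloads, coarsest first, each with a length header."""
    return b"".join(HEADER.pack(len(p)) + p for p in layers)


def decode_prefix(stream, n_layers):
    """Return up to n_layers complete payloads; incomplete trailing data is ignored."""
    out, pos = [], 0
    while len(out) < n_layers and pos + HEADER.size <= len(stream):
        (size,) = HEADER.unpack_from(stream, pos)
        start = pos + HEADER.size
        if start + size > len(stream):
            break  # layer truncated in transit: stop at the last complete one
        out.append(stream[start:start + size])
        pos = start + size
    return out
```

Under these assumptions, decode_prefix(stream, 1) would feed a coarse-grained classifier, and decode_prefix(stream, 2) would additionally supply the shallow-layer features for fine-grained classification.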
ISBN: 9781479934324 (print)
This paper presents a new scalable speech codec for IP networks using the discrete wavelet transform (DWT). A scalable narrowband speech coding scheme based on the internet low bitrate codec (iLBC) was presented previously and achieved speech quality equivalent to G.718 for narrowband signals. Although its core layer performed satisfactorily, higher speech quality was desired from the enhancement layer, which employed the modified discrete cosine transform (MDCT). We propose using the DWT instead of the MDCT to encode the core-layer coding error in the enhancement layer. Simulation results show that the DWT is a promising technique for encoding highly non-stationary signals such as the coding error.
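To illustrate how an enhancement layer could transform the core-layer coding error, the sketch below uses a one-level orthonormal Haar DWT. Haar is a stand-in chosen for brevity: the abstract names neither the wavelet nor the decomposition depth, and quantization and entropy coding of the coefficients are omitted.

```python
from math import sqrt

INV_SQRT2 = 1 / sqrt(2)


def haar_forward(x):
    """One-level orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(x[0::2], x[1::2]))
    approx = [(a + b) * INV_SQRT2 for a, b in pairs]
    detail = [(a - b) * INV_SQRT2 for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly (up to floating-point rounding)."""
    x = []
    for s, d in zip(approx, detail):
        x.extend([(s + d) * INV_SQRT2, (s - d) * INV_SQRT2])
    return x


if __name__ == "__main__":
    # Hypothetical core-layer coding error with a sharp transient.
    coding_error = [0.0, 0.1, -0.1, 0.0, 2.5, -2.4, 0.1, 0.0]
    approx, detail = haar_forward(coding_error)
    print(approx, detail)
```

Because the transform is orthonormal, quantization error in the coefficients maps to the same error energy in the time domain; a multi-level DWT would reapply haar_forward to the approximation band.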