An algorithm for high-quality coding of 48 kHz sampled audio signals is presented. The algorithm employs a perceptual transform and a variable-depth multistage quantizer. The resulting audio reproduction quality is better than that of the Moving Picture Experts Group (MPEG) Layer I coder and roughly equivalent to that of the MPEG Layer II coder.
The modified discrete cosine transform (MDCT) is always employed in transform-coding schemes as the analysis/synthesis filter bank. In this paper, an efficient algorithm for MDCT and inverse MDCT (IMDCT) computation for the MPEG-1 audio layer III and MPEG-2 international audio-coding standards is proposed, using only the type-II DCT. Finally, the proposed algorithm is compared with similar algorithms. (C) 2005 Elsevier B.V. All rights reserved.
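As a point of reference for the abstract above, the MDCT/IMDCT pair can be sketched directly from its textbook definition. This is not the paper's DCT-II-based fast algorithm; it is a plain O(N^2) evaluation, and the sine window, the 2/N inverse scaling, and all function names are assumptions chosen for illustration. It shows that windowed overlap-add of consecutive blocks cancels the time-domain aliasing.

```python
import math

def sine_window(length):
    # Sine window; satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    # Direct O(N^2) MDCT: 2N input samples -> N coefficients.
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs):
    # Direct O(N^2) IMDCT: N coefficients -> 2N time-aliased samples.
    # The 2/N scaling (an assumption of this sketch) pairs with a window that is
    # applied at both the analysis and the synthesis stage.
    n_half = len(coeffs)
    return [
        2.0 / n_half * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half))
        for n in range(2 * n_half)
    ]

def analysis_synthesis(signal, n_half):
    # Run 50%-overlapped windowed MDCT -> IMDCT -> overlap-add.
    # len(signal) is assumed to be a multiple of n_half.
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        frame = [padded[start + n] * window[n] for n in range(2 * n_half)]
        recon = imdct(mdct(frame))
        for n in range(2 * n_half):
            out[start + n] += recon[n] * window[n]
    return out[n_half:-n_half]
```

Calling `analysis_synthesis(signal, 8)` on a signal whose length is a multiple of 8 should return the input up to floating-point rounding. Fast algorithms such as the one proposed in the paper compute the same coefficients with far fewer operations.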
Delivering high-quality spatial audio in the Ambisonics format requires extensive data bandwidth, which may render it inaccessible for many low-bandwidth applications. Existing widely-available multi-channel audio compression codecs are not designed to consider the characteristic inter-channel relations inherent to the Ambisonics format, and thus may not leverage this knowledge to optimise the compression. Therefore, this article proposes a spatial audio compression algorithm, based on a novel reformulation of the Higher-Order Directional audio coding (HO-DirAC) method, which is specifically intended for compressing higher-order Ambisonic audio streams. The methodology builds upon the concept of a spherical filter bank acting in the spherical harmonic domain. This results in directionally constrained sound-field estimates and parameterization, which may be utilized to reconstruct the input Ambisonic signals with minimal perceived loss of quality. The results of a listening experiment indicate high perceptual quality when using six or more audio transport channels to deliver fifth-order (36 channels) Ambisonic sound scenes. The proposed formulation is also designed with low computational complexity in mind and may therefore be well suited for compressing Ambisonic sound scenes for a wide range of applications.
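The channel counts quoted above follow from the standard relation between Ambisonic order and the number of spherical-harmonic signals, (order + 1)^2. The helper names below are hypothetical, and the ratio ignores whatever parametric side information the codec also transmits, so it is only a rough illustration of the reduction in audio channels.

```python
def ambisonic_channel_count(order):
    # A full-sphere Ambisonic stream of a given order carries (order + 1)^2 signals.
    return (order + 1) ** 2

def transport_ratio(order, transport_channels):
    # Ratio of input Ambisonic channels to transmitted audio channels.
    # Side information (spatial parameters) is deliberately not counted here.
    return ambisonic_channel_count(order) / transport_channels
```

For the fifth-order scenes in the listening experiment, `ambisonic_channel_count(5)` gives 36, so six transport channels correspond to a 6:1 reduction in audio channels.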
The MPEG-1 audio standard (ISO/IEC 11172-3) establishes guidelines for the compression of high-quality digital audio signals [1]. The standard dictates the function of an encoder/decoder pair (codec), leaving form intentionally vague to allow for competing implementations. A typical approach to real-time operation is to design an application-specific integrated circuit (ASIC) dedicated to encoding, decoding [2], or both. We present an alternative codec that makes use of the general-purpose digital signal processing (DSP) chips that are now common in multimedia-capable workstations and personal computers. We discuss how selective optimization of codec structure allows robust performance using limited resources, highlight some of the problems inherent in translating the abstractions of the standard into assembly code, and point towards further investigations of real-time implementations of communications standards.
The DRA (Dynamic Resolution Adaptation) audio coding standard was shown to deploy transient-localized MDCT to effectively suppress pre-echo artifacts and statistic allocation of codebooks to improve the compression efficiency of Huffman coding. Its quantizers and Huffman codebooks are designed in such a way that a signal path of 24 bits is provided throughout the codec so that high audio quality can be delivered if bit rate *** simple, it delivers state-of-the-art compression efficiency as shown by five rounds of ITU-R BS.1116 compliant subjective listening tests.
For decades, linear predictive (LP) analysis of real-valued time-domain signals has been developed for speech coding, analysis, and synthesis. Recently, complex-valued frequency-domain LP coding was developed to enhance the estimation performance of the temporal envelope. To apply it to audio coding, a suitable representation for the complex-valued frequency-domain LP coefficients (CLPC) is required before quantization, but there has been no efficient way to represent them. To address the problem, we propose efficient CLPC representations that retain some useful properties of conventional LPC representations. Through quantitative and qualitative evaluations, we demonstrate that our proposed representations increase quantization efficiency and improve audio coding performance.
This paper investigates the use of sparse overcomplete decompositions for audio coding. Audio signals are decomposed over a redundant union of modified discrete cosine transform (MDCT) bases having eight different scales. This approach produces a sparser decomposition than the traditional MDCT-based orthogonal transform and allows better coding efficiency at low bitrates. Contrary to state-of-the-art low bitrate coders, which are based on pure parametric or hybrid representations, our approach is able to provide transparency. Moreover, we use a bitplane encoding approach, which provides a fine-grain scalable coder that can seamlessly operate from very low bitrates up to transparency. Objective evaluation, as well as listening tests, show that the performance of our coder is significantly better than a state-of-the-art transform coder at very low bitrates and has similar performance at high bitrates. We provide a link to test soundfiles and source code to allow better evaluation and reproducibility of the results.
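The fine-grain scalability mentioned above comes from sending quantized coefficients one bitplane at a time, most significant first, so that the stream can be cut at any plane. The sketch below is a generic illustration of that idea, not the coder from the paper: the function names are hypothetical, coefficients are assumed to be integers already, signs are sent up front for simplicity, and no entropy coding is applied.

```python
def encode_bitplanes(coeffs, num_planes):
    # Split integer coefficients into sign flags and magnitude bitplanes (MSB first).
    # num_planes must be large enough that every |coefficient| < 2**num_planes.
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags] for p in range(num_planes - 1, -1, -1)]
    return signs, planes

def decode_bitplanes(signs, planes, num_planes):
    # Rebuild coefficients from however many leading bitplanes were received.
    # Missing low-order planes are treated as zero bits.
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        mags = [m | (bit << shift) for m, bit in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]
```

Decoding all planes returns the original integers. Decoding only the first `k` of `num_planes` planes leaves an error smaller than `2 ** (num_planes - k)` per coefficient, which is what lets one bitstream serve a range of bitrates.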
Psychoacoustical models have been used extensively within audio coding applications over the past decades. Recently, parametric coding techniques have been applied to general audio and this has created the need for a psychoacoustical model that is specifically suited for sinusoidal modelling of audio signals. In this paper, we present a new perceptual model that predicts masked thresholds for sinusoidal distortions. The model relies on signal detection theory and incorporates more recent insights about spectral and temporal integration in auditory masking. As a consequence, the model is able to predict the distortion detectability. In fact, the distortion detectability defines a (perceptually relevant) norm on the underlying signal space which is beneficial for optimisation algorithms such as rate-distortion optimisation or linear predictive coding. We evaluate the merits of the model by combining it with a sinusoidal extraction method and compare the results with those obtained with the ISO MPEG-1 Layer I-II recommended model. Listening tests show a clear preference for the new model. More specifically, the model presented here leads to a reduction of more than 20% in terms of number of sinusoids needed to represent signals at a given quality level.
In this paper, we present a novel audio coder using the discrete wavelet transform (DWT) and warped linear prediction (WLP). In contrast to conventional LP, WLP allows for the control of frequency resolution to closely match the response of the human auditory system. The structure of the system is similar to the transform coded excitation techniques used in wideband speech coding, where LP has been replaced with WLP, and the residual is analyzed by a wavelet filterbank designed to approximate the critical bands. The inherent shaping of the WLP synthesis filter and a controlled bit allocation to the wavelet coefficients help minimise the perceptually significant noise due to the quantization error in the residual. For monophonic signals sampled at 44.1 kHz, the coder achieves near-transparent to transparent quality for a variety of speech and music signals at an average bitrate of about 64 kb/s. Tests also show that the coder (in its initial implementation) delivers superior quality to MPEG layer III and comparable quality to the MPEG-2 AAC codec when operating at the same bitrate.
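The frequency-resolution control described above is usually obtained by replacing each unit delay of the predictor with a first-order all-pass section governed by a single warping coefficient. The sketch below assumes the commonly cited Smith and Abel approximation for a Bark-like warping and the standard all-pass frequency mapping; the abstract does not state which warping the coder uses, so the formula, the constants, and the function names are assumptions for illustration only.

```python
import math

def bark_warping_coefficient(sample_rate_hz):
    # Smith-Abel approximation of the all-pass coefficient that makes the
    # warped frequency axis approximate the Bark scale (assumed, not from the paper).
    fs_khz = sample_rate_hz / 1000.0
    return 1.0674 * math.sqrt((2.0 / math.pi) * math.atan(0.06583 * fs_khz)) - 0.1916

def warped_frequency(omega, lam):
    # Frequency mapping of a first-order all-pass with coefficient lam:
    # omega (radians/sample, 0..pi) -> warped frequency (radians/sample, 0..pi).
    return omega + 2.0 * math.atan2(lam * math.sin(omega), 1.0 - lam * math.cos(omega))
```

At the 44.1 kHz sampling rate used in the paper, `bark_warping_coefficient(44100)` evaluates to roughly 0.756. With a positive coefficient the mapping stretches low frequencies, so the predictor spends more of its resolution there.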
This study presents a novel spatial audio coding (SAC) technique, called analysis by synthesis SAC (AbS-SAC), with a capability of minimising signal distortion introduced during the encoding processes. The reverse one-to-two (R-OTT), a module applied in MPEG Surround to down-mix two channels as a single channel, is first configured as a closed-loop system. This closed-loop module offers a capability to reduce the quantisation errors of the spatial parameters, leading to an improved quality of the synthesised audio signals. Moreover, a sub-optimal AbS optimisation, based on the closed-loop R-OTT module, is proposed. This algorithm addresses a problem of practicality in implementing an optimal AbS optimisation while still being capable of further improving the quality of the reconstructed audio signals. In terms of algorithm complexity, the proposed sub-optimal algorithm provides scalability. The results of objective and subjective tests are presented. It is shown that a significant improvement in objective performance, when compared to the conventional open-loop approach, is achieved. In addition, subjective tests show that the proposed technique achieves higher subjective difference grade scores than the tested Advanced Audio Coding (AAC) multichannel codec.