This paper presents a new implementation of a solid-state audio player equipped with a TwinVQ decoder. TwinVQ is an advanced high-quality coding technology, and its basic algorithm is used in MPEG-4/audio, standardized by ISO (International Organization for Standardization). Using TwinVQ coding, audio data sampled at 44.1 kHz can be compressed at a very low bitrate of 40 kbit/s/ch with better sound quality than conventional coding methods such as MPEG-1/audio layer-3. Another feature of the player is that it is extremely compact: business-card-sized and 10 mm thick, including the battery, as a result of the efficient design. The decoding algorithms are optimized and implemented on a 16-bit fixed-point DSP, and a very thin rechargeable lithium polymer battery is built into the player. As a result, the new player has very practical performance with low power consumption, and 6 hours of 44 kHz stereo audio can be played continuously when the battery is fully charged.
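The figures quoted above can be put in perspective with a little arithmetic. The sketch below is only a back-of-the-envelope check: the 16-bit PCM source format is an assumption (the abstract does not state the source word length), and the function names are hypothetical.

```python
# Rough check of the bitrate figures quoted in the abstract.
SAMPLE_RATE_HZ = 44_100          # from the abstract
CODED_BITRATE_PER_CH = 40_000    # bit/s per channel, from the abstract
PCM_BITS_PER_SAMPLE = 16         # ASSUMPTION: CD-style 16-bit PCM source
CHANNELS = 2                     # stereo playback
PLAY_TIME_S = 6 * 3600           # 6 hours, from the abstract


def pcm_bitrate_per_channel(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed PCM bitrate of one channel in bit/s."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(pcm_bitrate: int, coded_bitrate: int) -> float:
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_bitrate / coded_bitrate


def storage_bytes(bitrate_per_ch: int, channels: int, seconds: int) -> int:
    """Bytes needed to hold the coded audio for the given duration."""
    return bitrate_per_ch * channels * seconds // 8


pcm = pcm_bitrate_per_channel(SAMPLE_RATE_HZ, PCM_BITS_PER_SAMPLE)
ratio = compression_ratio(pcm, CODED_BITRATE_PER_CH)
size = storage_bytes(CODED_BITRATE_PER_CH, CHANNELS, PLAY_TIME_S)
print(f"PCM: {pcm} bit/s/ch, ratio: {ratio:.2f}:1, 6 h stereo: {size / 1e6:.0f} MB")
```

Under the 16-bit assumption, the coded stream is roughly 17.6 times smaller than the PCM source, and six hours of stereo audio occupy about 216 MB.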
In this paper we propose a long-term prediction method for low delay transform domain general audio coders. This Frequency Domain Joint Harmonics Prediction (FDJHP) method operates directly in the Modified Discrete Cosine Transform (MDCT) domain and can enhance the coding efficiency, even under very low frequency resolutions. We compare this new method with state-of-the-art MDCT based methods by analyzing bitrate savings and by a listening test using test signals with strong harmonic components. The results indicate that it outperforms an existing method, which also directly operates in the frequency domain. Additionally, we show how it can be combined with the existing techniques into an adaptive system, where the different methods can complement each other.
In this work, we develop a new method for jointly optimal quantization of sinusoidal frequencies, amplitudes, and phases and apply the method to sinusoidal audio coding. This is an extension of an earlier work on quantization of sinusoidal amplitudes and phases to frequencies. The optimization is performed for a set of sinusoids that models a short segment of an audio signal. For a given bit-rate constraint, the optimal quantizers minimize a single-letter weighted distortion measure that accounts for perceptual importance of sinusoids. The quantizers are derived analytically using high-rate theory. The method yields high performance and has a number of practical advantages over conventional sinusoidal quantization methods.
This letter derives a fast decomposition for the quadrature mirror filterbanks (QMFs) of the low power spectral band replication (SBR) tools in the MPEG high efficiency advanced audio coding (HE AAC) decoder. In contrast with the standard method, where computation-intensive matrix operations are employed in the QMF, the proposed method decomposes the matrix operations into conventional discrete cosine transforms of types II and III (DCT-II and DCT-III) and simple permutations for easy implementation. The computational complexity can also be reduced effectively by using fast algorithms for the DCT.
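For readers unfamiliar with the two transforms named above, the following is a minimal sketch of the textbook (unnormalized) DCT-II and DCT-III in their direct O(N^2) form. It does not reproduce the letter's decomposition or the SBR QMF matrices; the function names `dct2` and `dct3` are hypothetical. It only illustrates that the two transforms are inverses of each other up to a factor of N/2, which is what makes them natural building blocks for analysis and synthesis filterbanks.

```python
import math


def dct2(x):
    """Direct O(N^2) DCT-II: X[k] = sum_i x[i] * cos(pi/N * (i + 1/2) * k)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct3(X):
    """Direct O(N^2) DCT-III: x[i] = X[0]/2 + sum_{k>=1} X[k] * cos(pi/N * k * (i + 1/2))."""
    n = len(X)
    return [
        0.5 * X[0]
        + sum(X[k] * math.cos(math.pi * k * (i + 0.5) / n) for k in range(1, n))
        for i in range(n)
    ]


signal = [1.0, 2.0, 3.0, 4.0]
n = len(signal)
# DCT-III undoes DCT-II up to the scale factor N/2.
roundtrip = [2.0 / n * v for v in dct3(dct2(signal))]
print(roundtrip)
```

In practice the direct sums would be replaced by a fast DCT algorithm, which is where the complexity reduction mentioned in the abstract comes from.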
We study the application of wavelet packet filterbanks to low bit-rate transparent audio coding, taking the audio coders' delay requirements into account, and propose low-delay coders based on wavelet packet filterbanks. We first develop a method of comparison between filterbanks for perceptual audio coding by estimating the bit-rate necessary for transparent compression. We use this comparison method to select the best filters for our audio compression scheme from a large set of orthogonal and biorthogonal wavelets. Different wavelet filters may be used at different stages of the tree-structured decomposition, with a constraint on the overall delay taken into account. The optimization is carried out with a simulated annealing procedure and yields two wavelet packet filterbanks, one with average delay and one with low delay. They are inserted in a complete audio coder that employs vector quantization and considers psychoacoustic models. The use of the proposed filterbanks leads to the design of a new bit allocation procedure that takes into account the lack of selectivity of the equivalent synthesis filters in a wavelet packet filterbank. The resulting audio scheme is validated through listening tests. The wavelet packet filterbanks are shown to be a promising tool for audio coding, especially for low-delay coding: with average delay, the quality of the wavelet packet filterbanks is as good as that of MPEG-1 Layer-2, both at 80 kb/s, and when the delay is reduced to 200 samples, 96 kb/s are needed to achieve the same quality.
This article provides a compact overview of the history, technology, and performance of MPEG Surround. The technology of MPEG Surround is based on the spatial audio coding (SAC) principle: in the encoder, a mono- or stereophonic down-mix is generated from the multichannel input signal, and additional parametric side information is extracted to guide the subsequent up-mix procedure in the *** Moving Picture Experts Group (MPEG) Surround, a data-rate efficient coding scheme for high-quality multichannel sound using novel parametric coding techniques has been standardized.
Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present a psychoacoustic calibration scheme that redefines the loss functions of neural audio coding systems so that they can decode signals that are perceptually more similar to the reference, yet with a much lower model complexity. The proposed loss function incorporates the global masking threshold, tolerating reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms a baseline neural codec that is twice as large and consumes 23.4% more bits per second. With the proposed method, a lightweight neural codec with only 0.9 million parameters performs near-transparent audio coding comparable with the commercial MPEG-1 audio Layer III codec at 112 kbps.
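The idea of a loss that tolerates inaudible error can be illustrated with a small sketch. This is not the loss function of the paper, whose exact form is not given in the abstract; it is a hypothetical hinge-style variant in which per-bin squared error is penalized only where it exceeds a masking threshold. The function name `audible_error_loss` is made up, and the masking threshold is assumed to be supplied by some psychoacoustic model that is not shown here.

```python
def audible_error_loss(reference, decoded, mask_threshold):
    """Sum of per-bin squared error that exceeds the masking threshold.

    reference, decoded: spectral magnitudes per frequency bin.
    mask_threshold: error power per bin assumed to be inaudible
                    (ASSUMPTION: computed elsewhere by a psychoacoustic model).
    """
    if not (len(reference) == len(decoded) == len(mask_threshold)):
        raise ValueError("all inputs must have the same number of bins")
    total = 0.0
    for ref, dec, mask in zip(reference, decoded, mask_threshold):
        error_power = (ref - dec) ** 2
        # Error below the threshold is treated as masked and costs nothing.
        total += max(0.0, error_power - mask)
    return total


reference = [1.0, 2.0, 3.0]
decoded = [1.1, 2.0, 2.0]
mask = [0.05, 0.0, 0.5]
print(audible_error_loss(reference, decoded, mask))
```

In this toy example the small error in the first bin falls under its threshold and is ignored, while the large error in the last bin is penalized only by the amount that exceeds the threshold.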
A generalised perceptual filter which aims to reduce the audible quantisation noise in low bit rate audio coding has been developed. The filter derivation is based on a psychoacoustic excitation pattern model. Experimental results show that the perceptual filter can reduce the audible quantisation noise in a sub-band coded audio signal.
This letter proposes a new method for audio coding that utilizes blind spectral recovery to improve the coding efficiency without compromising performance. The proposed method transmits only a fraction of the spectral coefficients, thereby reducing the coding bit rate. Then, it recovers the remaining coefficients in the decoder using the transmitted coefficients as input. The proposed method is differentiated from conventional spectral recovery in that the coefficients to be recovered are interleaved with the transmitted coefficients to obtain the most data correlation. Further, it enhances the transmitted coefficients, which are degraded by quantization errors, to deliver better information to the recovery process. The spectral recovery is conducted recursively on a band basis such that information recovered in one band is used for the recovery in subsequent bands. An improved level correction for the recovered coefficients and a new sign coding are also developed. A subjective performance evaluation confirms that the proposed method at 40 kbps provides statistically equivalent sound quality to a state-of-the-art coding method at 48 kbps for speech and music categories.
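The interleaving described above can be pictured with a toy sketch. Everything here is an assumption made for illustration: the even/odd split is only one possible interleaving pattern, the neighbour-averaging estimate is a naive placeholder rather than the letter's recovery method, and the function names are hypothetical. Signs are not estimated at all; the letter handles them with a separate sign coding.

```python
def split_interleaved(coeffs):
    """Split coefficients into transmitted (even index) and withheld (odd index)."""
    return coeffs[0::2], coeffs[1::2]


def estimate_withheld_magnitudes(transmitted, total_len):
    """Naive placeholder: estimate each withheld magnitude from its transmitted neighbours."""
    estimates = []
    for idx in range(1, total_len, 2):
        left = abs(transmitted[idx // 2])
        right_pos = idx // 2 + 1
        if right_pos < len(transmitted):
            # Both neighbours were transmitted: average their magnitudes.
            estimates.append(0.5 * (left + abs(transmitted[right_pos])))
        else:
            # Last coefficient has no right-hand neighbour: reuse the left one.
            estimates.append(left)
    return estimates


coeffs = [4.0, 1.0, -2.0, 3.0, 6.0, 5.0]
sent, withheld = split_interleaved(coeffs)
recovered = estimate_withheld_magnitudes(sent, len(coeffs))
print(sent, withheld, recovered)
```

Because every withheld coefficient sits between transmitted ones, the decoder always has nearby data to work from, which is the correlation argument the abstract makes for interleaving.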
Current and future visual communications for applications such as broadcasting, videotelephony, video- and audiographic-conferencing, and interactive multimedia services assume a substantial audio component. Even text, graphics, fax, still images, email documents, etc. will gain from voice annotation and audio clips. A wide range of speech, wideband speech, and wideband audio coders is available for such applications. In the context of audiovisual communications, the quality of telephone-bandwidth speech is acceptable for some videotelephony and videoconferencing services. Higher bandwidths (wideband speech) may be necessary to improve the intelligibility and naturalness of speech. High-quality audio coding, including multichannel audio, will be necessary in advanced digital TV and multimedia services. This paper explains basic approaches to speech, wideband speech, and audio bit-rate compression in audiovisual communications. These signal classes differ in bandwidth, dynamic range, and listener expectation of offered quality. It will become obvious that using our knowledge of auditory perception helps minimize the perception of coding artifacts and leads to efficient low bit-rate coding algorithms that can achieve substantially more compression than was thought possible only a few years ago. The paper concentrates on worldwide source coding standards beneficial for consumers, service providers, and manufacturers.