A new algorithm for achieving flexible tiling of the time axis for audio coding purposes is presented. It is based on calculating the distances among a predetermined number of time-frequency pairs. From the computed distances, a clustering process determines the final subdivision of each audio frame. Experimental results demonstrate the good performance of the proposed algorithm, which provides high coding efficiency with reduced complexity.
A method of quantization noise control for audio coding in the wavelet domain is proposed. Using the inverse Discrete Fourier Transform (DFT), it converts the masking threshold from the MPEG psychoacoustic model in the frequency domain into a time-domain signal; the Discrete Wavelet Packet Transform (DWPT) is then applied to that signal, and the energy in each subband is taken as the maximum allowed quantization noise energy. The experimental results show that the proposed method attains nearly transparent audio quality below 64 kbps for most of the test audio signals.
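The three steps named in this abstract (inverse DFT of the masking threshold, wavelet packet decomposition, per-subband energy) can be sketched as follows. This is only an illustration under assumptions the abstract does not state: the threshold is taken as a one-sided power spectrum and given zero phase, the wavelet packet uses a Haar filter pair, and all function names are hypothetical.

```python
import cmath
import math

def threshold_to_time(threshold_power):
    """Zero-phase inverse DFT of a one-sided threshold (power per bin, bins 0..N/2).

    Assumption: magnitudes are sqrt(power) and the phase is zero.
    """
    half = len(threshold_power) - 1
    n = 2 * half
    mag = [math.sqrt(p) for p in threshold_power]
    spectrum = mag + mag[-2:0:-1]  # mirror bins so the time signal is real
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def haar_packet(signal, levels):
    """Full wavelet packet tree with orthonormal Haar filters (an assumed filter choice).

    The signal length must be divisible by 2**levels.
    Bands are returned in natural tree order, not sorted by frequency.
    """
    bands = [list(signal)]
    for _ in range(levels):
        split = []
        for band in bands:
            low = [(band[i] + band[i + 1]) / math.sqrt(2) for i in range(0, len(band), 2)]
            high = [(band[i] - band[i + 1]) / math.sqrt(2) for i in range(0, len(band), 2)]
            split += [low, high]
        bands = split
    return bands

def allowed_noise_energy(threshold_power, levels):
    """Energy of each wavelet packet subband of the time-domain threshold signal."""
    time_signal = threshold_to_time(threshold_power)
    return [sum(v * v for v in band) for band in haar_packet(time_signal, levels)]
```

Because the Haar packet used here is orthonormal, the subband energies sum to the energy of the time-domain threshold signal, so no noise budget is created or lost by the decomposition itself.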
The term "immersive audio" is frequently used to describe an audio experience that gives the listener the sensation of being fully immersed or "present" in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers horizontally arranged around the listener), 3D audio (with loudspeakers at, above and below listener ear level) and binaural audio to headphones. This article provides an overview of the recent MPEG standard, MPEG-H 3D Audio, which is a versatile standard that supports multiple immersive sound signal formats (channels, objects, higher order ambisonics) and is now being adopted in broadcast and streaming applications.
In this work, we develop a new method for quantization in multistage audio coding. Given a (perceptual) distortion measure and a bit-rate constraint, we analytically derive the optimal rate distribution between subcoders (stages) and the corresponding optimal quantizers using high-rate theory. The analytical solutions for optimal quantizers allow a coder to easily adapt to changes in bit-rate requirements. As an illustration of the new method, we consider quantization in a two-stage sinusoidal/waveform coder, a widely used combination in audio coding. We show that at low total rates most of the rate should be assigned to the sinusoidal (model-based, subspace) subcoder, while at high total rates most of the rate should be assigned to the waveform (full-space) subcoder. We compare the new method to a reference quantization method that does not use rate-distortion optimization. A listening test shows significantly higher performance for the new method.
ISBN (print): 1424407281
In this paper we present a new model-based method to code the transform coefficients of audio signals. The histogram of transform coefficients is approximated by a generalized Gaussian model for efficient model-based bit allocation and the spectrum is coded by scalar quantization followed by arithmetic coding. An example coder operating at 16 kHz and using predictive modified discrete cosine transform (MDCT) coding is described. We compare the performance of the proposed coder with ITU-T G.722.1. Objective and subjective quality results are presented. The proposed coder is better than ITU-T G.722.1 at 24 kbit/s and equivalent at 32 kbit/s.
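As a rough illustration of fitting a generalized Gaussian model to transform coefficients, the sketch below estimates the shape parameter by moment matching. The abstract does not say which estimator the authors use, so the moment-ratio method, the search interval and the function names are assumptions made here for illustration only.

```python
import math

def moment_ratio(shape):
    """(E|x|)^2 / E[x^2] for a zero-mean generalized Gaussian with this shape."""
    return math.gamma(2.0 / shape) ** 2 / (
        math.gamma(1.0 / shape) * math.gamma(3.0 / shape)
    )

def shape_from_ratio(target, lo=0.1, hi=10.0, iters=60):
    """Invert moment_ratio by bisection; the ratio increases with the shape.

    Targets outside the range covered by [lo, hi] converge to the nearer bound.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimate_shape(coeffs):
    """Estimate the shape parameter from a list of transform coefficients."""
    mean_abs = sum(abs(c) for c in coeffs) / len(coeffs)
    mean_sq = sum(c * c for c in coeffs) / len(coeffs)
    if mean_sq == 0.0:
        raise ValueError("all coefficients are zero; shape is undefined")
    return shape_from_ratio(mean_abs ** 2 / mean_sq)
```

A shape of 2 corresponds to a Gaussian and a shape of 1 to a Laplacian, so the estimated value indicates how peaked the coefficient histogram is, which is the kind of information a model-based bit allocation could draw on.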
ISBN (print): 9781424414833
To help develop ultra-low power wireless hearing aid products, we investigate the integration of subband audio coding with hearing aid applications. Both the audio coding and the hearing aid application use subband processing, but their requirements for the filterbanks are totally different. The hearing aid application typically uses an oversampled filterbank to reduce the aliasing in each subband, whereas the audio codec needs a critically sampled filterbank for maximal coding efficiency. A joint filterbank structure is proposed in this paper to satisfy these contradictory filterbank requirements. With this structure, the two filterbanks are combined into a single stereo filterbank operation, which can be efficiently implemented on a filterbank coprocessor. This structure substantially reduces the computational complexity, power consumption and memory usage.
ISBN (print): 9781479974504
Modern stereo and multi-channel perceptual audio codecs utilizing the modified discrete cosine transform (MDCT) can achieve very good overall coding quality even at low bit-rates but lack efficiency on some material with inter-channel phase difference (IPD) of about +/-90 degrees. To address this issue a generalization of the lapped transform coding scheme is proposed which retains the perfect reconstruction property while allowing the usage of three further transform kernels, one of which is the modified discrete sine transform (MDST). Blind listening tests indicate that by frame-wise adaptation of each channel's transform kernel to the instantaneous IPD characteristics, notable gains in coding quality are possible with only negligible increase in decoder complexity and parameter rate.
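For readers unfamiliar with the two kernels named here, the sketch below implements the textbook MDCT and MDST with a sine window and 50% overlap-add, and shows that either kernel alone gives perfect reconstruction. It is not the scheme proposed in the abstract: it uses one kernel for all frames and does not cover frame-wise kernel switching or the other transform kernels mentioned. Function names and the window choice are assumptions for illustration.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[t]^2 + w[t + length/2]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]

def lapped_forward(block, kernel):
    """2N samples -> N coefficients. kernel=math.cos gives MDCT, math.sin gives MDST."""
    n = len(block) // 2
    return [
        sum(block[t] * kernel(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]

def lapped_inverse(coeffs, kernel):
    """N coefficients -> 2N time-aliased samples (2/N scaling for windowed use)."""
    n = len(coeffs)
    return [
        (2.0 / n) * sum(coeffs[k] * kernel(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                        for k in range(n))
        for t in range(2 * n)
    ]

def analysis_synthesis(signal, n, kernel):
    """Window, transform, inverse, window again and overlap-add with hop size n.

    len(signal) must be a multiple of n. The same kernel is used for every frame.
    """
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + t] * window[t] for t in range(2 * n)]
        rec = lapped_inverse(lapped_forward(block, kernel), kernel)
        for t in range(2 * n):
            out[start + t] += rec[t] * window[t]
    return out[n:n + len(signal)]
```

Calling `analysis_synthesis(x, 8, math.cos)` or `analysis_synthesis(x, 8, math.sin)` returns `x` up to rounding error. Mixing the two kernels across adjacent frames in this simple form would not cancel the time-domain aliasing, which is why a kernel-switching codec needs a more general transform set.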
ISBN (print): 9780992862671
A novel approach to PWM coding is introduced, based on generating two complementary PWM streams out of the in-band signal's spectrum. The two streams are then recombined in a suitable way, so that out-of-phase cancellation of the carrier frequency harmonics is achieved. The approach suppresses the strong out-of-band frequencies of the carrier signal without introducing distortion of the in-band coded signal. Such a method can achieve superior reduction compared to the out-of-band artefact suppression provided by the traditional analog low-pass filters employed in typical Class-D audio amplifiers or other switching power delivery systems, hence allowing designs with reduced filter requirements or even filterless implementations.
ISBN (print): 9781479981311
Virtual Reality (VR) audio scenes may be composed of a very large number of audio elements, including dynamic audio objects, fixed audio channels and scene-based audio elements such as Higher Order Ambisonics (HOA). Potentially, the subjective listening experience may be replicated using a compact spatial format with a set number of dynamic objects and scene-based elements, retaining only the perceptual essence of the audio scene. The compact format would further enable a reduction in the complexity of subsequent compression and rendering. This paper investigates these hypotheses by exploring the use of a compact format that consists of up to four dynamic objects and nine HOA channels, with the Enhanced Voice Services (EVS) codec being applied to a 4-channel down-mix of the compact format.
ISBN (print): 9780992862633
Switching between speech coding and generic audio coding schemes was recently proven to be very efficient for coding a large range of audio material at low bit-rates. However, it relies strongly on a robust classification of the input signal. The aim of the paper is to design a reliable speech and music discriminator (SMD) for such an application. Particular attention was paid to achieving a good tradeoff between accuracy, reactivity and stability of the decision while keeping the delay and complexity reasonably low. To this end, short-term and long-term features are separated before being conveyed to two different classifiers. The two classifier outputs are combined into a final decision using hysteresis. Objective measures show that a more reliable switching decision is achievable. The SMD was successfully implemented in MPEG Unified Speech and Audio Coding (USAC). It allows the codec to show unprecedented audio quality.
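The final step described here, combining two classifier outputs into one decision with hysteresis, can be sketched as below. The abstract gives no details of the combination rule, so the equal-weight average, the two thresholds, the initial state and the function name are all hypothetical choices made for illustration.

```python
def smd_decisions(short_term_scores, long_term_scores,
                  enter_speech=0.6, enter_music=0.4):
    """Combine two per-frame speech scores (0..1) into speech/music decisions.

    Hysteresis: the state switches to "speech" only above enter_speech and
    back to "music" only below enter_music, so scores between the two
    thresholds keep the previous decision. All parameters are assumed values.
    """
    state = "music"  # assumed initial state
    decisions = []
    for short_score, long_score in zip(short_term_scores, long_term_scores):
        combined = 0.5 * (short_score + long_score)  # assumed equal weighting
        if state == "music" and combined > enter_speech:
            state = "speech"
        elif state == "speech" and combined < enter_music:
            state = "music"
        decisions.append(state)
    return decisions
```

The gap between the two thresholds is what trades reactivity against stability: a wider gap suppresses rapid toggling between coding modes but delays a genuine switch.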