Perceptual audio coders use an estimated masked threshold to determine the maximum permissible just-inaudible noise level introduced by quantization. This estimate is derived from a psychoacoustic model mimicking the properties of masking. Most psychoacoustic models for coding applications use a uniform (equal bandwidth) spectral decomposition as a first step to approximate the frequency selectivity of the human auditory system. However, the equal filter properties of the uniform subbands do not match the nonuniform characteristics of cochlear filters and reduce the precision of psychoacoustic modeling. Even so, uniform filter banks are applied because they are computationally efficient. This paper presents a psychoacoustic model based on an efficient nonuniform cochlear filter bank and a simple masked threshold estimation. The novel filter-bank structure employs cascaded low-order IIR filters and appropriate down-sampling to increase efficiency. The filter responses are optimized for the modeling of auditory masking effects. Results of the new psychoacoustic model applied to audio coding show that it performs better in terms of bit rate and/or quality than other state-of-the-art models that use a uniform spectral decomposition. Its low delay makes the new model particularly suitable for low-delay coders.
The Moving Picture Experts Group (MPEG) within the International Organization for Standardization (ISO) has developed a series of audio-visual standards known as MPEG-1 and MPEG-2. These audio-coding standards are the first international standards in the field of high-quality digital audio compression. MPEG-1 covers coding of stereophonic audio signals at high sampling rates aiming at transparent quality, whereas MPEG-2 also offers stereophonic audio coding at lower sampling rates. In addition, MPEG-2 introduces multichannel coding with and without backwards compatibility to MPEG-1 to provide an improved acoustical image for audio-only applications and for enhanced television and video-conferencing systems. MPEG-2 audio coding without backwards compatibility, called MPEG-2 Advanced Audio Coding (AAC), offers the highest compression rates. Typical application areas for MPEG-based digital audio are in the fields of audio production, program distribution and exchange, digital sound broadcasting, digital storage, and various multimedia applications. We describe in some detail the key technologies and main features of MPEG-1 and MPEG-2 audio coders. We also present the MPEG-4 standard and discuss some of the typical applications for MPEG audio compression.
Today's standard for digital audio recording in the consumer field is the compact disc (CD), where a stereo signal is sampled with fs = 44.1 kHz and quantized with 16 bits. This yields a net data rate of about 1.4 Mbit/s, which can easily be recorded on an optical disc like the CD or on magnetic tape using the R-DAT signal format.
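The quoted net data rate follows directly from the sampling parameters given in the abstract. The short Python sketch below reproduces the arithmetic; the function name `pcm_bit_rate` is a hypothetical helper introduced only for illustration.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Net bit rate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


# CD parameters from the abstract: 44.1 kHz sampling, 16-bit samples, stereo.
cd_rate = pcm_bit_rate(sample_rate_hz=44_100, bits_per_sample=16, channels=2)
print(f"{cd_rate} bit/s = {cd_rate / 1e6:.4f} Mbit/s")
```

The product is 1,411,200 bit/s, which rounds to the "about 1.4 Mbit/s" stated above.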
This paper describes a new audio coding scheme based on adaptive wavelet analysis that provides transparent audio coding for CD-audio signals at low bit rates (approximately 1.4 bits/sample per channel). A new perceptual cost function is defined to obtain the best wavelet-packet basis for each audio frame. The sharp variations in quantization noise that appear at the frame borders are minimized by a novel approach that avoids overlapping. The proposed coder guarantees high perceptual quality using filters that generate wavelets of any compact support, because it uses a bit-allocation algorithm that takes into account the equivalent filter frequency responses of the synthesis filter bank branches.
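To put the figure of 1.4 bits/sample in context, it can be converted to a bit rate per channel. The Python sketch below assumes the standard CD-audio sampling rate of 44.1 kHz and a 16-bit PCM reference, neither of which is stated in this abstract; the names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100  # assumption: standard CD-audio sampling rate
PCM_BITS = 16            # assumption: 16-bit PCM as the uncompressed reference


def coded_bit_rate(bits_per_sample: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Bit rate per channel in bit/s for a given average number of bits per sample."""
    return bits_per_sample * sample_rate_hz


rate = coded_bit_rate(1.4)
compression_ratio = PCM_BITS / 1.4
print(f"{rate / 1000:.2f} kbit/s per channel, compression about {compression_ratio:.1f}:1")
```

Under these assumptions the coder operates at roughly 62 kbit/s per channel, a reduction of about 11:1 relative to 16-bit PCM.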
In this paper, joint structures for audio coding and echo cancellation are investigated, utilizing standard audio coders. Two types of audio coders are considered: coders based on cosine-modulated filterbanks and coders based on the modified discrete cosine transform (MDCT). For the first coder type, two methods for combining such a coder with a subband echo canceler are proposed. The first is a modified audio coder filterbank that is suitable for echo cancellation but still generates the same final decomposition as the standard audio coder filterbank; the second converts subband signals between an audio coder filterbank and a filterbank designed for echo cancellation. For the MDCT-based audio coder, a joint structure with a frequency-domain adaptive filter based echo canceler is considered. Computational complexity and transmission delay for the different coder/echo canceler combinations are presented. Convergence properties of the proposed echo canceler structures are shown using simulations with real-life recorded speech.
The International Telegraph and Telephone Consultative Committee (CCITT) has recommended an algorithm to code wideband speech and music (i.e., 7-kHz audio) at 64 kb/s as an international standard (Recommendation G.722). A multimedia multipoint teleconference system using the 7-kHz audio coding standard is discussed. System requirements for teleconferencing and control procedures for the system are examined. A system using digital leased circuits is presented, and the audio-bridging technique used in this system is discussed.
ISBN: 9781479981311 (print)
This study proposes a new method of audio coding based on spectral recovery, which can enhance the performance of transform audio coding. An encoder represents spectral information of an input in a time-frequency domain and transmits only a portion of it so that the remaining spectral information can be recovered based on the transmitted information. A decoder recovers the magnitudes of missing spectral information using a convolutional neural network. The signs of missing spectral information are either transmitted or randomly assigned, according to their importance. By combining transmission and recovery of spectral information, the proposed method can enhance the coding performance, compared with conventional transform coding. The subjective performance evaluation shows that, for mono coding at 39.4 kbps, the proposed method provides higher sound quality than the USAC, by an average MUSHRA score of 8.5.
A new audio transform coding technique is proposed that reduces the bitrate requirements of perceptual transform audio coders by exploiting the stationarity characteristics of audio signals. The method detects the frames that have significant audible content and codes them in a way similar to conventional perceptual transform coders. However, when successive data frames are found to be similar to those sections, only their audible differences are coded. An error analysis for the proposed method is presented and results from tests on different types of audio material are listed, indicating that an average compression gain of 30% (over the bitrate of conventional perceptual audio coders) can be achieved, with a small deterioration in the audio quality of the coded signal. The proposed method has the advantage of easy adaptation within the architecture of perceptual transform coders and adds only a small computational overhead to these systems.
In this work, we present a new method for quantization of sinusoidal amplitudes and phases, and apply the method to sinusoidal coding of speech and audio signals. The method is based on unrestricted polar quantization, where phase quantization accuracy depends on amplitude. Amplitude and phase quantizers are derived under an entropy (average rate) constraint using high-rate assumptions. First, we derive optimal quantizers for one sinusoid and a mean-squared error distortion measure. We provide a detailed analysis of entropy-constrained unrestricted polar quantization, showing its high performance and practicality even at low rates. Second, we find optimal quantizers for a set of sinusoids that model a short segment of an audio signal. The optimization is performed using a weighted error measure that can account for the masking effect in the human auditory system. We find the optimal rate distribution between sinusoids, as well as the corresponding optimal amplitude and phase quantizers, based on the perceptual importance of sinusoids defined by masking. The new method is used in an audio-coding application and is shown to significantly outperform a conventional sinusoidal quantization method where phase quantization accuracy is identical for all sinusoids.
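The central idea, that phase resolution depends on amplitude, can be illustrated with a minimal Python sketch. This is not the entropy-constrained quantizer derived in the paper: the uniform amplitude quantizer, the rule of matching the arc length of a phase cell to the amplitude step, and the function name `polar_quantize` are all assumptions made for illustration.

```python
import math


def polar_quantize(amplitude: float, phase: float, amp_step: float):
    """Quantize (amplitude, phase) so that larger amplitudes get finer phase steps.

    Illustrative assumption: the number of phase levels on each amplitude
    ring is chosen so that the arc length of a phase cell is roughly equal
    to the amplitude step.
    """
    # Uniform amplitude quantizer; reconstruct at the midpoint of the cell.
    k = int(amplitude // amp_step)
    amp_q = (k + 0.5) * amp_step

    # Phase levels grow in proportion to the quantized amplitude.
    n_phase = max(1, round(2 * math.pi * amp_q / amp_step))
    phase_step = 2 * math.pi / n_phase

    # Uniform phase quantizer on [0, 2*pi); reconstruct at the cell midpoint.
    j = int((phase % (2 * math.pi)) // phase_step) % n_phase
    phase_q = (j + 0.5) * phase_step
    return amp_q, phase_q, n_phase


weak = polar_quantize(amplitude=0.23, phase=1.0, amp_step=0.1)
strong = polar_quantize(amplitude=1.03, phase=1.0, amp_step=0.1)
print("weak sinusoid:  ", weak)
print("strong sinusoid:", strong)
```

In this sketch the stronger sinusoid is assigned more phase levels than the weaker one, whereas a conventional scheme of the kind the abstract compares against would use the same phase resolution for both.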
This paper proposes an encoding method for high-quality, low-delay audio communication that is robust to losses in packetized transmission. Robustness is provided by a multiple description vector quantization (MDVQ) technique that is designed to minimize the mean-squared error (MSE). The key to applying this technique effectively is the use of psychoacoustically controlled pre- and post-filters that make the mean-squared quantization error perceptually relevant. Experiments show that the MDVQ-based encoder yields better results, in both MSE and subjective audio quality, than simple alternative coders with the same low delay.