The modified discrete cosine transform (MDCT) is employed in subband/transform coding schemes as an analysis/synthesis filter bank based on time-domain aliasing cancellation (TDAC). The most efficient implementation of the forward and inverse MDCT computation for Layer III of the MPEG-1 and MPEG-2 international audio coding standards is proposed. It is based on a new fast algorithm for computing the forward and inverse MDCT in the oddly stacked system. Complete signal flow graphs for the implementation of the MDCT and inverse MDCT in Layer III are also provided.
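To make the TDAC property concrete, here is a minimal sketch of the textbook direct-form MDCT/IMDCT with windowed overlap-add. It is an O(N²) reference only, not the fast algorithm or signal flow graphs proposed in the paper. The sine window and N = 18 coefficients per 36-sample block (the Layer III long-block size) are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math

def sine_window(two_n):
    """Sine window; satisfies Princen-Bradley: w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def mdct(block):
    """Direct O(N^2) MDCT: 2N windowed samples -> N coefficients."""
    n_coef = len(block) // 2
    n0 = 0.5 + n_coef / 2  # phase offset (1/2 + N/2)
    return [
        sum(x * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_coef)
    ]

def imdct(coeffs):
    """Direct inverse MDCT: N coefficients -> 2N time-aliased samples.

    The 2/N scale makes windowed overlap-add reconstruct the input when the
    window satisfies the Princen-Bradley condition.
    """
    n_coef = len(coeffs)
    n0 = 0.5 + n_coef / 2
    return [
        2.0 / n_coef * sum(c * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                           for k, c in enumerate(coeffs))
        for n in range(2 * n_coef)
    ]

def analysis_synthesis(signal, n_coef=18):
    """Window -> MDCT -> IMDCT -> window -> overlap-add with hop N.

    Time-domain aliasing cancels wherever two blocks overlap, i.e. for samples
    n_coef .. len(signal) - n_coef - 1 when len(signal) is a multiple of n_coef.
    """
    win = sine_window(2 * n_coef)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        block = [win[n] * signal[start + n] for n in range(2 * n_coef)]
        for n, y in enumerate(imdct(mdct(block))):
            out[start + n] += win[n] * y
    return out
```

Each block's IMDCT output contains aliased, time-reversed copies of its input; these cancel only after overlap-adding adjacent blocks, which is why the first and last N samples of a finite signal are not reconstructed here.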
ISBN (print): 0780367200
Multimedia transmission over the Internet is becoming popular and increasingly important; in particular, scalable coding is desirable for heterogeneous networks with varying bandwidths. In this work, we propose a scalable embedded zero-tree wavelet packet (Scalable EZWP) audio coding system, a scalable audio compression system that uses wavelet packet decomposition and embedded zero-tree coding. We focus on multi-layer, low-bitrate coding that delivers high perceptual quality. In the base layer, each overlapped audio segment is first transformed by a wavelet packet decomposition. The locally significant coefficients are then extracted, quantized, and coded with variable-length coding. In the enhancement layer and the full-band layer, the residual signal, i.e. the difference between the original and the output of the previous layer, is coded via EZW with a psychoacoustic model and arithmetic coding. The target bit rates for the three layers are 16, 32, and 64 kbps, respectively. The performance of the proposed coding system is only slightly inferior to MPEG-1 Layer 3 at 64 kbps, while it provides the bitrate scalability suited to multimedia distribution over heterogeneous Internet networks.
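The layered residual structure described above can be sketched as follows. This is a minimal illustration under loud assumptions: the hypothetical `quantize` function is a plain uniform quantizer standing in for the actual base-layer (wavelet packet plus variable-length coding) and enhancement-layer (EZW plus arithmetic coding) coders, and the step sizes are arbitrary. Only the structure, where each layer codes what the previous layers left over, follows the description.

```python
def quantize(values, step):
    """Uniform mid-tread quantizer (hypothetical stand-in for a real layer coder)."""
    return [step * round(v / step) for v in values]

def layered_encode(signal, steps=(0.5, 0.1, 0.02)):
    """Code the signal in layers; each layer codes the residual of all previous ones."""
    layers = []
    reconstruction = [0.0] * len(signal)
    for step in steps:
        # Residual = original minus the output of the layers coded so far.
        residual = [x - r for x, r in zip(signal, reconstruction)]
        coded = quantize(residual, step)
        layers.append(coded)
        reconstruction = [r + c for r, c in zip(reconstruction, coded)]
    return layers

def decode(layers, num_layers):
    """A receiver on a narrow link sums only the first num_layers layers."""
    out = [0.0] * len(layers[0])
    for coded in layers[:num_layers]:
        out = [o + c for o, c in zip(out, coded)]
    return out
```

Because every layer refines the sum of the earlier ones, a server can serve any prefix of the layer list without re-encoding, which is the property that makes the scheme suitable for networks of differing bandwidth.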
ISBN (print): 0819441880
The concept of "modulation frequency" is shown to provide valuable insight into time-frequency transforms for audio coding. A two-dimensional transform is proposed in which the second dimension approximately decomposes the audio signal into modulation frequencies. When applied to audio coding, this transform provides high quality at low data rates and adapts gracefully to changes in available bandwidth. It is inherently scalable, meaning that channel conditions can be matched without additional computation. Moreover, it is compact: in subjective tests, our algorithm coded at 32 kilobits per second per channel outperformed MPEG-1 Layer 3 (MP3) coded at 56 kilobits per second per channel (both at 44.1 kHz). This potentially useful result motivates the need for further insight into the definition and analysis of modulation frequency. We therefore define modulation frequency for a simple narrowband signal, propose a general bilinear framework for its detection, and then propose a minimal set of conditions for extending this definition to broadband signals such as audio.
Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and side information. The companion paper (Part I) covers the psychoacoustic fundamentals of this method and outlines principles for the design of BCC schemes. The BCC analysis and synthesis methods of Part I are motivated and presented in the framework of stereophonic audio coding. This paper, Part II, generalizes the basic BCC schemes presented in Part I. It extends BCC to multichannel signals and employs an enhanced set of perceptual spatial cues for BCC synthesis. A scheme for multichannel audio coding is presented. Moreover, a modified scheme is derived that allows flexible rendering of the spatial image at the receiver and supports dynamic control. All aspects of complete BCC encoder and decoder implementations are discussed, such as down-mixing of the input signals, low-complexity estimation of the spatial cues, and quantization and coding of the side information. Application examples are given, and the performance of the coder implementations is evaluated and discussed based on subjective listening test results.
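As a rough illustration of the encoder side (down-mixing plus spatial cue estimation), the sketch below computes a mono down-mix and a per-band inter-channel level difference (ICLD) for one stereo frame. The assumptions are substantial: ICLD is used as a representative spatial cue, the down-mix is a plain average, and the band partition over DFT bins is hypothetical. The papers' actual cue set, filter bank, and down-mix rule are not reproduced here. A direct DFT keeps the example dependency-free; a real implementation would use an FFT.

```python
import cmath
import math

def dft_half(frame):
    """Direct DFT bins 0..n/2 of a real frame (O(n^2), adequate for a sketch)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def downmix(left, right):
    """Mono down-mix: average of the two input channels (assumed rule)."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

def icld_db(left, right, band_edges, eps=1e-12):
    """Inter-channel level difference in dB for each band of DFT bins [lo, hi)."""
    spec_l, spec_r = dft_half(left), dft_half(right)
    cues = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        p_l = sum(abs(spec_l[k]) ** 2 for k in range(lo, hi))
        p_r = sum(abs(spec_r[k]) ** 2 for k in range(lo, hi))
        # eps keeps silent bands finite (about 0 dB) instead of dividing by zero.
        cues.append(10.0 * math.log10((p_l + eps) / (p_r + eps)))
    return cues
```

For example, with a 64-sample frame where `left` is a cosine at bin 4 and `right = [0.5 * x for x in left]`, `icld_db(left, right, [0, 8, 16, 33])` yields about 6.02 dB in the first band, since the left channel carries four times the power there. Only the down-mix and these few cue values per band would need to be transmitted.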
ISBN (print): 0780366859
This paper describes a new audio coding scheme based on sinusoidal coding of signals. Sinusoidal coding represents a given signal as a sum of sinusoids. The parameters of the sinusoids (amplitudes, phases, and frequencies) are transmitted to allow signal reconstruction. In the proposed scheme, the sinusoidal parameters are sorted according to energy content and perceptual significance. The most significant parameters are transmitted first, allowing the signal to be reconstructed from only a small subset of the parameters. The proposed scheme incurs low delay and uses a 20 ms frame length. Results show that the coder, operating at a mean rate of 39 kb/s, performs favorably in comparison with the MPEG-4 coder at 42 kb/s.
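The sum-of-sinusoids model and the "most significant first" ordering can be sketched as below. This is a simplified illustration: the ordering uses energy alone (A²/2 per sample for a sinusoid of amplitude A), whereas the described scheme also weighs perceptual significance, and the 44.1 kHz sample rate is an assumption since the abstract does not state one. The names `Sinusoid`, `synthesize`, and `order_by_energy` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Sinusoid:
    amplitude: float
    frequency_hz: float
    phase: float  # radians

def synthesize(params, num_samples, sample_rate):
    """Reconstruct a frame as the sum of the given sinusoids."""
    return [
        sum(p.amplitude * math.cos(2 * math.pi * p.frequency_hz * n / sample_rate + p.phase)
            for p in params)
        for n in range(num_samples)
    ]

def order_by_energy(params):
    """Sort parameter sets by energy (A**2 / 2), largest first.

    Perceptual significance, also used by the described scheme, is not modelled.
    """
    return sorted(params, key=lambda p: p.amplitude ** 2 / 2, reverse=True)

SAMPLE_RATE = 44_100                  # assumed; not stated in the abstract
FRAME_LEN = SAMPLE_RATE * 20 // 1000  # 20 ms frame -> 882 samples
```

A decoder that receives only the first k entries of `order_by_energy(params)` can still call `synthesize` on that prefix; the reconstruction becomes coarser, but it keeps the highest-energy components of the frame.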
We propose a method that hierarchically quantizes wideband Modified Discrete Cosine Transform (MDCT) coefficients by developing a module, based on a transform coding method intended primarily for audio, as the basic structural unit and freely using this module multiple times at the desired frequencies. The major feature of this method is a simple structure with a high degree of freedom in scalable coding: MDCT coefficients over a wide frequency band are quantized hierarchically by sharing the proposed module and using it multiple times. This paper presents examples that combine instances of the module operating at a sampling frequency of 48 kHz and a bit rate of at least 8 kbit/s. In these examples, a bit rate of at least 8 kbit/s and a reconstructed frequency band of at least 4 kHz can be selected as the target. Subjective evaluation tests are performed to verify the effectiveness of the proposed method. © 2001 Scripta Technica
This paper presents a new coding paradigm, Multimode Transform Predictive Coding (MTPC), which combines speech and audio coding principles in a single coding structure. The paradigm is adaptive: it automatically adjusts how the different coding modules are used based on the input signal. This allows MTPC coders to handle a wider range of signals robustly than single-configuration (single-mode) Transform Predictive Coding (TPC) designs. A wideband MTPC coder design targeting two-way communication applications at bit rates from 13 to 40 kbit/s is also presented. Subjective Absolute Category Rating test results on speech, speech in noise, and music demonstrate that the performance at 16, 24, and 32 kbit/s meets or exceeds that of ITU-T Rec. G.722 at 48, 56, and 64 kbit/s, respectively, for many coding conditions. Subjective Reference-ABx (R-ABx) tests are also included to show the potential advantages of the multimode coder over a single-mode TPC coder. Finally, possible improvements to the MTPC coder design for applications that are less sensitive to delay and encoder complexity, such as broadcasting, are discussed.
Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and BCC side information. The BCC side information has a low data rate and is derived from the multichannel encoder input signal. A natural application of BCC is multichannel audio data rate reduction, since only a single down-mixed audio channel needs to be transmitted. An alternative BCC scheme for efficient joint transmission of independent source signals supports flexible spatial rendering at the decoder. This paper (Part I) discusses the most relevant binaural perception phenomena exploited by BCC. Based on these, it presents a psychoacoustically motivated approach to designing a BCC analyzer and synthesizer. This leads to a reference implementation for the analysis and synthesis of stereophonic audio signals based on a Cochlear Filter Bank. BCC synthesizer implementations based on the FFT are presented as low-complexity alternatives. A subjective audio quality assessment of these implementations shows the robust performance of BCC for critical speech and audio material. Moreover, the results suggest that the performance of the reference synthesizer is not significantly compromised when a low-complexity FFT-based synthesizer is used instead. The companion paper (Part II) generalizes BCC analysis and synthesis to multichannel audio and proposes complete BCC schemes, including quantization and coding. Part II also describes an alternative BCC scheme with flexible rendering capability at the decoder and proposes several applications for both BCC schemes.
We present a flexible audio coding system for the unbroken transmission of high-quality stereo audio data streams (up to 192 kHz sampling rate and 24-bit resolution), either live or recorded. The system provides both lossless and variable-level lossy quality; the quality is selectable to suit the full range of wide- and narrow-band IP networks. At the server PC, fewer than nine PCM sound files in different formats can be encoded simultaneously as input signals for transmission, and the efficiency of simultaneous compression is very high. The system is realized in software that runs on a typical PC, which enables anyone to transmit high-quality sound anywhere within IP networks. An MPEG-4 audio codec (TwinVQ or AAC) provides the core of the lossy coding module.