ISBN (print): 0780364163
We propose in this paper a general solution for combined speech and audio coding. In particular, we describe a speech/music discrimination procedure for multi-mode wideband coding. The speech/music decision is updated only when a low-energy frame is detected and is kept unchanged otherwise. The signal is classified using second-order statistics of discriminant parameters. An experimental CELP/transform coder operating at 16 kbit/s is demonstrated. Results show improved performance compared with single-mode encoding.
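A minimal sketch of the decision-update rule described above, assuming a two-class Gaussian classifier in which each class is described by the mean vector and covariance matrix (second-order statistics) of a per-frame feature vector. The feature set, the energy threshold `energy_thresh` and the class models are hypothetical placeholders, not the paper's values.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, cov):
    """Log-density of feature vector x under a multivariate Gaussian."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(x) * np.log(2 * np.pi))

def classify(x, models):
    """Return the label ('speech' or 'music') whose Gaussian model fits x best."""
    return max(models, key=lambda label: gaussian_log_likelihood(x, *models[label]))

def track_decisions(frames, features, models, energy_thresh, initial="speech"):
    """Re-classify only on low-energy frames; otherwise keep the previous mode."""
    decision = initial
    decisions = []
    for frame, feat in zip(frames, features):
        energy = float(np.mean(frame ** 2))
        if energy < energy_thresh:      # low-energy frame: safe point to switch mode
            decision = classify(feat, models)
        decisions.append(decision)      # high-energy frame: decision unchanged
    return decisions
```

Here `models` maps each label to a `(mean, cov)` pair estimated from training data. Restricting switches to low-energy frames means any mode change happens where a coder transition is least audible.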
ISBN (print): 9780992862633
We describe ERB-MDCT, an invertible, real-valued time-frequency transform based on the MDCT, which is widely used in audio coding (e.g., MP3 and AAC). ERB-MDCT was designed similarly to ERBLet, a recent invertible transform whose resolution varies across frequency to match the perceptual ERB frequency scale, whereas the frequency scale of most invertible transforms (e.g., the MDCT) is uniform. ERB-MDCT has mostly the same frequency scale as ERBLet, but its main improvement is that its atoms are quasi-orthogonal, i.e., its redundancy is close to 1. Furthermore, its energy is sparser in the time-frequency plane. It is therefore more suitable for audio coding than ERBLet.
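To make the contrast between a uniform (MDCT) frequency axis and the perceptual ERB scale concrete, the sketch below uses the Glasberg and Moore (1990) ERB formulas to place centre frequencies uniformly on the ERB-number scale. It only illustrates the frequency mapping, not the ERB-MDCT construction; the frequency range and band count are arbitrary choices.

```python
import math

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at f_hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_number(f_hz):
    """Position of f_hz on the ERB-number (ERB-rate) scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_spaced_centres(f_min, f_max, n_bands):
    """n_bands >= 2 centre frequencies equally spaced on the ERB-number scale."""
    e_lo, e_hi = hz_to_erb_number(f_min), hz_to_erb_number(f_max)
    step = (e_hi - e_lo) / (n_bands - 1)
    return [erb_number_to_hz(e_lo + i * step) for i in range(n_bands)]

# Print ERB-spaced centres between 50 Hz and 16 kHz with their bandwidths.
centres = erb_spaced_centres(50.0, 16000.0, 8)
for f in centres:
    print(f"{f:8.1f} Hz  ERB = {erb_bandwidth(f):7.1f} Hz")
```

The printed spacing grows with frequency, which is the non-uniform resolution that ERBLet and ERB-MDCT aim to follow, whereas MDCT bins are equally spaced in Hz.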
ISBN (print): 9781424442966
Modern networks are highly variable and, as a result, source coders are commonly used under conditions for which they were not designed. We address this problem with a source-coding philosophy that aims at the instantaneous re-optimization of a source coder to match a wide range of constraints on rate or quality and a wide range of packet-loss rates. We present a number of technologies that can be reconfigured by solving analytic relations that use the current conditions and a statistical description of the source as input. The technologies include distribution-preserving quantizers, flexible multiple-description quantizers, and a rate distribution scheme. Based on these generic technologies, we created a complete audio coder. Formal listening tests show that the resulting audio coding scheme with full flexibility provides a quality that is on par with the best standardized codecs at any particular rate.
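The abstract does not spell out its rate distribution scheme. As a stand-in, the sketch below uses classic reverse water-filling for independent Gaussian components: an allocation obtained by solving an analytic relation from a statistical source description (here, only the component variances) and the current rate budget, so it can be recomputed instantly when the budget changes. It is not claimed to be the paper's scheme.

```python
import math

def reverse_water_filling(variances, total_bits, iters=100):
    """Split total_bits over independent Gaussian components.

    Component i gets R_i = max(0, 0.5 * log2(var_i / theta)) bits and distortion
    D_i = min(theta, var_i); the water level theta is found by bisection so
    that sum(R_i) == total_bits.
    """
    def rate_at(theta):
        return sum(max(0.0, 0.5 * math.log2(v / theta)) for v in variances)

    lo, hi = 1e-300, max(variances)   # rate_at(hi) == 0, rate_at(lo) is very large
    for _ in range(iters):
        mid = math.sqrt(lo * hi)      # bisect in the log domain
        if rate_at(mid) > total_bits:
            lo = mid                  # too many bits spent: raise the water level
        else:
            hi = mid
    theta = hi
    rates = [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]
    distortions = [min(theta, v) for v in variances]
    return rates, distortions
```

For variances `[4.0, 1.0, 0.25]` and a 2-bit budget, the water level settles at 0.5, so the weakest component receives no bits and the other two receive 1.5 and 0.5 bits.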
ISBN (print): 9781424422401
This paper presents a multi-channel audio coding method that is compatible with the standard stereo audio coding method. The encoding method consists of three parts: spatial-temporal analysis of sound-source location, compression of the multi-channel audio into stereo audio, and encoding of the stereo signals. Compared with conventional methods, the proposed method is simpler and more effective in terms of audio data size, requiring only a small amount of additional information. With the proposed method, multi-channel audio signals can be transmitted or stored as stereo audio signals.
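As a baseline for the "compress multi-channel audio into stereo" step, here is a conventional passive 5.0-to-stereo downmix with the commonly used 1/√2 (about -3 dB) gains on the centre and surround channels. It is only a point of comparison: the paper's method uses sound-source location analysis and side information, which this sketch does not model. The channel argument names are assumptions.

```python
import numpy as np

def passive_stereo_downmix(L, R, C, Ls, Rs):
    """Fold five full-band channels (equal-length arrays) into a stereo pair.

    Centre and surround channels are attenuated by 1/sqrt(2) before being
    added to the front left/right channels.
    """
    g = 1.0 / np.sqrt(2.0)
    left = L + g * C + g * Ls
    right = R + g * C + g * Rs
    return left, right
```

A fixed downmix like this discards where each source was located, which is the information a location-based method tries to keep as compact side information.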
ISBN (print): 9781424414833
In this paper, an efficient algorithm for implementing the MDCT/IMDCT of lengths N = 5·2^m (m ≥ 2) is presented. Transforms of such lengths are of interest for speech and audio coding applications, such as the recently issued and/or emerging standards G.729.1, ***-VBR, and EVRC-WB. Our design maps an MDCT of size N onto an N/2-point DCT-IV and DCT-II with isolated pre-multiplications, which are subsequently moved into the windowing stage. We show that the resulting modified window is piecewise symmetric and can be stored using N/2 words. Our algorithm also uses an efficient factorization of the 5-point DCT-II that requires only 4 multiplications by irrational factors. We compare the proposed algorithm with several alternative implementations and show that our design offers a practically appreciable reduction in complexity and memory usage.
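A fast algorithm of this kind must reproduce the plain MDCT/IMDCT pair. The sketch below is a slow O(N^2) reference model for window lengths N = 5·2^m, useful for testing a fast implementation; it is not the paper's DCT-IV/DCT-II factorization. The sine window and the 2/(N/2) IMDCT scaling are assumptions chosen so that 50% overlap-add reconstructs the input exactly (time-domain aliasing cancellation).

```python
import numpy as np

def mdct_kernel(N):
    """(N/2) x N cosine kernel of the MDCT (direct O(N^2) form, N even)."""
    M = N // 2
    n = np.arange(N)
    k = np.arange(M)
    return np.cos(np.pi / M * np.outer(k + 0.5, n + 0.5 + M / 2))

def sine_window(N):
    """Sine window; satisfies w[n]**2 + w[n + N/2]**2 == 1 (Princen-Bradley)."""
    return np.sin(np.pi * (np.arange(N) + 0.5) / N)

def mdct(frame, window):
    """N windowed samples -> N/2 coefficients."""
    return mdct_kernel(len(frame)) @ (window * frame)

def imdct(coeffs, window):
    """N/2 coefficients -> N windowed, time-aliased samples (2/M scaling)."""
    M = len(coeffs)
    return window * (2.0 / M) * (mdct_kernel(2 * M).T @ coeffs)

def analysis_synthesis(x, N):
    """Frame x with hop N/2, apply MDCT then IMDCT to each frame, overlap-add."""
    M = N // 2
    w = sine_window(N)
    y = np.zeros(len(x))
    for start in range(0, len(x) - N + 1, M):
        y[start:start + N] += imdct(mdct(x[start:start + N], w), w)
    return y

def supported_lengths(m_max):
    """Lengths N = 5 * 2**m for 2 <= m <= m_max."""
    return [5 * 2 ** m for m in range(2, m_max + 1)]

# Aliasing cancels everywhere except the first and last half-frames.
x = np.random.default_rng(0).standard_normal(6 * 160)
y = analysis_synthesis(x, 320)
print(np.max(np.abs(y[160:-160] - x[160:-160])))
```

Comparing a fast implementation's coefficients against `mdct()` for each length in `supported_lengths()` is a simple regression check.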
ISBN (print): 9781479965946
This paper presents a lossless audio coding method using the Burrows-Wheeler Transform (BWT) combined with Move-To-Front (MTF) coding and Run-Length Encoding (RLE). The audio signals are assumed to consist of floating-point values. The BWT is applied to these floating-point values to obtain transformed coefficients; these coefficients are then converted with Move-To-Front coding so that they can be compressed more effectively, and the result is compressed using a combination of Run-Length Encoding and entropy coding. Two entropy coders are used: arithmetic coding and Huffman coding. Simulation results show that the proposed lossless audio coding method outperforms other lossless audio coding methods, namely those using only the Burrows-Wheeler Transform, the combined Burrows-Wheeler Transform and Move-To-Front coding, and the combined Burrows-Wheeler Transform and Run-Length Encoding.
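The BWT, MTF and RLE stages above can be sketched as follows, assuming the floating-point samples are first serialized to little-endian float32 bytes (the abstract does not say how the BWT is applied to floating-point values). The final arithmetic or Huffman stage is omitted; the run list is what it would receive. The naive BWT is for illustration only.

```python
import struct

def bwt(seq):
    """Naive Burrows-Wheeler Transform of a list of symbols.

    Returns (last column, row index of the original sequence). Assumes the
    input is not periodic; otherwise append a unique sentinel symbol first.
    """
    n = len(seq)
    rows = sorted(range(n), key=lambda i: seq[i:] + seq[:i])
    return [seq[i - 1] for i in rows], rows.index(0)

def inverse_bwt(last, idx):
    """Invert bwt() using the stable first-column/last-column correspondence."""
    order = sorted(range(len(last)), key=lambda i: last[i])  # sort is stable
    out, row = [], idx
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return out

def move_to_front(seq, alphabet_size=256):
    """Replace each symbol by its current table position, then move it to the front."""
    table = list(range(alphabet_size))
    out = []
    for s in seq:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def inverse_move_to_front(indices, alphabet_size=256):
    table = list(range(alphabet_size))
    out = []
    for i in indices:
        s = table.pop(i)
        out.append(s)
        table.insert(0, s)
    return out

def run_length(seq):
    """Collapse repeated symbols into (symbol, count) pairs."""
    runs = []
    for s in seq:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [(s, n) for s, n in runs]

def encode(samples):
    data = list(struct.pack(f"<{len(samples)}f", *samples))  # float32 -> bytes
    last, idx = bwt(data)
    return run_length(move_to_front(last)), idx

def decode(runs, idx, n_samples):
    mtf = [s for s, n in runs for _ in range(n)]
    data = inverse_bwt(inverse_move_to_front(mtf), idx)
    return list(struct.unpack(f"<{n_samples}f", bytes(data)))
```

After the BWT, equal bytes tend to cluster, so MTF produces many small indices (mostly zeros), and RLE then shortens the zero runs before entropy coding.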
ISBN (print): 9781424442966
A block-based Gaussian mixture model (GMM) is used to model the distribution of transform audio data to be encoded using lattice-based spherical vector quantization (LSVQ). The expectation-maximization algorithm is used to design the GMM to model the marginal density of the transform coefficients and the vector energy density. A GMM-based rate-distortion function is derived and shown to closely match the observed spherical VQ performance. The LSVQ transform audio coding performance is characterized for the best lattices known in 4, 8, 16, and 32 dimensions.
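A lattice quantizer needs a fast nearest-lattice-point search. As an illustration for the 4-dimensional case, the sketch below implements Conway and Sloane's nearest-point rule for the checkerboard lattice D_n (D4 for n = 4). It only covers unconstrained lattice rounding; the shell-constrained search of a spherical VQ, and the GMM modelling, are not shown. The `step` scaling parameter is a hypothetical addition.

```python
import numpy as np

def nearest_dn(x):
    """Nearest point of the checkerboard lattice D_n = {z in Z^n : sum(z) even}.

    Conway & Sloane's rule: round every coordinate; if the coordinate sum is odd,
    re-round the coordinate with the largest rounding error the other way.
    """
    x = np.asarray(x, dtype=float)
    f = np.round(x)
    if int(f.sum()) % 2 == 0:
        return f
    k = int(np.argmax(np.abs(x - f)))
    g = f.copy()
    g[k] += 1.0 if x[k] > f[k] else -1.0
    return g

def quantize_d4(v, step):
    """Scale by a hypothetical step size, snap to the lattice, and scale back."""
    return step * nearest_dn(np.asarray(v, dtype=float) / step)
```

For example, `[0.6, 0.1, 0.1, 0.1]` first rounds to `[1, 0, 0, 0]`, whose sum is odd, so the worst-rounded coordinate is flipped and the result is the origin.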
ISBN (print): 9781479903566
The Time-Warped Modified Discrete Cosine Transform (TW-MDCT) improves energy compaction for harmonic signals with a varying fundamental frequency compared with the plain MDCT. Adaptive context-based entropy coding has the potential to provide higher gains than memoryless entropy coding. However, in combination with the TW-MDCT, context-based adaptive coding may be suboptimal. This paper presents an algorithm for improving the context for the TW-MDCT, mainly by exploiting the information on frequency variation that is already available because the TW-MDCT requires it. This results in improved entropy coding.
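The abstract does not describe how the frequency-variation information enters the context. Purely as a hypothetical illustration, the sketch below derives a context class for bin `k` of the current frame from the previous frame's quantized magnitudes after rescaling the bin index by the ratio of the two frames' fundamental frequencies, so that harmonics line up. The function name, the three-bin neighbourhood and the thresholds are all assumptions.

```python
def context_class(prev_mags, k, f0_prev, f0_cur, thresholds=(1, 4)):
    """Hypothetical context for bin k, using the previous frame aligned by pitch.

    A harmonic at bin k of the current frame was near bin k * f0_prev / f0_cur
    in the previous frame; the summed magnitude around that bin selects
    one of three context classes.
    """
    j = int(round(k * f0_prev / f0_cur))
    neighbours = [prev_mags[i] for i in (j - 1, j, j + 1) if 0 <= i < len(prev_mags)]
    level = sum(neighbours)
    if level < thresholds[0]:
        return 0
    if level < thresholds[1]:
        return 1
    return 2
```

Without the rescaling (`f0_prev == f0_cur`), a rising pitch makes the context look at the wrong previous-frame bins, which is the kind of mismatch the abstract calls suboptimal.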
ISBN (print): 9783037853771
This paper proposes a new quantization method for transform coefficients based on algebraic quantization. Each coefficient is represented as a single amplitude multiplied by a number of ***. To determine the signs, locations and amplitude of the pulses, the transmitted coefficients are selected using an ideal standard-error criterion. This simple quantizer has been implemented in scalable wavelet-transform coding, and experiments show that it performs better than SPIHT for both speech and music signals.
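A minimal sketch of the "single amplitude times signed pulses" representation, assuming K unit pulses placed at the K largest-magnitude coefficients, signs taken from the coefficients, and the common amplitude chosen to minimize squared error for those pulses. The abstract's exact selection criterion is not given, so this is an illustration of the idea rather than the paper's quantizer.

```python
import numpy as np

def pulse_quantize(x, K):
    """Approximate x by g * s, where s has K signed unit pulses.

    Positions: the K largest |x_i|; signs: sign(x_i); amplitude g: the
    least-squares optimum for those pulses, g = sum(|x_i|) / K.
    """
    x = np.asarray(x, dtype=float)
    positions = np.sort(np.argsort(-np.abs(x))[:K])
    signs = np.where(x[positions] >= 0, 1, -1)
    g = float(np.abs(x[positions]).sum()) / K
    return g, positions, signs

def pulse_reconstruct(g, positions, signs, n):
    """Rebuild an n-sample vector from the amplitude, pulse positions and signs."""
    y = np.zeros(n)
    y[positions] = g * signs
    return y
```

Only `g`, the positions and the signs need to be transmitted; a scalable coder could add further pulses in later layers to refine the approximation.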
ISBN (print): 0780391543
Audio coding schemes such as MP3 or MPEG AAC can use a variable number of bits to encode each signal frame. However, they can still be used on constant-bit-rate channels thanks to a variation buffer of limited size. This buffer allows the encoder to spend more bits on high-complexity frames in order to maintain quality, while bits are saved on other frames (e.g., silent frames), so that the rate is constant on average. The use of such a buffer aims at reducing quality fluctuations over time by allowing reasonable instantaneous bit-rate variations. In this paper, a new design for a buffer controller is presented. This controller aims at finding the appropriate number of bits to encode each audio frame according to chosen perceptual criteria, and it reduces the perceived distortion over time. This can take place in a two-pass encoding process, the first pass measuring each frame's complexity and bit demand, and the second allocating the appropriate number of bits to each frame. To cope with online coding applications, single-pass encoding with a reasonable additional delay is also investigated.
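A minimal sketch of the bit-reservoir mechanism, assuming per-frame complexity scores are already available (as after a first pass). Each frame requests bits in proportion to its complexity relative to the average, and the request is clipped so that the reservoir never goes negative or exceeds `buffer_size`. The proportional rule is an assumption for illustration, not the perceptual controller proposed in the paper.

```python
def allocate_bits(complexities, mean_rate, buffer_size, initial_fill=0.0):
    """Per-frame bit budgets for a constant-rate channel with a bit reservoir.

    complexities: hypothetical first-pass complexity score (> 0) per frame.
    The channel delivers mean_rate bits per frame; bits saved on easy frames
    go into the reservoir (capped at buffer_size, excess is lost), so the
    total never exceeds initial_fill + len(complexities) * mean_rate.
    """
    avg = sum(complexities) / len(complexities)
    reservoir = initial_fill
    budgets = []
    for c in complexities:
        request = mean_rate * c / avg                  # proportional demand
        bits = min(request, mean_rate + reservoir)     # cannot borrow beyond the reservoir
        reservoir = min(reservoir + mean_rate - bits, buffer_size)
        budgets.append(bits)
    return budgets
```

In a single-pass encoder, `avg` would have to be estimated from past frames or a short look-ahead, which is where the additional delay mentioned above comes from.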