In this paper we present a wideband (44.1 kHz sampling rate) audio and speech coder that combines two different strategies, namely parametric and waveform coding. We show how this approach can be used to design a layered, bit-stream-scalable coder offering a wide variety of decoding bit rates with little scalability loss. Moreover, at the bit rates associated with the different layers, the quality is competitive with that of standardized coders (MP3, AAC) tuned to a particular bit rate.
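The idea of a layered bit stream can be illustrated with a minimal sketch. It is not the coder described in the abstract (which combines parametric and waveform coding); it only assumes a generic scheme in which each enhancement layer quantizes the residual left by the layers below it. The function names and step sizes are hypothetical.

```python
def encode_layers(x, steps):
    """Quantize x into successive layers; each layer codes the residual
    left by the previous ones, using its own (assumed) step size."""
    layers = []
    residual = list(x)
    for step in steps:
        indices = [round(r / step) for r in residual]
        layers.append(indices)
        residual = [r - step * i for r, i in zip(residual, indices)]
    return layers


def decode_layers(layers, steps, n_layers):
    """Reconstruct from the first n_layers layers only."""
    out = [0.0] * len(layers[0])
    for indices, step in zip(layers[:n_layers], steps[:n_layers]):
        out = [o + step * i for o, i in zip(out, indices)]
    return out


if __name__ == "__main__":
    signal = [0.37, -1.42, 2.05, 0.0]
    steps = [1.0, 0.25, 0.0625]  # base layer, then two refinements
    layers = encode_layers(signal, steps)
    for k in range(1, len(steps) + 1):
        print(k, decode_layers(layers, steps, k))
```

A decoder that receives only the base layer still produces a coarse reconstruction, and each additional layer halves or quarters the worst-case error, which is the property a scalable bit stream relies on.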
ISBN: 0780340736 (print)
A combined speech and audio coder is proposed. The coder structure resembles a low-delay CELP coder; however, the excitation gain is adapted non-linearly, sample by sample, by a trained neural network, and the spectral parameters are derived from backward non-linear prediction based on a second-order Volterra filter. A perceptual weighting filter derived from psychoacoustic analysis in the spectral domain is used to shape the coding noise. The proposed non-linear adaptation schemes significantly improve the effectiveness of an analysis-by-synthesis model for coding audio signals. Simulation results show that transparent coding of wideband (7 kHz) speech and audio is achieved at 24 kbps.
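For readers unfamiliar with Volterra filters, the following is a minimal sketch of a second-order Volterra predictor: a linear term plus a term in products of pairs of past samples. The abstract gives neither the filter order nor the coefficients, and the backward adaptation is not shown; the function name and the coefficient values are illustrative assumptions.

```python
def volterra2_predict(history, h1, h2):
    """Predict the next sample from the last len(h1) samples.

    history[-1] is the most recent sample.
    h1[i]    : linear kernel, weights x[n-1-i]
    h2[i][j] : quadratic kernel, weights x[n-1-i] * x[n-1-j] for j >= i
    """
    p = len(h1)
    past = history[-p:][::-1]  # past[0] = x[n-1], past[1] = x[n-2], ...
    linear = sum(h1[i] * past[i] for i in range(p))
    quadratic = sum(
        h2[i][j] * past[i] * past[j]
        for i in range(p)
        for j in range(i, p)  # upper triangle only: the kernel is symmetric
    )
    return linear + quadratic


if __name__ == "__main__":
    h1 = [0.5, 0.25]                  # hypothetical linear coefficients
    h2 = [[0.1, 0.2], [0.0, 0.3]]     # hypothetical quadratic coefficients
    print(volterra2_predict([1.0, 2.0], h1, h2))
```

Setting every entry of `h2` to zero reduces the sketch to an ordinary linear predictor, which is the sense in which the Volterra form generalizes linear prediction.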
Wavelet packet decompositions based on tree-structured 2-channel filter banks with conjugate quadrature filters (CQF) have found many applications in audio coding. Their time-frequency tiling is the dual of that of the time-varying modulated lapped transforms (MLT). We present a new orthonormal wavelet packet basis, constructed from the frequency-varying MLT, which can be viewed as the direct analogue of time-varying transforms. In contrast to the classical decomposition, the new transform shows good bandpass behaviour without strong spectral sidelobes. Hence, it is particularly useful for the coding of audio signals. For the signals examined, the coding gain is higher than that obtained with cascaded CQF filter banks. A fast algorithm for the new wavelet packet transform is possible with a polyphase implementation of the required modulated lapped transforms. This algorithm outperforms a fast CQF-based wavelet packet decomposition, especially at the high frequency resolution that is necessary in audio coding.
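The classical tree-structured decomposition that the abstract uses as its baseline can be sketched as follows. The sketch assumes the 2-tap Haar pair, the simplest conjugate quadrature filter, and a full tree of fixed depth; it does not implement the MLT-based basis proposed in the paper. All function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_split(x):
    """One 2-channel analysis stage: lowpass and highpass, each decimated by 2."""
    low = [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    high = [S * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return low, high


def haar_merge(low, high):
    """Matching synthesis stage: inverts haar_split exactly."""
    x = []
    for a, d in zip(low, high):
        x.append(S * (a + d))
        x.append(S * (a - d))
    return x


def wavelet_packet(x, depth):
    """Full wavelet packet tree: split every band at every level."""
    bands = [list(x)]
    for _ in range(depth):
        next_bands = []
        for band in bands:
            next_bands.extend(haar_split(band))
        bands = next_bands
    return bands  # natural (tree) order, not sorted by frequency


def inverse_wavelet_packet(bands):
    """Merge sibling bands pairwise until one signal remains."""
    bands = [list(b) for b in bands]
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    bands = wavelet_packet(signal, 2)
    print(bands)
    print(inverse_wavelet_packet(bands))
```

Because each stage is orthonormal, the transform preserves signal energy and reconstructs the input exactly up to rounding. The short Haar filters also show the weakness the abstract mentions: their frequency selectivity is poor, so cascading them produces bands with strong sidelobes.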
This paper presents an audio player using TwinVQ, an advanced high-quality coding technology that can compress audio sampled at 44.1 kHz to a very low bit rate of 40 kbit/s per channel. The player is the size of a business card and 10 mm thick, and it offers practical performance with low power consumption.
Over the last decade, speech and audio coding have converged toward an increasingly unified technology. This contribution discusses one of the remaining fundamental differences between the speech and audio paradigms, namely, windowing of the input signal. Audio codecs generally use lapped transforms and apply a perceptual model in the transform domain, whereby temporal continuity is achieved by windowing and overlap-add. Speech codecs, on the other hand, achieve temporal continuity by using linear predictive filtering, whereby windowing is applied in the residual domain. Despite these fundamental differences, we demonstrate that the two windowing approaches, combined with perceptual modeling, perform very similarly both in terms of perceptual quality and theoretical properties.
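The speech-codec side of this comparison can be sketched as follows. The sketch shows only how linear predictive filtering keeps frames continuous by carrying filter memory across frame boundaries; the residual-domain windowing, the quantization, and the perceptual model discussed in the abstract are not modeled. The predictor coefficients and function names are illustrative assumptions.

```python
def lpc_residual(x, a):
    """Analysis filter: e[n] = x[n] - sum_k a[k] * x[n-1-k], zero initial state."""
    p = len(a)
    return [
        x[n] - sum(a[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        for n in range(len(x))
    ]


def synthesize_frame(residual, a, memory):
    """Synthesis filter for one frame.

    memory holds the last len(a) output samples (most recent last) and is
    what ties each frame to the previous one. Returns (frame, new_memory).
    """
    p = len(a)
    buf = list(memory)
    for e in residual:
        prediction = sum(a[k] * buf[-1 - k] for k in range(p))
        buf.append(e + prediction)
    return buf[p:], buf[-p:]


if __name__ == "__main__":
    a = [0.9, -0.2]  # hypothetical predictor coefficients
    signal = [0.0, 1.0, 0.5, -0.3, -1.0, -0.4, 0.8, 1.2, 0.1, -0.9, -0.6, 0.2]
    residual = lpc_residual(signal, a)
    memory = [0.0] * len(a)
    output = []
    for start in range(0, len(signal), 4):  # process in frames of 4 samples
        frame, memory = synthesize_frame(residual[start:start + 4], a, memory)
        output.extend(frame)
    print(output)
```

No overlap between frames is needed here: the filter state alone guarantees that the concatenated output matches the input, which is the contrast with the overlap-add approach of transform codecs.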
Low delay perceptual audio coding has recently gained wide acceptance for high quality communication. While common schemes are based on the well-known Modified Discrete Cosine Transform (MDCT) filterbank, this paper describes novel coding algorithms that, for the first time, make use of dedicated low delay filterbanks, thus achieving improved coding efficiency while maintaining or even reducing the low codec delay. The MPEG-4 Enhanced Low Delay AAC (AAC-ELD) coder currently under development within ISO/MPEG combines a traditional perceptual audio coding scheme with spectral band replication (SBR), both running in a delay-optimized fashion by using low delay filterbanks.
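As background for the MDCT filterbank that the abstract names as the common baseline, here is a minimal, unoptimized sketch of an MDCT analysis/synthesis round trip with a sine window and 50% overlap-add. It is not the low delay filterbank of AAC-ELD; the function names, the direct O(N^2) evaluation, and the choice of a sine window are assumptions made for illustration.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """2N input samples -> N coefficients (direct evaluation)."""
    n_half = len(frame) // 2
    return [
        sum(
            frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half)
        )
        for k in range(n_half)
    ]


def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    n_half = len(coeffs)
    return [
        (2.0 / n_half)
        * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)
        )
        for n in range(2 * n_half)
    ]


def mdct_roundtrip(x, n_half):
    """Window, transform, inverse transform, window again, overlap-add."""
    w = sine_window(2 * n_half)
    y = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n_half + 1, n_half):
        frame = [w[n] * x[start + n] for n in range(2 * n_half)]
        rec = imdct(mdct(frame))
        for n in range(2 * n_half):
            y[start + n] += w[n] * rec[n]
    return y


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
    print(mdct_roundtrip(signal, 4))
```

Each frame on its own is aliased, and the aliasing only cancels once the next frame has been decoded and added. That dependence on the following frame is one source of the codec delay that dedicated low delay filterbanks aim to reduce. Samples covered by only one frame (the first and last `n_half` samples here) are not reconstructed.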
Systems employing multichannel audio are becoming increasingly common. Multichannel audio is used in domestic surround sound systems and multitrack audio recording, and new applications are emerging. Digital multichannel audio systems often employ non-standard formats that complicate the exchange of audio data between different systems. Typical multitrack recording environments consist of a large number of channels. Each channel has its own acoustic configuration with respect to equalisation, levels and effects. Another characteristic of multitrack recording is that only a small subset of channels is non-silent at any one time. A family of lossless, compact and general multichannel audio formats is proposed that supports a large number of audio channels and the embedding of non-audio information. Non-silent tracks are combined using Reed-Solomon codes. The strategy can therefore be implemented efficiently using one of the many existing Reed-Solomon encoder/decoder hardware designs, and several configurations are illustrated.
In this paper we propose a new approach based on energy-adaptive matching pursuits to improve sinusoidal modelling for parametric speech and audio coding. To reduce the complexity of the algorithm, an over-complete dictionary composed of complex exponentials is used. An analysis-synthesis windowing scheme that avoids overlapping is proposed. For efficient quantization of the sinusoidal model parameters, a new algorithm that largely reduces the side information required by the decoder is described. Experimental results demonstrate the advantages of integrating the proposed method into multi-part models for parametric speech/audio coding.
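The basic matching pursuit loop over a dictionary of complex exponentials can be sketched as follows. This is plain matching pursuit, not the energy-adaptive variant proposed in the paper, and it omits windowing and parameter quantization. The function name, the uniform frequency grid, and the exhaustive correlation search are assumptions.

```python
import cmath
import math


def matching_pursuit(x, n_atoms, n_freqs):
    """Greedy decomposition of x over unit-norm complex exponentials.

    The dictionary holds n_freqs atoms g_f[n] = exp(2j*pi*f*n/n_freqs)/sqrt(N);
    it is over-complete whenever n_freqs > len(x).
    Returns the picked (frequency index, coefficient) pairs and the residual.
    """
    n = len(x)
    norm = math.sqrt(n)
    residual = [complex(v) for v in x]
    picks = []
    for _ in range(n_atoms):
        best_f, best_c = 0, 0j
        for f in range(n_freqs):
            # Inner product of the residual with atom f.
            c = sum(
                residual[i] * cmath.exp(-2j * math.pi * f * i / n_freqs)
                for i in range(n)
            ) / norm
            if abs(c) > abs(best_c):
                best_f, best_c = f, c
        # Subtract the projection onto the best-matching atom.
        for i in range(n):
            residual[i] -= best_c * cmath.exp(2j * math.pi * best_f * i / n_freqs) / norm
        picks.append((best_f, best_c))
    return picks, residual


if __name__ == "__main__":
    tone = [cmath.exp(2j * math.pi * 3 * i / 16) for i in range(16)]
    picks, residual = matching_pursuit(tone, 1, 32)
    print(picks[0][0], sum(abs(r) ** 2 for r in residual))
```

For a real-valued sinusoid the loop needs two atoms, one at the positive frequency and one at its conjugate, so a practical coder would usually pick them as a pair.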
Search algorithms that select a signal decomposition by minimizing a cost functional have been proposed in the literature, along with several additive and non-additive information costs. We introduce a new cost function and a search algorithm built from a perceptual criterion. Their efficiency is demonstrated by results showing higher-quality compressed audio signals than previous approaches at similar bit rates.
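A cost-driven search of the kind found in the literature can be sketched with the classical bottom-up best-basis rule for additive costs: a band is split only if its children, each already optimized, cost less in total than the band itself. The sketch uses Haar splits and an L1-norm cost purely as stand-ins; it is not the perceptual cost function or the search algorithm introduced in the paper, and all names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_split(x):
    """One 2-channel Haar analysis stage (stand-in for any orthonormal split)."""
    low = [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    high = [S * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return low, high


def l1_cost(band):
    """Additive stand-in cost: smaller means sparser coefficients."""
    return sum(abs(v) for v in band)


def best_basis(x, max_depth, cost=l1_cost):
    """Return (total cost, list of bands) for the cheapest decomposition."""
    here = cost(x)
    if max_depth == 0 or len(x) < 2:
        return here, [list(x)]
    low, high = haar_split(x)
    cost_low, bands_low = best_basis(low, max_depth - 1, cost)
    cost_high, bands_high = best_basis(high, max_depth - 1, cost)
    if cost_low + cost_high < here:
        return cost_low + cost_high, bands_low + bands_high
    return here, [list(x)]


if __name__ == "__main__":
    print(best_basis([1.0, 1.0, 1.0, 1.0], 2))  # smooth signal: splitting helps
    print(best_basis([1.0, 0.0, 0.0, 0.0], 2))  # impulse: best left unsplit
```

The rule is optimal only because the cost is additive across bands; non-additive costs, which the abstract also mentions, require a different search strategy.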