While previous MPEG Audio standards were mainly focused on the representation of audio signals close to or equal to CD quality, the new MPEG-4 Audio standard extends the range of applicability towards significantly lower bit rates. Furthermore, it offers extended functionalities for the representation of natural and even synthetic audio signals in an object-oriented fashion. This paper gives a brief overview of the complete audio part of the MPEG-4 standard and more detailed information on its parts related to speech coding.
Voice is the preferred method of human communication. Although there have been times when it seemed that the voice communications problem was solved, such as when the PSTN was our primary network or later when digital...
Over the past several years there has been considerable attention focused on coding and enhancement of speech signals. This interest has progressed towards the development of new techniques capable of producing good quality speech at the output. Speech coding is the process of converting human speech into efficient encoded representations that can be decoded to produce a close approximation of the original signal. This paper deals with the problem of speech coding. It proposes a novel approach called Best Tree Encoding (BTE) to encode the wavelet packet Best Tree Structure into a vector of four elements. This research introduces BTE to address a further problem in speech compression and synthesis. Tree node data coefficients are encoded using LPC filters and trigonometric features. The encoded vector consists of 4 elements from BTE analysis as well as an LPC and trigonometric vector for each leaf node. The reproduced speech is evaluated for both intelligibility and quality. The quality of the speech signal is measured on the basis of signal to noise ratio, log likelihood ratio, and spectral distortion.
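The signal to noise ratio named in this abstract can be illustrated with a minimal sketch. It assumes the plain whole-signal definition (signal energy over error energy, in dB); the abstract does not say whether a segmental or other variant was used, and the function name `snr_db` and the sample values are hypothetical.

```python
import math

def snr_db(original, reconstructed):
    """Whole-signal SNR in dB: 10*log10(signal energy / error energy).

    Assumption: plain (non-segmental) SNR; the abstract does not
    specify which variant the authors measured.
    """
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)

# Toy samples (hypothetical): every sample is reproduced with a 10% amplitude error.
original = [1.0, -1.0, 1.0, -1.0]
decoded = [0.9, -0.9, 0.9, -0.9]
result = snr_db(original, decoded)
```

A 10% amplitude error corresponds to an energy ratio of 100, so `result` is approximately 20 dB.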
With rare exception, all presently available narrow-band speech coding systems implement scalar quantization (independent quantization) of the transmission parameters (such as reflection coefficients or transformed reflection coefficients in LPC systems). This paper presents a new approach called vector quantization. For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with 15 to 20 fewer bits/frame than that required for the optimized scalar quantizing approaches presently in use. The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies. This paper introduces the theory in a nonrigorous form, along with practical results to date and an extensive list of research topics for this new area of speech coding.
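The core idea of vector quantization can be sketched as a nearest-codeword search: a whole parameter vector is replaced by the index of its closest codebook entry, so the bit cost per frame depends only on the codebook size. The sketch below is illustrative only. It uses squared error as a stand-in distortion measure, which is an assumption (the paper builds on linear prediction analysis and may use a different measure), and the codebook values and function names are hypothetical.

```python
import math

def squared_error(a, b):
    """Stand-in distortion measure (assumption, not the paper's)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_encode(vector, codebook):
    """Return the index of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_error(vector, codebook[i]))

def vq_decode(index, codebook):
    """Reconstruct the vector from the transmitted index."""
    return codebook[index]

def bits_per_vector(codebook):
    """Bits needed to transmit one codebook index."""
    return math.ceil(math.log2(len(codebook)))

# Hypothetical 4-entry codebook of 2-dimensional parameter vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]
frame = [0.9, 1.2]

index = vq_encode(frame, codebook)       # nearest entry is [1.0, 1.0]
reconstructed = vq_decode(index, codebook)
cost = bits_per_vector(codebook)         # 2 bits for the whole vector
```

Here both parameters of the frame are sent jointly in 2 bits, whereas a scalar quantizer would spend bits on each parameter independently.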
Speech coding is currently an active topic for research in the areas of Very Large-Scale Integrated (VLSI) circuit technology and Digital Signal Processing (DSP). Various techniques are being developed to transmit high quality speech at a low bit rate. The 1/f nature of the speech residual in RELP allows the wavelet transform to efficiently code the residual for transmission. The received speech's bandwidth can be doubled by transmitting only one extra variance value per analysis frame. A prototype 1/f speech coder that employs these techniques has been developed at Drexel University (PA). Various technical issues are being resolved, and work is also being done to adapt these techniques to music coding.
ISBN: (print) 1864352094
Inexpensive VLSI and an increasing demand for bandwidth efficiency have led to a large increase in the number of applications for speech coding. In the past 5 years, many standards have been defined for use in network and cellular communications systems. This widespread deployment of speech coding technology has created new challenges such as coding of speech with background noise, robust performance over noisy or fading channels, and performance for multiple encodings. In addition, many applications impose constraints on communication delay, cost and power consumption of the implementation. This paper will first review the underlying speech coding principles used in most international and regional standards. This is followed by a closer look at the trade-offs that can be made during the design of a speech coder. The last part of the paper will focus on the technical challenges and new directions of speech coding research for the next five years.
Design algorithms and simulation results are presented for vector quantizers for Fourier transformed data. Transforming the data prior to quantization has two potential advantages. First, each sample in the transform domain depends on many samples in the original domain. Thus, even scalar quantization in the transform domain is a form of vector quantization or block source coding in the original waveform domain and the basic coding theorems of information theory show that such block codes can provide better performance than scalar codes, even for memoryless sources. Second, vector quantization of Fourier transformed speech waveforms provides distinctly better subjective quality than ordinary vector quantization of the waveform using codes of comparable complexity. While the system is, of course, more complicated due to the need to take Fourier transforms, its envisioned application is as a coder for the output of FFT chips currently available or under development. The proposed implementation of a Fourier transform vector quantizer (FTVQ) uses a product code structure, providing different codes for different coefficient vectors corresponding to different frequency bands. This is a form of subband coding and yields a simple means of optimizing bit allocations among the subcodes. Two coding structures with corresponding distortion measures are considered: those that quantize vectors of pairs of real and imaginary coefficients and those that quantize separate vectors of magnitude and phase coefficients. Both structures yield good performance for the given complexity in comparison to waveform vector quantizers. For speech coding, a magnitude-phase FTVQ yields better subjective quality than a real-imaginary FTVQ when the rate allocation is properly chosen.
Authors: Ma, N.; Nishi, T.; Wei, G.
Kyushu Univ, Dept Comp Sci & Commun Engn, Fukuoka 8128581, Japan
SCUT, Dept Elect & Commun Engn, Guang Zhou 510641, Peoples R China
To improve speech coding quality, in particular the long-term dependency prediction characteristics, we propose a new nonlinear predictor, i.e., a fully connected recurrent neural network (FCRNN) in which the hidden units receive feedback not only from themselves but also from the output unit. A comparison of the capabilities of the FCRNN with conventional predictors shows that the former has less prediction error than the latter. We apply this FCRNN instead of the previously proposed recurrent neural networks in the code-excited predictive speech coding system (i.e., CELP) and show that our system (FCRNN) requires a lower bit rate per frame and improves the performance of speech coding.
ISBN: (print) 0879426977
The authors introduce sequential optimization of the parameters of a 4- and 8-kb/s codebook excited linear predictive (CELP) coder. The short-term filter, the long-term adaptive codebook, and the excitation codebook parameters were sequentially optimized to minimize the resulting weighted mean square error. The sequential optimization procedure considers at each subframe the most probable choice of excitation parameters for the next subframe. A simulated annealing algorithm was used to optimize the short-term filter to minimize weighted mean-squared error. The closed-loop algorithm is iterative in nature, and iterations are conducted over the adaptive codebook, the codebook excitation and the short-term filter in an attempt to jointly optimize the corresponding parameters.
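As background for the excitation codebook step mentioned in this abstract, the following is a minimal sketch of a conventional mean squared error codebook search with an optimal gain per entry. It is not the authors' sequential or simulated annealing procedure. It assumes the codevectors have already been passed through the weighted synthesis filter, and all names and values are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_codebook(target, filtered_codebook):
    """Pick the entry and gain minimizing ||target - gain * y||^2.

    For a fixed entry y the optimal gain is <t, y> / <y, y>, and the
    remaining error is ||t||^2 - <t, y>^2 / <y, y>, so the best entry
    maximizes <t, y>^2 / <y, y>.
    Assumption: entries are already filtered by the weighted synthesis
    filter; the filtering itself is omitted here.
    """
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for i, y in enumerate(filtered_codebook):
        energy = dot(y, y)
        if energy == 0.0:
            continue  # skip all-zero entries
        corr = dot(target, y)
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain

# Hypothetical weighted target and filtered codebook for one subframe.
target = [2.0, 0.0, -2.0, 0.0]
filtered_codebook = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 1.0, 0.0, -1.0],
]
best_index, best_gain = search_codebook(target, filtered_codebook)
```

In this toy case the second entry scaled by a gain of 2 reproduces the target exactly. A sequential scheme like the one the abstract describes would repeat searches of this kind across the adaptive codebook, the excitation codebook, and the short-term filter.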
ISBN: (print) 9781538616321
We show how model based prediction can be employed in the construction of a speech codec which operates entirely in the frequency domain of a Modified Discrete Cosine Transform (MDCT). The codec tools described in this paper are part of the Dolby AC-4 system standardized by ETSI and included in ATSC 3.0.