The ACELP method uses a multipulse structure to represent the excitation pulses of the residual signal. To reduce computational complexity, this paper presents the maximum-take-precedence ACELP (MTP-ACELP) search method, which accepts a small degradation in performance. Because the maximum of the target signal is compensated first, the performance degradation is limited. Predicting the pulse locations reduces the computational complexity: the method not only reduces the number of candidate pulse combinations in the search procedure but also avoids computing unneeded correlation functions before the search. Furthermore, the proposed method is compatible with any ACELP-type vocoder, e.g. the G.723.1, G.729, and GSM-EFR standards.
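As a rough illustration of the "maximum takes precedence" idea in the abstract above, the following sketch places one pulse per track at the position where the target signal has the largest magnitude. The interleaved track layout, the function name `pick_pulses_by_max`, and the use of the raw target magnitude as the selection criterion are all assumptions made for illustration; the abstract does not give the actual MTP-ACELP procedure.

```python
def pick_pulses_by_max(target, num_tracks):
    """Pick, for each track, the position with the largest |target| value.

    Hypothetical simplification: track t is assumed to hold the interleaved
    positions t, t + num_tracks, t + 2 * num_tracks, ...
    Assumes len(target) >= num_tracks so that no track is empty.
    Returns one (position, sign) pair per track.
    """
    pulses = []
    for t in range(num_tracks):
        positions = range(t, len(target), num_tracks)
        # The largest target magnitude on this track takes precedence.
        best = max(positions, key=lambda i: abs(target[i]))
        sign = 1 if target[best] >= 0 else -1
        pulses.append((best, sign))
    return pulses


if __name__ == "__main__":
    demo_target = [0.1, -0.9, 0.3, 0.2, 0.5, 0.1, -0.4, 0.8]
    print(pick_pulses_by_max(demo_target, num_tracks=4))
```

Choosing positions directly from the target avoids enumerating pulse combinations, which is the kind of saving the abstract describes, although a real coder would also account for the impulse-response correlations.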
Describes a robust feature extraction method for continuous speech recognition. Central to the method is the minimum variance distortionless response (MVDR) method of spectrum estimation and a feature trajectory smoothing technique for reducing the variance in the feature vectors. The above method, when evaluated on continuous speech recognition tasks in a stationary and moving car, gave an average relative improvement in WER of greater than 30%.
Gauss mixtures are a popular class of models in statistics and statistical signal processing because they can provide good fits to smooth densities, because they have a rich theory, and because they can be well estimated by existing algorithms such as the EM (expectation maximization) algorithm. We here extend an information theoretic extremal property for source coding from Gaussian sources to Gauss mixtures using high rate quantization theory and extend a method originally used for LPC (linear predictive coding) speech vector quantization to provide a Lloyd clustering approach to the design of Gauss mixture models. The theory provides formulas relating minimum discrimination information (MDI) for model selection and the mean squared error resulting when the MDI criterion is used in an optimized robust classified vector quantizer. It also provides motivation for the use of Gauss mixture models for robust compression systems for general random vectors.
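The Lloyd clustering approach mentioned above alternates a nearest-neighbour step and a centroid step. The sketch below shows that generic iteration on scalar samples with a squared-error distortion; this is a stand-in, since the abstract's design uses a minimum discrimination information (MDI) distortion between Gaussian models, which is not reproduced here. The function name `lloyd_1d` and its parameters are illustrative.

```python
def lloyd_1d(samples, centroids, iterations=10):
    """Generic Lloyd iteration on scalars with squared-error distortion.

    Stand-in for illustration only: the MDI distortion from the abstract
    is replaced by plain squared error.
    """
    centroids = list(centroids)
    for _ in range(iterations):
        # Nearest-neighbour step: assign each sample to its closest centroid.
        cells = [[] for _ in centroids]
        for x in samples:
            k = min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
            cells[k].append(x)
        # Centroid step: replace each centroid by the mean of its cell.
        for k, cell in enumerate(cells):
            if cell:  # keep the old centroid if its cell is empty
                centroids[k] = sum(cell) / len(cell)
    return centroids


if __name__ == "__main__":
    print(lloyd_1d([0.0, 1.0, 2.0, 10.0, 11.0, 12.0], [0.0, 5.0]))
```

In a Gauss mixture design, each cell would instead yield a mean, a covariance, and a prior probability, with the distortion measure changed accordingly.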
ISBN: 0780366859 (print)
This paper presents a new 1200 bps speech coder designed with a tree-searched multi-stage matrix quantization scheme. With the new matrix quantization method, a spectral distortion of about 1 dB is achieved at rates as low as 18 bits/frame. In the proposed coder, the LSF parameters of two consecutive frames are grouped into a superframe and jointly quantized. The other speech parameters are quantized frame by frame. New techniques for improving performance include joint quantization of pitch and voiced/unvoiced/mixed decisions, gain interpolation, and residual LSF quantization. For the new matrix quantization based speech coder (MQBC), listening tests have shown that efficient, high-quality coding is achieved at a bit rate of 1200 bps. Test results are compared with the 2400 bps LPC10e coder and the 2400 bps MELP coder chosen as the new Federal Standard.
In this paper, we propose a formant weighted cepstral feature for an LSP-based speech recognition system. The proposed weighting scheme is based on the well-known property of LSPs that the speech spectrum has a peak where adjacent LSFs come close together. By applying this scheme to the pseudo-cepstrum (PCEP) conversion process (Kim et al. 1993), we obtain formant weighted, or peak enhanced, cepstral features. Results of speech recognition experiments using QCELP coder output show that the proposed feature set outperforms conventional features such as LSP or PCEP. Moreover, its performance also exceeds that of the unquantized LPC cepstrum.
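The LSP property cited above (closely spaced adjacent LSFs indicate a spectral peak) suggests weights that grow as the gap to the nearest neighbouring LSF shrinks. The sketch below is one simple way to express that; the inverse-gap formula, the normalisation, and the name `lsf_proximity_weights` are assumptions for illustration and are not the weighting defined in the paper.

```python
def lsf_proximity_weights(lsfs, eps=1e-6):
    """Weight each LSF by the inverse gap to its nearest neighbour.

    Hypothetical weighting: lsfs must be ascending and contain at least
    two values. The weights are normalised to sum to 1.
    """
    raw = []
    for i, f in enumerate(lsfs):
        gaps = []
        if i > 0:
            gaps.append(f - lsfs[i - 1])
        if i < len(lsfs) - 1:
            gaps.append(lsfs[i + 1] - f)
        # A small gap (likely formant region) gives a large raw weight.
        raw.append(1.0 / (min(gaps) + eps))
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    # The first two LSFs are close together, so they get the largest weights.
    print(lsf_proximity_weights([0.30, 0.35, 0.90, 1.50]))
```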
This paper explores blind deconvolution of reverberated speech signals in microphone array applications. Two regularization approaches are proposed based on available a priori knowledge. The regularized least-squares (LS) approach uses the speech signal characteristics and the lowpass nature of the reverberation channel; and the regularized cross correlation (CR) approach requires more precise knowledge of reverberation which can be obtained through training. The two methods are robust to the presence of noise.
Previous work in wireless speech recognition has focused on two methods, namely, quantizing recognition features (e.g. MFCC) or performing recognition using speech coding parameters (e.g. LPC). All of this previous research assumes that the communication channel is only large enough to transmit either speech coding parameters or speech recognition parameters. By contrast, we propose that the speech recognition parameters can be quantized at a rate sufficiently low to allow transmission of both speech coding and speech recognition parameters over a standard cellular channel. In particular, the paper shows that the perceptual LPC (PLP) coefficients can be transmitted at 400 bps with an insignificant loss of digit recognition accuracy.
ISBN: 0780370449 (print)
This paper describes how the results of speaker verification systems can be improved and made more robust by using a committee of neural networks for pattern recognition rather than the conventional single-network decision system. It illustrates the use of a supervised learning vector quantization neural network as the pattern classifier. Linear predictive coding and cepstral signal processing techniques are used to form hybrid feature parameter vectors, countering the drop in recognition success as the group size (number of speakers to be recognized) increases.
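One common way to combine a committee of classifiers is a majority vote over the individual accept/reject decisions. The abstract does not state which combination rule the authors use, so the strict-majority rule and the name `committee_accepts` below are assumptions for illustration.

```python
def committee_accepts(votes):
    """Accept the claimed speaker only if a strict majority of members accept.

    Hypothetical combination rule: each vote is True (accept) or False
    (reject); ties and an empty committee both resolve to reject.
    """
    votes = list(votes)
    accepts = sum(1 for v in votes if v)
    return accepts * 2 > len(votes)


if __name__ == "__main__":
    print(committee_accepts([True, True, False]))  # two of three members accept
    print(committee_accepts([True, False]))        # tie resolves to reject
```

Resolving ties to reject is a conservative choice for verification, where a false acceptance is usually costlier than a false rejection.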
The technique of code excited linear prediction (CELP) has led to the development of voice coding systems that provide toll quality speech at very low bitrates. While speech and singing share many similarities in terms of production, standard speech coding implementations fall far short when transmitting the singing voice. This paper explores the reasons for this discrepancy and suggests new variations on CELP speech coders that specifically enhance the quality of encoded singing for individual singers. These modifications could be used in a low-bitrate singing voice codec which, in conjunction with multi-track structured coding schemes such as MPEG-4 structured audio, could provide a highly compressed yet high-quality representation of a complex audio scene.