Author:
Sanches, I
Univ Sao Paulo, Escola Politecn, Dept Eng Eletron, Lab Proc Sinais & Sistemas, BR-05508900 Sao Paulo, SP, Brazil
A matrix method for converting linear prediction coefficients (LPC), or autoregressive coefficients (ARC), to their corresponding normalised autocorrelation coefficients (NAC) is presented. The matrix is an alternative to the usual step-down procedure to be used in conjunction with the Levinson algorithm when conversion from LPC to NAC is necessary.
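The conversion described above can be illustrated by solving the Yule-Walker relations as a linear system. This is a minimal sketch, not the matrix construction of the paper (which the abstract does not give). It assumes the predictor convention x̂(n) = a_1 x(n-1) + ... + a_p x(n-p), and the function name `lpc_to_nac` is hypothetical.

```python
def lpc_to_nac(a):
    """Map predictor coefficients a_1..a_p to normalised autocorrelation
    values [r(0), ..., r(p)] with r(0) = 1.

    Assumed convention: x_hat(n) = sum_k a[k-1] * x(n-k).
    """
    p = len(a)
    # Yule-Walker: r(m) = sum_k a_k r(|m-k|), m = 1..p, with r(0) = 1.
    # Moving the r(0) term to the right-hand side gives M r = b with
    # r(m) - sum_{k != m} a_k r(|m-k|) = a_m.
    M = [[0.0] * p for _ in range(p)]
    b = [float(c) for c in a]
    for m in range(1, p + 1):
        M[m - 1][m - 1] += 1.0
        for k in range(1, p + 1):
            if k != m:
                M[m - 1][abs(m - k) - 1] -= a[k - 1]

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, p):
            factor = M[row][col] / M[col][col]
            for j in range(col, p):
                M[row][j] -= factor * M[col][j]
            b[row] -= factor * b[col]

    # Back substitution.
    r = [0.0] * p
    for i in range(p - 1, -1, -1):
        acc = sum(M[i][j] * r[j] for j in range(i + 1, p))
        r[i] = (b[i] - acc) / M[i][i]
    return [1.0] + r
```

For a second-order predictor the result can be checked by hand: r(1) = a_1 / (1 - a_2) and r(2) = a_1 r(1) + a_2.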
A new form of line spectral frequency (LSF), the bounded line spectral frequency, is presented. It is shown that the new representation is more efficient than the direct line spectral frequency and the differential line spectral frequency (DLSF). By using a vector measure, the scalar quantisation of tenth-order linear predictive coding (LPC) parameters can be coded at 28 bit/frame with a transparent quantisation quality.
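For orientation, the differential representation mentioned above can be sketched as successive differences of the ordered LSFs. This is only an illustration of the DLSF idea under the common definition d_i = ω_i - ω_(i-1) with ω_0 = 0; the bounded LSF form is not defined in the abstract and is not shown. The function names are hypothetical.

```python
def lsf_to_dlsf(lsf):
    """Return successive differences of an ordered LSF vector.

    Assumes lsf is sorted in increasing order (radians or Hz) and uses
    0 as the implicit starting frequency.
    """
    out = []
    prev = 0.0
    for w in lsf:
        out.append(w - prev)
        prev = w
    return out


def dlsf_to_lsf(dlsf):
    """Invert lsf_to_dlsf by cumulative summation."""
    out = []
    total = 0.0
    for d in dlsf:
        total += d
        out.append(total)
    return out
```

Because the LSFs of a stable predictor are strictly increasing, every difference is positive, which is what makes the differences convenient to quantise one scalar at a time.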
A low-complexity speech recognition method applicable to digital communication networks is proposed. A feature set suitable for speech recognition is obtained from quantised LSP parameters in CELP-type coders without reconstructing the speech signals. The authors present the effects of the speech coder on speaker-independent recognition performance and show that the recognition accuracy of the proposed method is better than that of the recogniser using reconstructed speech signals.
This correspondence describes a method for estimating the parameters of an autoregressive (AR) process from a finite number of noisy measurements. The method uses a modified set of Yule-Walker (YW) equations that lead to a quadratic eigenvalue problem that, when solved, gives estimates of the AR parameters and the measurement noise variance.
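As background, the way white measurement noise enters the Yule-Walker equations can be written out as follows. This is a sketch under the assumption that the observation is y(n) = x(n) + w(n), with w(n) white, of variance \sigma_w^2, and uncorrelated with x(n). The symbols are chosen here for illustration; the abstract does not give the paper's own formulation.

```latex
\begin{align}
x(n) &= \sum_{k=1}^{p} a_k\, x(n-k) + e(n), \qquad y(n) = x(n) + w(n), \\
r_y(m) &= r_x(m) + \sigma_w^2\, \delta(m), \\
r_y(m) &= \sum_{k=1}^{p} a_k\, r_y(m-k) - \sigma_w^2\, a_m, \qquad 1 \le m \le p, \\
r_y(m) &= \sum_{k=1}^{p} a_k\, r_y(m-k), \qquad m > p.
\end{align}
```

Both the a_k and \sigma_w^2 are unknown, so the product \sigma_w^2 a_m makes the low-order equations nonlinear in the unknowns, while the equations for m > p are unaffected by the noise.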
ISBN: 0780344286 (print)
This paper presents two time-scale pitch-scale modification techniques to be used in speech synthesis systems. They have been applied to Microsoft's Whistler system, which is based on concatenative synthesis. Both methods are based on a source-filter model, one of them using LPC parameters and the other one using cepstral parameters. The proposed methods achieve high quality prosody modification, retain the characteristics of the donor speaker, allow for spectral manipulation (to reduce spectral discontinuities at unit boundaries), yield compact acoustic inventories and improved voiced fricatives.
The duration of vowel steady-states (VSS) was examined acoustically in the speech production of 40 normal young adults. VSS was assessed according to formant frequency changes in sustained /i/ productions and consonant + /i/ + /d/ (/Cid/) productions. The duration of the VSS was measured for the first and second formants (F1 and F2) by incorporating a fixed rate-of-change criterion. Results indicated no significant differences in VSS duration according to gender or vowel context. VSS duration based on F1 was significantly longer than VSS duration based on F2. The duration of VSS was also found to be correlated with the overall vowel duration in /Cid/ contexts. Discussion focuses on the analysis and application of VSS in acoustic studies of normal and disordered speech production.
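A fixed rate-of-change criterion of the kind mentioned above might be applied to a formant track roughly as follows. This is a sketch under stated assumptions: the track is sampled at a uniform frame step, the steady state is taken as the longest run of frame-to-frame changes whose absolute slope stays at or below a threshold, and the threshold value is left to the caller because the abstract does not state the one used in the study. The function name is hypothetical.

```python
def steady_state_duration(formant_hz, frame_step_s, max_rate_hz_per_s):
    """Duration (seconds) of the longest stretch in which the formant's
    absolute rate of change stays at or below max_rate_hz_per_s.

    formant_hz: formant values, one per analysis frame.
    frame_step_s: time between consecutive frames, in seconds.
    """
    longest = 0
    current = 0
    for prev, cur in zip(formant_hz, formant_hz[1:]):
        rate = abs(cur - prev) / frame_step_s
        if rate <= max_rate_hz_per_s:
            current += 1          # this frame interval counts as steady
            longest = max(longest, current)
        else:
            current = 0           # criterion violated, restart the run
    return longest * frame_step_s
```

Applying the same function to the F1 and F2 tracks of one vowel gives the two durations that the study compares.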
Subband-autocorrelation (SBCOR) analysis is a noise-robust acoustic analysis based on filter bank and autocorrelation analysis; it aims to extract the periodicities associated with the inverse of the center frequency in a subband. In this paper, it is derived that SBCOR results in lateral inhibitive weighting (LIW) processing of the power spectrum, and it is shown, using a DTW word recognizer, that the LIW is significantly effective for noise-robust acoustic analysis. An interpretation of the LIW is also described. A technique for flattening the noise spectral envelope using an LPC inverse filter is applied to speech degraded with noise, and DTW word recognition is performed. The idea behind this inverse filtering technique is to weaken the strong periodic components included in noise. The experimental results using a 32nd-order LPC inverse filter show that the recognition performance of SBCOR (or LIW) is improved for computer room noise.
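The inverse filtering step mentioned above amounts to passing the signal through the FIR filter A(z) = 1 - a_1 z^-1 - ... - a_p z^-p. The sketch below shows only that operation, assuming the predictor coefficients are already available and that samples before the start of the signal are zero; how the paper estimates the coefficients from the noise is not stated in the abstract. The function name is hypothetical.

```python
def lpc_inverse_filter(x, a):
    """Apply the LPC inverse (whitening) filter to x.

    x: input samples.
    a: predictor coefficients a_1..a_p, with the convention
       x_hat(n) = sum_k a[k-1] * x(n-k).
    Returns the prediction residual e(n) = x(n) - x_hat(n).
    """
    p = len(a)
    residual = []
    for n in range(len(x)):
        prediction = 0.0
        for k in range(1, p + 1):
            if n - k >= 0:        # samples before n = 0 are taken as zero
                prediction += a[k - 1] * x[n - k]
        residual.append(x[n] - prediction)
    return residual
```

A signal that exactly follows the model is flattened to its excitation: the sequence 1, 0.5, 0.25, 0.125 with a_1 = 0.5 comes out as a single unit impulse.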
This paper describes our new mixed excitation linear predictive (MELP) coder designed for very low bit rate applications. This new coder, through algorithmic improvements and enhanced quantization techniques, produces better speech quality at 1.7 kb/s than the new U.S. Federal Standard MELP coder at 2.4 kb/s. Key features of the coder are an improved pitch estimation algorithm and a line spectral frequencies (LSF) quantization scheme that requires only 21 bits per frame. With channel coding, this new MELP coder is capable of maintaining good speech quality even in severely degraded channels, at a total bit rate of only 3 kb/s.
This paper presents an algorithm for F1 and F2 formant estimation. The proposed algorithm combines linear predictive analysis with the Mel psychoacoustical perceptual scale. The algorithm was tested for the first two formants and produced good performance for male and female speakers, adults and children. In contrast to the classical LPC algorithm, which requires variable-order prediction filters to take into account different formant patterns, the proposed algorithm is capable of extracting these formants with a fixed-order prediction filter.
In recent years there has been growing interest in nonlinear speech models. Several works have been published revealing the better performance of nonlinear techniques, but little attention has been dedicated to the implementation of the nonlinear model in real applications. This work focuses on the behaviour of a nonlinear predictive model based on neural nets in a speech waveform coder. Our novel scheme obtains an improvement in SEGSNR between 1 and 2 dB for an adaptive quantization ranging from 2 to 5 bits.
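The SEGSNR figure quoted above is a segmental signal-to-noise ratio, that is, the mean of per-frame SNR values in dB. The sketch below computes it in its plainest form, assuming non-overlapping frames and skipping frames with zero signal or zero error energy. The frame length is a caller-supplied assumption, and evaluations often also clamp per-frame values, which is not done here. The function name is hypothetical.

```python
import math


def segmental_snr_db(reference, decoded, frame_len):
    """Mean per-frame SNR in dB between a reference and a decoded signal.

    Frames are non-overlapping; a trailing partial frame is ignored.
    Frames with zero signal energy or zero error energy are skipped.
    """
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(s * s for s in ref)
        error_energy = sum((s - d) ** 2 for s, d in zip(ref, dec))
        if signal_energy == 0 or error_energy == 0:
            continue
        frame_snrs.append(10.0 * math.log10(signal_energy / error_energy))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)
```

Averaging in the dB domain gives quiet frames as much weight as loud ones, which is why the segmental figure is often preferred to a single whole-signal SNR for waveform coders.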