This brief describes the use of wavelet analysis in the development of a Japanese text-to-speech (TTS) system for personal computers. The quality of synthesized speech is one of the most important features of any TTS system. Synthesis methods that are based on manipulation of the speech signal spectrum (e.g., linear predictive coding synthesis and formant synthesis) produce comprehensible but unnatural-sounding output. The lack of naturalness commonly associated with these methods results from the use of oversimplified speech models, small synthesis unit inventories, and poor handling of text parsing for prosody control. We developed four new technologies to overcome these difficulties and improve the quality of output from TTS systems: accurate pitch mark determination by wavelet analysis, speech waveform generation using a modified time-domain pitch-synchronous overlap-add method, speech synthesis unit selection using a context-dependent clustering method, and efficient prosody control using a 3-phrase parser. All four technologies will be described; however, those that rely on wavelet techniques will be emphasized.
Author: Sanches, I (Univ Sao Paulo, Escola Politecn, Dept Eng Eletron, Lab Proc Sinais & Sistemas, BR-05508900 Sao Paulo, SP, Brazil)
A matrix method for converting linear prediction coefficients (LPC), or autoregressive coefficients (ARC), to their corresponding normalised autocorrelation coefficients (NAC) is presented. The matrix is an alternative to the usual step-down procedure to be used in conjunction with the Levinson algorithm when conversion from LPC to NAC is necessary.
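As a rough illustration of how such a conversion can be posed as a linear system, the sketch below solves the Yule-Walker relations for the normalised autocorrelations directly. It is not the matrix from the paper, which the abstract does not specify. The sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and the names `lpc_to_nac` and `solve` are assumptions made for this example.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [rhs] for row, rhs in zip(M, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / A[r][r]
    return x


def lpc_to_nac(a):
    """Return [rho(0), ..., rho(p)] for AR coefficients a = [a_1, ..., a_p].

    Assumed convention: rho(k) + sum_i a_i * rho(|k - i|) = 0 for k = 1..p,
    with rho(0) = 1 (normalised autocorrelation).
    """
    p = len(a)
    M = [[0.0] * p for _ in range(p)]  # coefficients of rho(1..p)
    b = [0.0] * p                      # terms that multiply rho(0) = 1
    for k in range(1, p + 1):
        M[k - 1][k - 1] += 1.0
        for i in range(1, p + 1):
            lag = abs(k - i)
            if lag == 0:
                b[k - 1] -= a[i - 1]
            else:
                M[k - 1][lag - 1] += a[i - 1]
    return [1.0] + solve(M, b)


rho = lpc_to_nac([-0.9, 0.2])  # second-order example
```

For a second-order model the system reduces to rho(1) = -a_1 / (1 + a_2) and rho(2) = -a_1 rho(1) - a_2, which gives a quick hand check of the result.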
A new form of line spectral frequency (LSF), the bounded line spectral frequency, is presented. It is shown that the new representation is more efficient than the direct line spectral frequency and the differential line spectral frequency (DLSF). By using a vector measure, tenth-order linear predictive coding (LPC) parameters can be scalar quantised at 28 bit/frame with transparent quantisation quality.
A low-complexity speech recognition method applicable to digital communication networks is proposed. A feature set suitable for speech recognition is obtained from quantised LSP parameters in CELP-type coders without reconstructing the speech signals. The authors present the effects of the speech coder on speaker-independent recognition performance and show that the recognition accuracy of the proposed method is better than that of a recogniser using reconstructed speech signals.
This correspondence describes a method for estimating the parameters of an autoregressive (AR) process from a finite number of noisy measurements. The method uses a modified set of Yule-Walker (YW) equations that lead to a quadratic eigenvalue problem that, when solved, gives estimates of the AR parameters and the measurement noise variance.
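To see why the ordinary Yule-Walker equations need modifying when measurements are noisy, the sketch below works through a first-order case using exact autocorrelations. It does not implement the quadratic eigenvalue formulation from the correspondence. It only illustrates the standard fact that additive white measurement noise changes the lag-0 autocorrelation and no other lag. All function names and the numerical values are assumptions chosen for illustration.

```python
def ar1_autocorr(phi, var_e, max_lag):
    """Exact autocorrelation of x[n] = phi * x[n-1] + e[n], |phi| < 1."""
    r0 = var_e / (1.0 - phi * phi)
    return [r0 * phi ** k for k in range(max_lag + 1)]


def noisy_autocorr(r_x, var_w):
    """Autocorrelation of y[n] = x[n] + w[n], w white with variance var_w.

    White noise adds var_w at lag 0 and nothing at other lags.
    """
    return [r_x[0] + var_w] + list(r_x[1:])


def standard_yw_ar1(r_y):
    """Ordinary Yule-Walker estimate of phi; uses the contaminated lag 0."""
    return r_y[1] / r_y[0]


def shifted_yw_ar1(r_y):
    """Estimate of phi that avoids lag 0, so the noise term drops out."""
    return r_y[2] / r_y[1]


def noise_variance_ar1(r_y, phi):
    """Recover var_w from r_y(0) = r_x(0) + var_w and r_x(0) = r_y(1) / phi."""
    return r_y[0] - r_y[1] / phi


# Hypothetical values for the demonstration.
r_y = noisy_autocorr(ar1_autocorr(phi=0.8, var_e=1.0, max_lag=2), var_w=0.5)
biased = standard_yw_ar1(r_y)          # pulled toward zero by the noise
unbiased = shifted_yw_ar1(r_y)         # equals the true phi
var_w_hat = noise_variance_ar1(r_y, unbiased)
```

With exact autocorrelations the shifted estimate recovers the true coefficient and noise variance; with a finite number of samples both are only approximate, which is the setting the correspondence addresses.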
ISBN: 0780344286 (print)
This paper presents two time-scale and pitch-scale modification techniques for use in speech synthesis systems. They have been applied to Microsoft's Whistler system, which is based on concatenative synthesis. Both methods are based on a source-filter model, one using LPC parameters and the other using cepstral parameters. The proposed methods achieve high-quality prosody modification, retain the characteristics of the donor speaker, allow for spectral manipulation (to reduce spectral discontinuities at unit boundaries), and yield compact acoustic inventories and improved voiced fricatives.
The duration of vowel steady-states (VSS) was examined acoustically in the speech production of 40 normal young adults. VSS was assessed according to formant frequency changes in sustained /i/ productions and consonant + /i/ + /d/ (/Cid/) productions. The duration of the VSS was measured for the first and second formants (F1 and F2) by incorporating a fixed rate-of-change criterion. Results indicated no significant differences in VSS duration according to gender or vowel context. VSS duration based on F1 was significantly longer than VSS duration based on F2. The duration of VSS was also found to be correlated with the overall vowel duration in /Cid/ contexts. Discussion focuses on the analysis and application of VSS in acoustic studies of normal and disordered speech production.
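One way a fixed rate-of-change criterion could be applied to a formant track is sketched below. The abstract does not state the threshold value or how the steady state is delimited, so the longest-run rule, the function name `steady_state_duration`, and the numbers in the example are all assumptions made for illustration.

```python
def steady_state_duration(track_hz, frame_step_s, max_rate_hz_per_s):
    """Duration (s) of the longest run of frame-to-frame intervals whose
    absolute formant slope stays at or below max_rate_hz_per_s.

    track_hz: formant frequency per analysis frame, in Hz.
    frame_step_s: time between consecutive frames, in seconds.
    """
    longest = 0
    current = 0
    for prev, cur in zip(track_hz, track_hz[1:]):
        rate = abs(cur - prev) / frame_step_s
        if rate <= max_rate_hz_per_s:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # slope too steep: the run is broken
    return longest * frame_step_s


# Hypothetical F1 track for /i/ sampled every 10 ms, threshold 250 Hz/s.
f1_track = [300, 290, 282, 280, 281, 280, 279, 290, 310]
vss = steady_state_duration(f1_track, frame_step_s=0.01, max_rate_hz_per_s=250)
```

Running the same function on the F1 and F2 tracks of a vowel would give the two per-formant durations that the study compares.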
Subband-autocorrelation (SBCOR) analysis is a noise-robust acoustic analysis based on filter bank and autocorrelation analysis, and aims to extract the periodicities associated with the inverse of the center frequency in a subband. In this paper, it is shown that SBCOR results in lateral inhibitive weighting (LIW) processing of the power spectrum, and that the LIW is significantly effective for noise-robust acoustic analysis using a DTW word recognizer. An interpretation of the LIW is also described. A technique for flattening the noise spectral envelope using an LPC inverse filter is applied to speech degraded with noise, and DTW word recognition is performed. The idea behind this inverse filtering technique is to weaken the strong periodic components included in the noise. The experimental results using a 32nd-order LPC inverse filter show that the recognition performance of SBCOR (or LIW) is improved for computer room noise.
This paper describes our new mixed excitation linear predictive (MELP) coder designed for very low bit rate applications. This new coder, through algorithmic improvements and enhanced quantization techniques, produces better speech quality at 1.7 kb/s than the new U.S. Federal Standard MELP coder at 2.4 kb/s. Key features of the coder are an improved pitch estimation algorithm and a line spectral frequencies (LSF) quantization scheme that requires only 21 bits per frame. With channel coding, this new MELP coder is capable of maintaining good speech quality even in severely degraded channels, at a total bit rate of only 3 kb/s.
This paper presents an algorithm for F1 and F2 formant estimation. The proposed algorithm combines linear predictive analysis with the Mel psychoacoustical perceptual scale. The algorithm was tested on the first two formants and performed well for male and female speakers, adults and children. In contrast to the classical LPC algorithm, which requires variable-order prediction filters to take into account different formant patterns, the proposed algorithm is capable of extracting these formants with a fixed-order prediction filter.