Several LPC distance measures and statistical tests have been proposed for use in speech processing, the most popular of which are Itakura's log likelihood ratio statistic and some simple variants of it. This paper shows that these statistics share some undesirable properties. It argues that more tractable and more sensitive measures are available, including other relevant likelihood ratio statistics. It also shows that when Itakura's measure is used to compare two estimated LPC vectors, it is not the log likelihood ratio at all, and it derives the true likelihood ratio for these conditions.
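For orientation, the measure discussed above is commonly written as the log of a ratio of two prediction residual energies computed on the same frame. The following is a minimal sketch under that assumption; the function names and the choice of which vector goes in the numerator are illustrative conventions, not the paper's notation, and the sketch does not implement the corrected likelihood ratio the paper derives.

```python
import math

def residual_energy(a, r):
    """Quadratic form a^T R a, where R is the Toeplitz autocorrelation
    matrix built from r and a = [1, a1, ..., ap] is an inverse-filter vector."""
    n = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(n) for j in range(n))

def itakura_distance(a_other, a_opt, r):
    """Log ratio of the residual energy obtained by filtering the frame with
    a_other to the energy obtained with a_opt, the frame's own LPC vector.
    Because a_opt minimises the residual energy for r, the result is >= 0."""
    return math.log(residual_energy(a_other, r) / residual_energy(a_opt, r))

# Hypothetical frame: autocorrelation of a first-order process with pole 0.5.
r = [1.0, 0.5, 0.25]
a_opt = [1.0, -0.5, 0.0]    # optimal predictor for r
a_other = [1.0, 0.0, 0.0]   # a mismatched vector (no prediction at all)
d = itakura_distance(a_other, a_opt, r)
```

The distance is zero when the two vectors coincide and grows as the comparison vector fits the frame less well.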
This paper presents a robust real-time formant tracking system for speech that uses a phase equalization-based autoregressive exogenous model (PEAR) with electroglottography (EGG). Although linear predictive coding (LPC) analysis is a popular method for estimating formant frequencies, its estimation accuracy is known to degrade for speech with a high fundamental frequency (F0), because the harmonic structure of the glottal source spectrum deviates further from the Gaussian noise assumption of LPC as F0 increases. In contrast, PEAR, which employs phase equalization and LPC with an impulse train as the glottal source signal, estimates formant frequencies robustly even for speech with high F0. However, PEAR has higher computational complexity than LPC. In this study, a novel formulation of PEAR was derived to reduce this complexity, which made it possible to implement PEAR in a real-time robust formant tracking system. In addition, since PEAR requires the timings of glottal closures, a stable detection method using EGG was devised. The real-time system was developed on a digital signal processor, and it was shown that, for both synthesized and natural vowels, the proposed method estimates formant frequencies more robustly than LPC over a wider range of F0.
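As background for the LPC baseline mentioned above, formant candidates are conventionally read off the poles of the all-pole filter: the pole angle gives the frequency and the pole radius gives the bandwidth. The sketch below illustrates only that textbook mapping for a single second-order section; it is not PEAR, and the function names and the example values are assumptions chosen for illustration.

```python
import cmath
import math

def second_order_poles(a1, a2):
    """Poles of 1 / (1 + a1 z^-1 + a2 z^-2), i.e. roots of z^2 + a1 z + a2."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

def pole_to_formant(pole, fs):
    """Map one complex pole to (centre frequency, 3 dB bandwidth) in Hz."""
    freq = cmath.phase(pole) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth

# Hypothetical resonance: 500 Hz at an 8 kHz sampling rate, pole radius 0.95.
fs = 8000.0
radius = 0.95
theta = 2.0 * math.pi * 500.0 / fs
a1 = -2.0 * radius * math.cos(theta)
a2 = radius * radius
upper_pole, _ = second_order_poles(a1, a2)
freq, bandwidth = pole_to_formant(upper_pole, fs)
```

For a full-order LPC polynomial the same mapping would be applied to each complex root in the upper half plane, which requires a general polynomial root finder not shown here.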
A simple method of synthesis gain matching in a linear prediction (LP) vocoder makes use of the LP analysis residual energy. However, poor gain matching is to be expected when a low-frequency formant is in resonance with the voiced excitation impulses. Such large gain errors increase the probability of synthesis filter overflow. A simple improvement of this method is suggested, reducing these large errors substantially. The improvement makes use of the information provided by the derivative of the already synthesized signal. The method can be applied internally or externally to low-complexity real-time speech synthesizers.
A 4.8 kbit/s residual-excited linear prediction coder (RELP) with two subband coded basebands was systematically evaluated in terms of intelligibility and overall quality. Intelligibility degradation due to RELP-coding is found to be 6 percent without transmission errors, an additional 6.4 percent with 1 percent bit errors, and 9.8 percent in 10 dB SNR acoustic background noise. Quality of the RELP coded speech is midway between those of 3 and 4 bit log-PCM and is significantly higher than that of the pitch-excited linear prediction coder.
A new form of Durbin's recursion is described that makes all addressing sequential within one iteration of the recursion. Using this technique, Durbin's recursion may be cast into a single, repeatedly called subroutine with address arithmetic simple enough for single-chip programmable digital signal processors. In a specific implementation, this technique reduces program memory by a factor of five while increasing the execution time of Durbin's recursion by 50 percent (an increase of 8 to 12 percent of real time), allowing Durbin's recursion to be combined with autocorrelation analysis in a single DSP chip.
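For readers unfamiliar with the algorithm being restructured, the following is a minimal sketch of the standard textbook form of Durbin's (Levinson-Durbin) recursion. It is not the sequential-addressing variant the abstract describes; the function name and sign convention (error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p) are assumptions for illustration.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for LPC coefficients.

    r     : autocorrelation values r[0] .. r[order]
    order : prediction order p
    Returns (a, err), where a[0] == 1.0 and err is the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor's error with lag i.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err  # reflection coefficient for stage i
        # Order update: each new a[j] mixes old a[j] with old a[i - j],
        # so the old values are copied before being overwritten.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Hypothetical input: autocorrelation of a first-order process with pole 0.5.
coeffs, energy = levinson_durbin([1.0, 0.5, 0.25], 2)
```

The inner update reads the coefficient array from both ends at once (`prev[j]` and `prev[i - j]`), which is the kind of non-sequential access pattern a restructured form would aim to avoid.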
This paper describes the pioneering research in the field of speech technology by James L. Flanagan, 2005 IEEE Medal of Honor awardee. Flanagan's work with speech coding heralded a series of advances over the years, including a currently favored technique, linear predictive coding. After graduating from Mississippi State as an electrical engineering major, Flanagan accepted a graduate assistantship in MIT's acoustics lab, which led to his seminal research in voice coding. Flanagan then worked at Bell Telephone Laboratories where he would spend the next 33 years. He climbed steadily up the ranks at Bell Labs, eventually becoming director of the Information Principles Research Laboratory. Among the projects that Flanagan was deeply involved in were the development of automatic speech recognition systems, voice mail, artificial larynx, and packet-switched voice technology.
A transform-LPC hybrid system for real-time coding of picture data has been presented. The LPC has been made adaptive by using a correlation cancellation loop. Three different schemes for coding and reconstructing the transform components have been presented and their relative performances have been compared. The MSE and SNR are seen to be comparable to previous work on transform-DPCM hybrid schemes.
We propose a new robust recursive procedure, based on the weighted recursive least squares (WRLS) algorithm with a variable forgetting factor (VFF) and a frame-based quadratic classifier, for identifying a nonstationary AR model of speech. Two versions of the frame-based quadratic classifier design procedure are elaborated upon. Experimental results are obtained by analyzing speech signals on voiced and mixed-excitation frames.
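To make the recursive estimation idea concrete, here is a minimal sketch of exponentially weighted recursive least squares for AR coefficients. It assumes a fixed forgetting factor `lam` rather than a variable one, and it omits the robust weighting and the quadratic classifier entirely; the function name, parameter defaults, and test signal are hypothetical.

```python
import math

def wrls_ar(x, order, lam=0.98, delta=100.0):
    """Track AR coefficients theta such that
    x[n] ~= theta[0] * x[n-1] + ... + theta[order-1] * x[n-order].

    lam   : forgetting factor (0 < lam <= 1); smaller values forget faster
    delta : initial scale of the inverse correlation matrix P
    """
    p = order
    theta = [0.0] * p
    P = [[delta if i == j else 0.0 for j in range(p)] for i in range(p)]
    for n in range(p, len(x)):
        phi = [x[n - 1 - k] for k in range(p)]  # regressor of past samples
        P_phi = [sum(P[i][j] * phi[j] for j in range(p)) for i in range(p)]
        denom = lam + sum(phi[i] * P_phi[i] for i in range(p))
        gain = [v / denom for v in P_phi]
        err = x[n] - sum(theta[i] * phi[i] for i in range(p))  # a priori error
        theta = [theta[i] + gain[i] * err for i in range(p)]
        # P is symmetric, so phi^T P equals P_phi transposed.
        P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(p)]
             for i in range(p)]
    return theta

# A noiseless sinusoid obeys x[n] = 2 cos(w) x[n-1] - x[n-2] exactly.
w = 0.3
signal = [math.cos(w * n) for n in range(200)]
theta = wrls_ar(signal, 2)
```

A variable forgetting factor scheme would adjust `lam` from sample to sample, typically lowering it when the prediction error suggests the signal statistics have changed.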
Results pertaining to the short and long term covariance identification of AR models driven by periodic sequences of possibly unknown phase are presented. The work is motivated by problems relating to LPC analysis of voiced speech, but results are formulated in general. Short term and asymptotic effects of such inputs on the invertibility of the covariance matrix are considered. Short term criteria with respect to the input for exact solution are established and an asymptotic bound for the case of inexact solution is developed.
A recently developed two-integrated-circuit speech synthesis system represents a significant advance in large-scale integration in both random logic and data storage functions.