The exact transformation from formant frequencies and bandwidths to reflection coefficients is a complex and time-consuming process. A simple approximate transformation may be performed using a table look-up procedure with linear interpolation. Experiments have been conducted using a four-formant model, each formant being represented by a complex-conjugate pole pair. The formant bandwidths and the frequency of the fourth formant are assigned constant values. Informal listening has shown that a table with as few as 126 16-bit entries is sufficient to perform this transformation without significant degradation of speech quality.
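As a rough illustration of the exact transformation that the table look-up is meant to approximate, the sketch below builds the all-pole denominator from formant pole pairs and then applies the standard step-down recursion to obtain reflection coefficients. The sampling rate, the formant values, and the function names are assumptions chosen for illustration, and the sign convention A(z) = 1 + a1 z^-1 + ... is one of several in use; none of this is taken from the paper.

```python
import math

def formants_to_polynomial(formants, fs):
    """Return [1, a1, ..., ap] for A(z) built from (freq_hz, bandwidth_hz) pairs.

    Each formant contributes one complex-conjugate pole pair, i.e. the factor
    1 - 2 r cos(theta) z^-1 + r^2 z^-2 with r = exp(-pi * bw / fs).
    """
    a = [1.0]
    for freq, bw in formants:
        r = math.exp(-math.pi * bw / fs)
        theta = 2.0 * math.pi * freq / fs
        section = [1.0, -2.0 * r * math.cos(theta), r * r]
        out = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):          # polynomial multiplication
            for j, sj in enumerate(section):
                out[i + j] += ai * sj
        a = out
    return a

def polynomial_to_reflection(a):
    """Step-down recursion: [1, a1, ..., ap] -> [k1, ..., kp]."""
    a = list(a)
    p = len(a) - 1
    ks = [0.0] * p
    for m in range(p, 0, -1):
        k = a[m]                            # k_m is the last coefficient at order m
        ks[m - 1] = k
        denom = 1.0 - k * k
        # Reduce the order-m predictor to order m - 1.
        a = [1.0] + [(a[i] - k * a[m - i]) / denom for i in range(1, m)]
    return ks

# Hypothetical four-formant frame at an assumed 8 kHz sampling rate.
FS = 8000.0
example_formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0), (3500.0, 200.0)]
example_ks = polynomial_to_reflection(formants_to_polynomial(example_formants, FS))
```

Because every pole lies inside the unit circle, all eight reflection coefficients have magnitude below one, which is the property that makes them attractive for quantization.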
This paper presents the results of our investigation of the various aspects of baseband LPC coders with the goal of maximizing the speech quality at a transmission bit-rate of 9.6 kb/s and for channel bit-error rates of up to 1%. Important among these aspects are: baseband width, coding of baseband, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder.
Preservation of both the spectral distribution and the periodicity of speech signals is essential in speech processing. This paper describes a method of speech coding in a high ambient noise environment and shows that the spectral envelope of the speech signal is the most reliable information when the noise reduction method proposed in this paper is used. Also reported are comparisons of several pitch extraction methods with extensive experimental data, on the basis of which a pitch extraction method suited to noisy speech signals is proposed.
A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large database consisting of 100 replications of each word (i.e., once by each of 100 talkers). The recognition system, which accepts telephone quality speech input, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule. Results for several test sets of data are presented. They show error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.
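The recognition pipeline described above (time-align each template to the unknown word, then decide among words using several templates each) can be sketched as follows. This is a minimal illustration, not the system from the paper: the local distance is a squared Euclidean stand-in rather than the Itakura LPC distance, the warping uses simple unconstrained steps, the KNN rule is read here as "average the K smallest template distances per word and pick the smallest average", and all names and data are hypothetical.

```python
def dtw_distance(ref, test, local_dist):
    """Accumulated distance of the best warping path, normalized by path-length bound."""
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(ref[i - 1], test[j - 1])
            # Allowed predecessors: vertical, horizontal, diagonal.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)

def squared_euclidean(u, v):
    """Stand-in local distance between two feature frames (assumption)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_recognize(templates, test, local_dist, k=2):
    """templates maps word -> list of templates; returns the best-scoring word."""
    best_word, best_score = None, float("inf")
    for word, refs in templates.items():
        dists = sorted(dtw_distance(ref, test, local_dist) for ref in refs)
        nearest = dists[:k]
        score = sum(nearest) / len(nearest)   # mean of the k nearest templates
        if score < best_score:
            best_word, best_score = word, score
    return best_word

# Toy one-dimensional "feature frames", invented for illustration only.
toy_templates = {
    "yes": [[(0.0,), (1.0,), (2.0,)], [(0.0,), (1.0,), (1.0,), (2.0,)]],
    "no": [[(5.0,), (4.0,), (3.0,)], [(5.0,), (5.0,), (3.0,)]],
}
toy_unknown = [(0.0,), (0.0,), (1.0,), (2.0,)]
toy_result = knn_recognize(toy_templates, toy_unknown, squared_euclidean, k=2)
```

Averaging over the K nearest templates, rather than taking the single best match, makes the decision less sensitive to one outlying template in a cluster.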
Covariance analysis as a least squares approach for accurately performing glottal inverse filtering from the acoustic speech waveform is discussed. Best results are obtained by situating the analysis window within a stable closed-glottis interval. Based on a linear model of speech production, it is shown that the moments of both glottal closure and glottal opening can be determined from the normalized total squared error with proper choices of analysis window length and filter order. Results from actual speech are presented to illustrate the technique.
Several distance measures have been proposed for comparing sets of LPC coefficients. The most popular one has been the "log likelihood ratio" proposed by Itakura [1]. In this paper we discuss this measure (strictly speaking, a somewhat generalized version of it) from both a theoretical and a practical point of view. We derive its statistical properties both when the reference vector is known and when it is estimated from the data. We also show how these properties are affected by windowing, additive noise, and preemphasis. We present results of extensive simulations in support of the theoretical predictions. Finally, we argue that de Souza's [2] recent criticism of this measure is unjustified.
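For orientation, the basic (non-generalized) form of the log likelihood ratio can be sketched as the log of two residual energies measured on the same test frame: one using a reference LPC vector, one using the frame's own optimal LPC vector. The sketch below uses the autocorrelation method with a Levinson-Durbin recursion; the function names, the model order, and the synthetic test signal are assumptions for illustration, and the generalized version analyzed in the paper is not reproduced here.

```python
import math
import random

def autocorr(x, p):
    """Autocorrelation lags r[0..p] of the frame x."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(p + 1)]

def levinson(r, p):
    """Levinson-Durbin: returns ([1, a1, ..., ap], residual energy)."""
    a = [1.0] + [0.0] * p
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        new = a[:]
        for i in range(1, m):
            new[i] = a[i] + k * a[m - i]
        new[m] = k
        a = new
        err *= 1.0 - k * k
    return a, err

def quad_form(a, r):
    """a^T R a, where R is the Toeplitz matrix with R[i][j] = r[|i - j|]."""
    size = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(size) for j in range(size))

def itakura_distance(a_ref, frame, p):
    """log of (residual energy with a_ref) over (minimum residual energy)."""
    r = autocorr(frame, p)
    a_test, _ = levinson(r, p)
    return math.log(quad_form(a_ref, r) / quad_form(a_test, r))

# Synthetic second-order autoregressive frame (assumed test data, fixed seed).
_rng = random.Random(0)
demo_frame = [0.0, 0.0]
for _ in range(400):
    demo_frame.append(1.3 * demo_frame[-1] - 0.6 * demo_frame[-2] + _rng.gauss(0.0, 1.0))
demo_lpc, _ = levinson(autocorr(demo_frame, 2), 2)
```

Since the frame's own LPC vector minimizes the quadratic form, the ratio is at least one and the measure is non-negative; it is zero when the reference vector equals the optimal one. Note that the measure is not symmetric in its two arguments.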
A recently developed two-integrated-circuit speech synthesis system represents a significant advance in large-scale integration in both random logic and data storage functions.
This paper presents some possible acoustic feature differences between natural and synthesized speech. Sentences spoken in a natural adult male voice and synthesized on a VOTRAX ML-1 Speech Synthesizer were recorded in a soundproof booth. The recorded sentences were segmented into voiced, unvoiced, and silence regions. Parameters such as zero crossings, linear prediction coefficients, and energy were used in making the classification. The results obtained indicate that the synthesized speech tends to contain more unvoicing than the natural speech. The classification accuracy was 99% for the natural speech and 85% for the synthesized speech.
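A voiced/unvoiced/silence decision of the kind described above can be illustrated with two of the named parameters, short-time energy and zero-crossing rate. The sketch below is a deliberately simple threshold rule: the thresholds, frame length, sampling rate, and function names are hypothetical, the linear prediction coefficient used in the paper is left out, and the paper's actual classifier is not specified in the abstract.

```python
import math

def frame_features(frame):
    """Return (mean energy, zero-crossing rate) for one frame of samples."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)

def classify_frame(frame, silence_energy=1e-4, zcr_threshold=0.25):
    """Label a frame 'silence', 'unvoiced', or 'voiced' (thresholds are assumptions)."""
    energy, zcr = frame_features(frame)
    if energy < silence_energy:
        return "silence"
    # Noise-like frames cross zero often; periodic voiced frames do not.
    return "unvoiced" if zcr > zcr_threshold else "voiced"

# Synthetic 30 ms frames at an assumed 8 kHz sampling rate.
voiced_like = [0.5 * math.sin(2.0 * math.pi * 100.0 * n / 8000.0) for n in range(240)]
unvoiced_like = [0.5, -0.5] * 120
silent = [0.0] * 240
demo_labels = [classify_frame(f) for f in (voiced_like, unvoiced_like, silent)]
```

In practice such thresholds would be tuned on labeled data, which is also how a per-class accuracy figure like those reported above would be measured.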
A new distance measure based on the derivative of the linear prediction (LP) phase spectrum is proposed for comparison of speech spectra. Relationships among several distance measures based on the linear prediction coefficients (LPCs) are discussed. The advantages of the new measure and an efficient method of computing it are also discussed.