During training, LPC is used to obtain the reference poles corresponding to the words. During recognition, the filter order is variable and imposed by the dictionary. The distance between an input speech window and a dictionary speech window is computed with a method close to Itakura's method, but using a series of second-order inverse filters. An improved dynamic programming scheme is used, allowing parallel computation for several words.
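As a rough illustration of how LPC can yield pole locations for a speech window, the sketch below uses the standard autocorrelation method and takes the roots of the prediction-error polynomial. The function name `lpc_poles` and its parameters are hypothetical, and this is not claimed to be the procedure used in the paper.

```python
import numpy as np

def lpc_poles(frame, order):
    """Estimate the poles of an all-pole model of `frame` (autocorrelation method).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([frame[:len(frame) - k] @ frame[k:] for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Prediction-error filter A(z) = 1 - sum_k a[k] z^-k; its roots are the poles.
    return np.roots(np.concatenate(([1.0], -a)))
```

Complex-conjugate pole pairs obtained this way each correspond to one second-order section, which is one way to read the second-order inverse filtering mentioned in the abstract.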
In this work we address the problem of all-pole spectral envelope estimation for speech signals. The currently widely used all-pole spectral envelope model suffers from well-known systematic errors and, more severely, from model order mismatch. We propose a procedure that first establishes a band-limited interpolation of the observed spectrum using a recently rediscovered true envelope estimator, and then uses the band-limited envelope to derive an all-pole envelope model named TE-LPC. The band-limited envelope that is used to derive the all-pole envelope model reduces the problem of the unknown all-pole model order. For the experimental investigation we propose a new perceptually motivated residual spectral peak flatness measure. The experimental results demonstrate that the proposed method significantly increases the spectral flatness for the low-order harmonics of voiced utterances, which are especially important perceptually.
A new spectral distance measure is defined by inserting a multiplicative frequency weighting term into the conventional Itakura-Saito measure. When the weighting function is a simple one-pole function, the minimization of the distance between a signal spectrum and an arbitrary N-pole filter results in a set of linear equations that is symmetric and solvable by Cholesky decomposition. When the weighting function is a multiple-pole function, the resulting spectral distance minimization produces a set of nonlinear algebraic equations, but fortunately a simple method exists for obtaining an approximate solution, which can be refined using the Newton-Raphson method. Results of some preliminary trials in applying the technique to LPC vocoding are described.
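For orientation, the conventional Itakura-Saito measure between a power spectrum and a model spectrum can be sketched on a discrete frequency grid as below. The optional `weight` argument only illustrates what a multiplicative frequency weighting could look like; its exact form and the normalization by the sum of the weights are assumptions, not the definition from the paper.

```python
import numpy as np

def itakura_saito(p, p_hat, weight=None):
    """Discrete Itakura-Saito distance between power spectra p and p_hat.

    `weight` is a hypothetical per-frequency weighting (assumption, see lead-in).
    """
    p = np.asarray(p, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    ratio = p / p_hat
    # Per-bin integrand of the conventional measure: P/P_hat - log(P/P_hat) - 1.
    terms = ratio - np.log(ratio) - 1.0
    if weight is None:
        return float(np.mean(terms))
    weight = np.asarray(weight, dtype=float)
    # Weighted average of the per-bin terms (assumed normalization).
    return float(np.sum(weight * terms) / np.sum(weight))
```

The distance is zero when the two spectra coincide and is not symmetric in its arguments.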
A monolithic CCD adaptive filter chip is described which implements the Widrow-Hoff "clipped-data" LMS adaptive algorithm. The chip can be used as a pre-filter noise canceller, analysis filter, or pre-whitener for a pitch extractor in linear prediction coding (LPC) voice bandwidth reduction systems.
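The clipped-data LMS update replaces the input vector in the weight update by its sign, which removes a multiplication per tap. The following software sketch shows that update for an FIR filter; the function name, tap count, and step size are illustrative assumptions and say nothing about how the CCD chip itself is organized.

```python
import numpy as np

def clipped_data_lms(x, d, num_taps=8, mu=0.01):
    """Adapt FIR weights so that filtering x approximates d (clipped-data LMS).

    Illustrative sketch; names and default values are assumptions.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)
    e = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        # Most recent sample first: x[n], x[n-1], ..., x[n-num_taps+1].
        window = x[n - num_taps + 1:n + 1][::-1]
        e[n] = d[n] - w @ window
        # Standard LMS would use `window` here; the clipped-data form uses its sign.
        w += mu * e[n] * np.sign(window)
    return w, e
```

Used as a noise canceller or pre-whitener, `e` would be the output of interest; used for system identification, `w` is.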
This paper describes an 8 kbit/s ACELP speech coder with high performance for both speech and non-speech signals such as background noise. While the traditional waveform-matching LPAS structure employed in many existing speech coders provides high quality for speech signals, it has significant performance limitations for signals such as background noise. The coder presented here employs a novel adaptive gain coding technique that uses energy matching in combination with a traditional waveform-matching criterion, providing high quality for both speech and background noise. The coder has a basic structure similar to that of the 7.4 kbit/s D-AMPS EFR coder, with a 10th-order LPC, a high-resolution adaptive codebook and a 4-pulse algebraic codebook. The performance for speech signals is equivalent to or better than that of state-of-the-art 8 kbit/s coders, while for background noise conditions the performance is significantly improved.
The paper investigates the use of neural networks in recognizing the phonation of speech sounds. The proposed method classifies the Malay plosive sounds of adults and children based on phonation in a speaker-independent manner. The proposed method achieves encouraging results, with an average accuracy of 98%.
In this paper, a digital processing method is described for modifying tone contrast, defined as the difference in frequency between the peaks and valleys of pitch curves in natural utterances. Speech signals with modified tones were presented to hearing-impaired Chinese listeners, who were asked to identify four alternative Mandarin words. With this method, modified speech with enhanced tone contrast yielded moderate gains in the percentage of correctly identified words compared to unmodified speech, while reducing tone contrast generally reduced the percentage of correct identifications. These findings therefore offer support to the assertion that a hearing aid with tone modifications is effective for hearing-impaired Chinese listeners.
This paper presents a harmonic+noise speech coder which uses an efficient spectral quantization technique and a novel voiced/unvoiced (V/UV) mixing model. The harmonic magnitudes are coded at 23 bits/frame using the magnitude response of a linear predictive coding (LPC) system. The difference between the harmonic magnitudes and the sampled magnitude response is minimized by a closed-loop approach. The V/UV mixing is modeled by a smooth function which is derived from the speech spectrum envelope based on the flatness measure. The V/UV mixing model allows noise to be added in the harmonic portion of the speech spectrum so that buzziness is reduced. Because the V/UV mixing information is determined from the spectral parameters available in the decoder, no bits are needed for transmitting the V/UV information. A 1.4 kbps harmonic coder is developed. The speech quality of the coder is comparable to that of other harmonic coders operating at higher rates.
This paper presents an approach to speech vector quantization of sources exhibiting intervector dependency. We present the optimal decoder based on a collection of received indices. We also present the optimal encoder for such decoding. The optimal decoder can be implemented as a table look-up decoder; however, the size of the decoder codebook grows very fast with the size of the collection of utilized indices. This leads us to introduce a method for storing an approximation to the set of optimal decoder vectors, based on linear mapping of a block code vector quantization. In this approach a heavily reduced set of parameters is employed to represent the codebook. Furthermore, we illustrate that the proposed scheme has an interpretation as nonlinear predictive quantization. Numerical results indicate high gain over memoryless coding and memory quantization based on linear predictive coding. The results also show that the sub-optimal approach performs close to the optimal.
Cellular phone network speech quality monitoring is a regular task performed by cellular service providers. Objective speech quality measures are needed in such tasks to provide a reasonably accurate estimate of the subjective quality of the network. We performed an experiment to collect real distorted data, conducted a survey to obtain subjective quality measures of the collected speech samples, and studied the statistical correlation of 32 objective speech quality measures with the subjective measures. Four of the objective measures were found to be good. Synchronization was found to be important.