The authors present a new method of modeling the excitation of the linear predictive coding (LPC) synthesis filter at low and medium bit rates. For speech segments with regular patterns, the excitation is composed of two sequences of pulses. The first sequence is generated in a way similar to the classical physical model, which consists of a glottal filter with thinned coefficients driven by a set of pitch pulses. Both the glottal function and the pitch pulses are determined using the analysis-by-synthesis technique with the mean square criterion. The auxiliary sequence consists of a few pulses that supplement the first sequence to further reduce the mean square error. For unvoiced speech segments, multipulse excitation alone is used to drive the synthesis filter. Based on real speech analysis, the model yields a signal-to-noise ratio (SNR) gain of 2-3 dB for voiced segments over multipulse LPC using 0.8-2.5 pulses/ms.
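The multipulse baseline mentioned in this abstract can be illustrated with a minimal sketch of greedy analysis-by-synthesis pulse search. This is not the authors' two-sequence model. The function names, the toy impulse response `h`, and the frame length are assumptions chosen for illustration, and no perceptual weighting is applied.

```python
def synthesize(pulses, h, n):
    """Pass a sparse pulse train through a filter with impulse response h."""
    out = [0.0] * n
    for pos, amp in pulses:
        for k in range(min(len(h), n - pos)):
            out[pos + k] += amp * h[k]
    return out


def multipulse_search(target, h, n_pulses):
    """Greedy search: add one pulse at a time, each minimizing the squared error."""
    n = len(target)
    residual = list(target)
    pulses = []  # list of (position, amplitude)
    for _ in range(n_pulses):
        best = None
        for pos in range(n):
            span = min(len(h), n - pos)
            corr = sum(residual[pos + k] * h[k] for k in range(span))
            energy = sum(h[k] * h[k] for k in range(span))
            if energy == 0.0:
                continue
            reduction = corr * corr / energy  # error removed by the optimal pulse here
            if best is None or reduction > best[0]:
                best = (reduction, pos, corr / energy)
        _, pos, amp = best
        pulses.append((pos, amp))
        # Remove this pulse's contribution before searching for the next one.
        for k in range(min(len(h), n - pos)):
            residual[pos + k] -= amp * h[k]
    return pulses, sum(r * r for r in residual)


# Toy example (hypothetical values): a frame built from two known pulses.
h = [1.0, 0.5, 0.25]
target = synthesize([(2, 1.0), (8, -0.5)], h, 12)
pulses, err = multipulse_search(target, h, 2)
```

Because each pulse is chosen greedily, the result is generally suboptimal when pulse responses overlap; practical coders often re-optimize the amplitudes jointly after the positions are fixed.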
ISBN: 0818679190 (print)
The intraframe correlation properties of line spectrum pair (LSP) frequencies are used to develop an efficient encoding algorithm based on the Karhunen-Loeve (KL) transformation. The important nonuniform statistical characteristics of LSP frequencies are investigated. Based on this nonuniform property, neural-network-based techniques for generating the transform vectors via system training are studied. Using a principal component analysis (PCA) network to decorrelate the LSP coefficients, we show that these new approaches achieve distortion performance as good as or better than that of other methods for speech analysis-synthesis.
During learning, LPC is used to obtain the reference poles corresponding to the words. During recognition, the order of the filtering is variable and imposed by the dictionary. The distance between an input speech window and a dictionary speech window is computed with a method close to Itakura's but using a series of second-order inverse filtering stages. An improved dynamic programming algorithm is used, allowing parallel computation for several words.
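For reference, the classical Itakura log-likelihood-ratio distance that this abstract takes as its starting point can be sketched as follows. This shows the standard measure only, not the authors' cascade of second-order inverse filters; the function names and the synthetic test frame are assumptions for illustration.

```python
import math


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns A(z) coefficients [1, a1, ..., ap] and error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def residual_energy(a, r):
    """Energy a^T R a left after inverse filtering with A(z)."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))


def itakura_distance(frame, a_ref):
    """log of (residual energy with reference filter / residual energy with optimal filter)."""
    order = len(a_ref) - 1
    r = autocorr(frame, order)
    a_opt, _ = levinson(r, order)
    return math.log(residual_energy(a_ref, r) / residual_energy(a_opt, r))


# Synthetic frame (hypothetical): two sinusoids, second-order analysis.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(80)]
a_self, _ = levinson(autocorr(frame, 2), 2)
d_self = itakura_distance(frame, a_self)            # reference equals the frame's own model
d_other = itakura_distance(frame, [1.0, 0.0, 0.0])  # reference that does no prediction
```

The distance is zero when the reference filter matches the frame's own LPC model and is non-negative otherwise, since the optimal filter minimizes the residual energy.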
Most published evaluations of LPC systems use only one or two speakers. Since LPC quality and intelligibility are known to depend on the speaker, this is an inadequate test of a synthesis system. We recorded eight men and nine women chosen from a speech database of 81 speakers who were independently rated by two phoneticians for the presence or absence of the following voice characteristics: nasality, harshness, creak, whisper, and pitch extreme. The 17 talkers represented a balanced sample of strong positives or negatives of the five voice characteristics. Each speaker was recorded reading one fifty-word set from the Modified Rhyme Test. Monosyllabic word intelligibility tests were administered to 88 listeners (with four listeners per speaker set). Results from the intelligibility tests for different speakers show that vocal characteristics and resultant LPC quality are linked. Nasality and whisper are the characteristics most strongly correlated with decreased LPC intelligibility.
Code excited linear predictor (CELP) coders hold promise for achieving high-quality speech at low bit rates. We propose a ternary-excitation-based CELP coder with a new structure to achieve toll-quality speech at 4 kbps. Speech quality is maintained by allocating more bits to the codebook index, allowing for larger codebooks, which provide better speech quality as the number of quantization levels increases. To allocate more bits to the codebook index, a backward-adaptive 10th-order LPC predictor is used. The regular structure of the lattice codebook and the convexity of the error surface have been exploited to greatly improve the efficiency of the search algorithm. The storage requirement of the codebook is eliminated by transmitting the position of three weights used in generating the ternary codebook instead of the codebook index. Speech quality obtained using the new CELP structure is studied, and the results are compared with the LBG and Gaussian codebooks.
A monolithic CCD adaptive filter chip is described which implements the Widrow-Hoff "clipped-data" LMS adaptive algorithm. The chip can be used as a pre-filter noise canceller, analysis filter, or pre-whitener for a pitch extractor in linear prediction coding (LPC) voice bandwidth reduction systems.
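The "clipped-data" LMS update replaces each input sample in the weight update by its sign, so the update needs no data multiplier. A minimal software sketch follows, assuming a system-identification setup; the unknown filter `h_true`, the step size, and the signal lengths are hypothetical and have nothing to do with the chip's actual parameters.

```python
import random


def sign(v):
    """One-bit (clipped) version of a sample."""
    return 1.0 if v >= 0.0 else -1.0


def clipped_lms(x, d, n_taps, mu):
    """Clipped-data LMS: w <- w + mu * e * sign(x), one tap per delayed input."""
    w = [0.0] * n_taps
    buf = [0.0] * n_taps  # most recent input first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        w = [wi + mu * e * sign(bi) for wi, bi in zip(w, buf)]
    return w


# Toy system identification (hypothetical values): learn a 3-tap FIR filter.
rng = random.Random(0)
h_true = [0.5, -0.3, 0.1]
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
     for n in range(len(x))]
w = clipped_lms(x, d, n_taps=3, mu=0.01)
```

With a white Gaussian input and a noise-free desired signal, the clipped update converges to the same weights as ordinary LMS, only with a different effective step size.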
Cellular phone network speech quality monitoring is a regular task performed by the cellular service providers. Objective speech quality measures are needed in such tasks to provide a reasonably accurate estimate of subjective quality of the network. We performed an experiment to collect real distorted data, conducted a survey to obtain subjective quality measure of the collected speech samples and studied the statistical correlation of 32 objective speech quality measures with the subjective measures. Four of the objective measures were found to be good. Synchronization was found to be important.
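The kind of statistical comparison described here can be sketched with a Pearson correlation between one objective measure and the subjective scores. The abstract does not say which statistic was used, so Pearson's r is an assumption, and the numbers below are made-up placeholders, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only: subjective scores and one objective distortion measure.
subjective = [1.8, 2.5, 3.1, 3.9, 4.4]
objective = [4.2, 3.5, 2.9, 1.8, 1.1]
r = pearson(subjective, objective)
# A distortion-type measure is expected to correlate negatively with quality,
# so measures would be ranked by the magnitude of r.
```

A misaligned pair of reference and degraded signals inflates the objective distortion regardless of quality, which is consistent with the abstract's remark that synchronization matters.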
This paper describes an 8 kbit/s ACELP speech coder with high performance for both speech and non-speech signals such as background noise. While the traditional waveform-matching LPAS structure employed in many existing speech coders provides high quality for speech signals, it has significant performance limitations for other signals, such as background noise. The coder presented here employs a novel adaptive gain coding technique that uses energy matching in combination with a traditional waveform-matching criterion, providing high quality for both speech and background noise. The coder has a basic structure similar to that of the 7.4 kbit/s D-AMPS EFR coder, with a 10th-order LPC, a high-resolution adaptive codebook, and a 4-pulse algebraic codebook. The performance for speech signals is equivalent to or better than that of state-of-the-art 8 kbit/s coders, while for background noise conditions the performance is significantly improved.
In this paper, a digital processing method is described for modifying tone contrast, defined as the difference in frequency between the peaks and valleys of pitch curves in natural utterances. Speech signals with modified tones were presented to hearing-impaired Chinese listeners, who were asked to identify the spoken word from four alternative Mandarin words. It was found that modified speech with enhanced tone contrast yielded moderate gains in the percentage of correctly identified words compared with unmodified speech, while reducing tone contrast generally reduced the percentage of correct identifications. These findings support the assertion that a hearing aid with tone modification is effective for hearing-impaired Chinese listeners.
The paper investigates the use of neural networks in recognizing the phonation of speech sounds. The proposed method classifies the Malay plosive sounds of adults and children based on phonation in a speaker-independent manner. It achieves encouraging results, with an average accuracy of 98%.