In this work we address the problem of all-pole spectral envelope estimation for speech signals. The widely used all-pole spectral envelope model suffers from well-known systematic errors and, more severely, from model order mismatch. We propose a procedure that first establishes a band-limited interpolation of the observed spectrum using a recently rediscovered true envelope estimator, and then uses this band-limited envelope to derive an all-pole envelope model named TE-LPC. Deriving the all-pole model from the band-limited envelope reduces the problem of the unknown all-pole model order. For the experimental investigation we propose a new perceptually motivated residual spectral peak flatness measure. The experimental results demonstrate that the proposed method significantly increases the spectral flatness for the low-order harmonics of voiced utterances, which are perceptually especially important.
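The two-stage procedure above (band-limited true envelope first, all-pole fit second) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' TE-LPC implementation. The true envelope uses the classic iterative cepstral smoothing rule: the target is raised to the maximum of itself and its smoothed version until the envelope covers every spectral peak. The 2 dB stopping tolerance and the function names are hypothetical choices. The all-pole model is obtained by the autocorrelation method applied to the envelope's power spectrum.

```python
import numpy as np


def true_envelope(log_mag, cep_order, max_iter=200, tol_db=2.0):
    """Iterative cepstral true-envelope estimate of a log-magnitude half spectrum.

    log_mag: natural-log magnitude on the rfft grid (length n_fft // 2 + 1).
    cep_order: cepstral order; it sets the band limit of the envelope.
    tol_db: assumed stopping tolerance (hypothetical default).
    """
    n_fft = 2 * (len(log_mag) - 1)
    tol = tol_db * np.log(10.0) / 20.0              # dB -> natural-log units
    target = np.array(log_mag, dtype=float)
    for _ in range(max_iter):
        cep = np.fft.irfft(target, n_fft)           # real cepstrum of the current target
        cep[cep_order + 1 : n_fft - cep_order] = 0  # low-quefrency lifter (band limit)
        env = np.fft.rfft(cep).real                 # cepstrally smoothed log envelope
        if np.max(log_mag - env) < tol:             # every spectral peak is covered
            break
        target = np.maximum(target, env)            # keep the peaks, fill the valleys
    return env


def allpole_from_envelope(log_env, order):
    """Fit gain / A(z) to exp(log_env) with the autocorrelation (normal-equation) method."""
    n_fft = 2 * (len(log_env) - 1)
    power = np.exp(2.0 * log_env)                   # |H|^2 from natural-log magnitude
    r = np.fft.irfft(power, n_fft)[: order + 1]     # autocorrelation (Wiener-Khinchin)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])       # predictor a1..ap
    gain = np.sqrt(r[0] + np.dot(a, r[1 : order + 1]))  # sqrt of prediction-error power
    return np.concatenate(([1.0], a)), gain
```

The cepstral order fixes the band limit of the envelope. The all-pole order is then chosen for this smooth envelope rather than for the raw harmonic spectrum, which is the motivation for the two-stage design.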
CELP coders commonly use line spectral pairs (LSPs) to represent linear prediction parameters, which yields stable filters and efficient coding. However, manipulating LSPs can alter the frequencies within the represented signals. This paper describes two computationally efficient LSP-based processing methods designed to enhance the intelligibility of speech degraded by acoustic interference.
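The LSP representation can be illustrated with a conversion from predictor coefficients to line spectral frequencies (LSFs). The sketch assumes the predictor convention A(z) = 1 + a1 z^-1 + ... + ap z^-p. It uses `numpy.roots` for clarity rather than the Chebyshev or grid searches a real coder would use. It does not reproduce the paper's two enhancement methods.

```python
import numpy as np


def lpc_to_lsf(a):
    """Line spectral frequencies (radians) of A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    a: coefficients [1, a1, ..., ap] of a minimum-phase (stable) predictor.
    """
    a = np.asarray(a, dtype=float)
    ext = np.concatenate((a, [0.0]))      # A(z) padded to degree p + 1
    p_poly = ext + ext[::-1]              # P(z) = A(z) + z^-(p+1) A(z^-1), symmetric
    q_poly = ext - ext[::-1]              # Q(z) = A(z) - z^-(p+1) A(z^-1), antisymmetric
    roots = np.concatenate((np.roots(p_poly), np.roots(q_poly)))
    angles = np.angle(roots)
    eps = 1e-6
    # Keep one angle per conjugate pair and drop the trivial roots at z = +1 and z = -1.
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

For a stable A(z), the returned frequencies are sorted and lie strictly inside (0, π). Closely spaced pairs correspond to sharp spectral peaks, which is why moving LSFs changes the formant structure of the resynthesized signal.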
ISBN: (print) 0818679190
The intraframe correlation properties of line spectrum pairs (LSPs) are used to develop an efficient encoding algorithm based on the Karhunen-Loeve (KL) transform. The important nonuniform statistical characteristics of LSP frequencies are investigated. Based on this nonuniform property, neural-network techniques for generating the transform vectors through system training are studied. Using a principal component analysis (PCA) network to decorrelate the LSP coefficients, we show that these new approaches achieve distortion as low as or lower than that of other methods for speech analysis-synthesis.
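The KL transform that the PCA network learns can be computed in closed form from training vectors, which makes the decorrelation property easy to check. In this sketch, an eigendecomposition of the sample covariance stands in for the paper's trained PCA network. A PCA network trained with a Hebbian rule such as Sanger's converges toward the same principal directions. The training data here are synthetic, correlated, LSP-like vectors rather than real LSP statistics.

```python
import numpy as np


def klt_fit(vectors):
    """Estimate a Karhunen-Loeve transform from training vectors (one per row)."""
    mean = vectors.mean(axis=0)
    cov = np.cov(vectors - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # strongest component first
    return mean, eigvecs[:, order], eigvals[order]


def klt_forward(x, mean, basis):
    """Project mean-removed vectors onto the KL basis (decorrelated coefficients)."""
    return (x - mean) @ basis


def klt_inverse(y, mean, basis):
    """Reconstruct vectors; the basis is orthonormal, so its transpose inverts it."""
    return y @ basis.T + mean
```

A coder would quantize the leading coefficients most finely, because they carry most of the variance.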
The authors present a new method of modeling the excitation of the linear predictive coding (LPC) synthesis filter at low and medium bit rates. For speech segments with regular patterns, the excitation is composed of two pulse sequences. The first sequence is generated in a way similar to the classical physical model, which consists of a glottal filter with thinned coefficients driven by a set of pitch pulses. Both the glottal function and the pitch pulses are determined by analysis-by-synthesis with a mean-square criterion. The auxiliary sequence consists of a few pulses that supplement the first sequence to further reduce the mean-square error. For unvoiced speech segments, multipulse excitation is simply used to drive the synthesis filter. In analyses of real speech, the model achieves a signal-to-noise ratio (SNR) gain of 2-3 dB for voiced segments over multipulse LPC using 0.8-2.5 pulses/ms.
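The analysis-by-synthesis pulse search under the mean-square criterion can be sketched as the classic greedy multipulse procedure. At each step, the search places the pulse whose filtered response best matches the remaining error, gives it its optimal amplitude, and subtracts its contribution. This covers only the multipulse part, as used for unvoiced segments and for the auxiliary pulses. The glottal-filter pitch-pulse sequence of the proposed model is not modeled, and amplitudes are not jointly re-optimized.

```python
import numpy as np


def impulse_response(a, n):
    """First n samples of the all-pole synthesis filter 1/A(z), a = [1, a1, ..., ap]."""
    h = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for k in range(1, len(a)):
            if i - k >= 0:
                acc -= a[k] * h[i - k]
        h[i] = acc
    return h


def multipulse_search(target, h, n_pulses):
    """Greedy pulse search minimizing the squared error; h must have len >= len(target)."""
    n = len(target)
    # Column j is the synthesis-filter response to a unit pulse at position j.
    H = np.zeros((n, n))
    for j in range(n):
        H[j:, j] = h[: n - j]
    energy = np.sum(H * H, axis=0)
    residual = np.asarray(target, dtype=float).copy()
    pulses = []
    for _ in range(n_pulses):
        corr = H.T @ residual
        j = int(np.argmax(corr ** 2 / energy))   # largest error reduction
        amp = corr[j] / energy[j]                # optimal amplitude for that position
        residual -= amp * H[:, j]
        pulses.append((j, amp))
    return pulses, residual
```

Each step lowers the residual energy by corr[j]^2 / energy[j], so adding pulses never increases the mean-square error.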
A new spectral distance measure is defined by inserting a multiplicative frequency weighting term into the conventional Itakura-Saito measure. When the weighting function is a simple one-pole function, minimizing the distance between a signal spectrum and an arbitrary N-pole filter yields a set of linear equations that is symmetric and solvable by Cholesky decomposition. When the weighting function is a multiple-pole function, the minimization produces a set of nonlinear algebraic equations. Fortunately, a simple method exists for obtaining an approximate solution, which can then be refined with the Newton-Raphson method. Results of some preliminary trials applying the technique to LPC vocoding are described.
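The weighted measure can be evaluated numerically on a frequency grid. The sketch assumes the weighting multiplies the integrand of the conventional Itakura-Saito distance:

d = (1/(2π)) ∫ W(ω) [S(ω)/Ŝ(ω) - ln(S(ω)/Ŝ(ω)) - 1] dω

It uses a hypothetical one-pole weighting W(ω) = 1/|1 - b e^{-jω}|^2 with a free parameter b. It only evaluates the distance and does not derive the paper's linear or nonlinear minimization equations.

```python
import numpy as np


def allpole_power(a, gain, w):
    """Power spectrum gain^2 / |A(e^{jw})|^2 of an all-pole model a = [1, a1, ..., ap]."""
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ np.asarray(a, dtype=float)
    return gain ** 2 / np.abs(A) ** 2


def one_pole_weight(w, b):
    """Hypothetical one-pole weighting 1 / |1 - b e^{-jw}|^2."""
    return 1.0 / np.abs(1.0 - b * np.exp(-1j * w)) ** 2


def weighted_itakura_saito(S, S_hat, W):
    """Frequency-weighted Itakura-Saito distance on a uniform grid over (0, pi).

    Real-signal spectra are even in w, so the mean over (0, pi) approximates
    the (1/2pi) integral over (-pi, pi). With W = 1 this is the plain measure.
    """
    ratio = S / S_hat
    return float(np.mean(W * (ratio - np.log(ratio) - 1.0)))
```

Because x - ln x - 1 >= 0 with equality only at x = 1, any positive weighting keeps the distance non-negative and zero only for identical spectra.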
During learning, LPC is used to obtain the reference poles corresponding to each word. During recognition, the order of the filtering is variable and imposed by the dictionary. The distance between an input speech window and a dictionary speech window is computed with a method close to Itakura's, but using a series of second-order inverse filters. An improved dynamic programming algorithm is used, allowing parallel computation for several words.
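Inverse filtering with a series of second-order sections can be sketched as follows. Each reference pole pair r·e^{±jθ} contributes a section 1 - 2r cos θ z^-1 + r^2 z^-2. A dictionary entry with any number of pole pairs (its imposed filter order) is therefore handled by the same cascade. The `log_residual_ratio` distance compares the residual energy through the reference cascade with the residual energy through the window's own pole pairs. It is a hypothetical Itakura-style measure rather than the paper's exact formula, and the dynamic-programming stage is not shown.

```python
import numpy as np


def second_order_section(r, theta):
    """Inverse-filter section 1 - 2 r cos(theta) z^-1 + r^2 z^-2 for poles r e^{+-j theta}."""
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])


def cascade_inverse_filter(x, pole_pairs):
    """Pass x through a cascade of second-order FIR inverse filters (zero initial state)."""
    y = np.asarray(x, dtype=float)
    for r, theta in pole_pairs:
        y = np.convolve(y, second_order_section(r, theta))[: len(y)]  # causal, truncated
    return y


def log_residual_ratio(x, ref_pairs, own_pairs):
    """Hypothetical distance: log(residual energy via reference / via the window's own poles)."""
    e_ref = cascade_inverse_filter(x, ref_pairs)
    e_own = cascade_inverse_filter(x, own_pairs)
    return float(np.log(np.dot(e_ref, e_ref) / np.dot(e_own, e_own)))


def all_pole_synthesis(excitation, pole_pairs):
    """Drive 1/A(z), with A(z) the product of the sections (used here to make test signals)."""
    a = np.array([1.0])
    for r, theta in pole_pairs:
        a = np.convolve(a, second_order_section(r, theta))
    y = np.zeros(len(excitation))
    for n in range(len(y)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y
```

When the reference poles match the signal, the cascade returns the excitation, and the ratio is 0. Mismatched poles leave spectral structure in the residual and raise its energy.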
Speech quality monitoring of cellular phone networks is a routine task performed by cellular service providers. Such monitoring requires objective speech quality measures that provide a reasonably accurate estimate of the subjective quality of the network. We performed an experiment to collect real distorted data, conducted a survey to obtain subjective quality ratings of the collected speech samples, and studied the statistical correlation of 32 objective speech quality measures with the subjective ratings. Four of the objective measures were found to perform well. Synchronization was found to be important.
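Two ingredients of such a study can be sketched: one common objective measure (segmental SNR) and the correlation of a measure's scores with subjective ratings. Segmental SNR is shown only as an example, since the abstract does not say which of the 32 measures performed well. The frame length and clamping limits are assumed typical values. The simple cross-correlation delay estimate illustrates why synchronization matters. Waveform-based measures compare signals sample by sample, so even a few samples of misalignment destroy the score.

```python
import numpy as np


def estimate_delay(clean, degraded, max_lag):
    """Delay (samples) of degraded relative to clean, from the cross-correlation peak."""
    best_lag, best_score = 0, -np.inf
    for lag in range(max_lag + 1):
        n = min(len(clean), len(degraded) - lag)
        score = np.dot(clean[:n], degraded[lag : lag + n])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def segmental_snr(clean, degraded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR (dB) of time-aligned signals, each frame clamped to [floor, ceil]."""
    n_frames = min(len(clean), len(degraded)) // frame_len
    values = []
    for i in range(n_frames):
        s = clean[i * frame_len : (i + 1) * frame_len]
        e = s - degraded[i * frame_len : (i + 1) * frame_len]
        snr = 10.0 * np.log10(max(np.dot(s, s), 1e-12) / max(np.dot(e, e), 1e-12))
        values.append(min(max(snr, floor_db), ceil_db))
    return float(np.mean(values))


def pearson(objective_scores, subjective_scores):
    """Correlation between an objective measure and subjective ratings."""
    return float(np.corrcoef(objective_scores, subjective_scores)[0, 1])
```

In a study like this one, `pearson` would be applied to each objective measure's scores against the subjective ratings collected for the same samples.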
Most published evaluations of LPC systems use only one or two speakers. Since LPC quality and intelligibility are known to depend on the speaker, this is an inadequate test of a synthesis system. We recorded eight men and nine women chosen from a speech database of 81 speakers. Two phoneticians had independently rated these speakers for the presence or absence of the following voice characteristics: nasality, harshness, creak, whisper, and pitch extremes. The 17 talkers formed a balanced sample of strong positives and negatives for the five voice characteristics. Each speaker was recorded on one fifty-word set from the Modified Rhyme Test. Monosyllabic word intelligibility tests were administered to 88 listeners (four listeners per speaker set). The intelligibility results for the different speakers show that vocal characteristics and the resulting LPC quality are linked. Nasality and whisper are the most strongly correlated with decreased LPC intelligibility.
Code-excited linear prediction (CELP) coders hold promise for achieving high-quality speech at low bit rates. We propose a ternary-excitation CELP coder with a new structure that achieves toll-quality speech at 4 kbps. Speech quality is maintained by allocating more bits to the codebook index. This allows larger codebooks, which give better speech quality as the number of quantization levels increases. To free more bits for the codebook index, a backward-adaptive 10th-order LPC predictor is used. The regular structure of the lattice codebook and the convexity of the error surface are exploited to greatly improve the efficiency of the search algorithm. The codebook storage requirement is eliminated by transmitting the positions of the three weights used to generate the ternary codebook instead of the codebook index. The speech quality obtained with the new CELP structure is studied and compared with that of the LBG and Gaussian codebooks.
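Backward adaptation frees bits because the predictor is computed from samples the decoder has already reconstructed. Both sides therefore derive identical coefficients, and none are transmitted. Below is a minimal sketch, assuming a Hamming-windowed autocorrelation over the most recent decoded samples and a Levinson-Durbin solution. The window length and the white-noise correction factor are hypothetical choices, not values from the paper.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations; returns [1, a1, ..., ap] and error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]                # update lower-order terms
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def backward_adaptive_lpc(past_output, order=10, window_len=240):
    """Predictor computed only from already-decoded samples (no coefficients transmitted)."""
    seg = np.asarray(past_output[-window_len:], dtype=float)
    seg = seg * np.hamming(len(seg))
    r = np.array([np.dot(seg[: len(seg) - k], seg[k:]) for k in range(order + 1)])
    r[0] *= 1.0001  # small white-noise correction (assumed value) for conditioning
    return levinson_durbin(r, order)
```

The computation is deterministic, so an encoder and a decoder that hold the same decoded history obtain bit-identical predictors.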
This paper describes an 8 kbit/s ACELP speech coder with high performance for both speech and non-speech signals such as background noise. The traditional waveform-matching LPAS structure employed in many existing speech coders provides high quality for speech signals, but its performance is significantly limited for signals such as background noise. The coder presented here employs a novel adaptive gain coding technique that uses energy matching in combination with a traditional waveform-matching criterion, providing high quality for both speech and background noise. The coder has a basic structure similar to that of the 7.4 kbit/s D-AMPS EFR coder, with a 10th-order LPC, a high-resolution adaptive codebook, and a 4-pulse algebraic codebook. Its performance for speech signals is equivalent to or better than that of state-of-the-art 8 kbit/s coders, while its performance in background noise conditions is significantly better.
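The difference between the two gain criteria can be sketched as follows, with x the target and y the filtered codebook contribution. Waveform matching chooses the gain that minimizes the squared error, g_w = <x, y>/<y, y>. Energy matching chooses g_e = sqrt(<x, x>/<y, y>), so that the scaled excitation has the target's energy. For noise-like targets the correlation between x and y is weak, so g_w shrinks and the decoded background noise loses energy, which matches the limitation the abstract describes. The blended criterion and the balance factor `alpha` below are assumptions for illustration; the abstract does not give the coder's actual adaptive gain coding rule.

```python
import numpy as np


def waveform_matching_gain(target, filtered):
    """Gain minimizing ||target - g * filtered||^2."""
    return float(np.dot(target, filtered) / np.dot(filtered, filtered))


def energy_matching_gain(target, filtered):
    """Gain making the energy of g * filtered equal to the energy of target."""
    return float(np.sqrt(np.dot(target, target) / np.dot(filtered, filtered)))


def balanced_gain(target, filtered, alpha):
    """Minimizer of the assumed criterion
    (1 - alpha) * ||x - g*y||^2 + alpha * (||x|| - g*||y||)^2,
    which works out to (1 - alpha) * g_w + alpha * g_e.
    alpha = 0 is pure waveform matching; alpha = 1 is pure energy matching.
    """
    g_w = waveform_matching_gain(target, filtered)
    g_e = energy_matching_gain(target, filtered)
    return (1.0 - alpha) * g_w + alpha * g_e
```

By the Cauchy-Schwarz inequality, |g_w| never exceeds g_e. Waveform matching alone can only lose energy relative to the target, and energy matching restores it at the cost of waveform fidelity.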