A study of the autocorrelation LPC analysis of speech in additive noise is presented. In the noise-free case it is shown that a finite word length implementation of the analysis may produce stable but poor spectral estimates. The beneficial effects of proper preemphasis are reaffirmed in terms of decreased numerical error as well as a decreased LPC order needed for a good spectral fit. For the case of noisy input speech, the conditions for severe distortion of the spectral estimate are presented. A proper LPC spectral analysis of speech in additive noise is shown to require a higher order fit than currently used, a more precise implementation, and a more accurate parameter quantization for transmission.
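As an illustration of the analysis chain this abstract refers to, the following is a minimal sketch of first-order preemphasis followed by the autocorrelation method solved with the Levinson-Durbin recursion. The function names, the preemphasis coefficient `mu = 0.9375`, and the sign convention A(z) = 1 + a1 z^-1 + ... are assumptions made for the example, not details taken from the paper, and the sketch runs in double precision rather than with a finite word length.

```python
def preemphasize(x, mu=0.9375):
    """First-order preemphasis y[n] = x[n] - mu * x[n-1] (mu is an assumed value)."""
    return [x[0]] + [x[n] - mu * x[n - 1] for n in range(1, len(x))]


def autocorrelation(x, order):
    """Autocorrelation lags r[0..order] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, ks, err): the inverse-filter coefficients a[0..order] with
    a[0] = 1, the list of reflection coefficients, and the final prediction
    error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks while |k| < 1
    return a, ks, err
```

A frame would typically be passed through `preemphasize`, windowed, and then handed to `autocorrelation` and `levinson_durbin`. Reflection coefficients with magnitude below one correspond to a stable synthesis filter, which is the sense in which a result can be stable and still fit the spectrum poorly.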
Three different types of pole-zero modeling have been investigated. The main concept underlying these methods is to fit a high-order pole predictor to the speech spectrum, and then to decompose the resulting predictor into a pole predictor and a zero predictor. To obtain the predictor parameters by Padé approximation, either the Trench algorithm or the Berlekamp-Massey algorithm can be used for the decomposition. In addition to the Padé approximation method, the generalized inverse method and the modified Durbin's method are also considered for pole-zero modeling. Preliminary results show that the proposed methods yield a better fit of the speech spectral envelope, especially of spectral nulls, than the all-pole model.
The log likelihood measure has been widely used in speech recognition for comparing speech signals. Recently it has been proposed as a measure for assessing the quality of coded speech. In this paper we present an interpretation of the log likelihood ratio measure within the theoretical framework of a waveform coder distortion model.
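For readers unfamiliar with the measure, here is a minimal sketch of one common form of the log likelihood ratio between two LPC models. It assumes the convention log(a_test' R a_test / a_ref' R a_ref), where R is the Toeplitz autocorrelation matrix of the reference frame and each coefficient vector starts with 1. Other conventions exist, and the abstract does not say which one the paper adopts; the function names are hypothetical.

```python
import math


def quad_form(a, r):
    """Evaluate a' R a, where R is the Toeplitz matrix built from the lags r."""
    n = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(n) for j in range(n))


def log_likelihood_ratio(a_ref, a_test, r_ref):
    """Log ratio of the residual energies obtained by filtering the reference
    frame with the test inverse filter and with its own inverse filter."""
    return math.log(quad_form(a_test, r_ref) / quad_form(a_ref, r_ref))
```

Because `a_ref` minimizes the quadratic form for its own frame, the ratio is at least one and the measure is non-negative, reaching zero when the two models coincide.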
Four automatic speaker recognition techniques were investigated with a common speech data base to determine their effectiveness in a text-independent mode. These four techniques used the correlation of short- and long-term spectral averages, cepstral measurements of long-term spectral averages, orthogonal linear prediction of the speech waveform, and long-term average LPC reflection coefficients combined with pitch and overall power. The results of this study indicate that LPC-derived parameters perform better than those derived from cepstral and spectral data. Recognition accuracies of 95% and 93% were obtained for the LPC-based techniques with 13 seconds of unknown speech. The corresponding recognition accuracies for the cepstral and spectral based systems were 79% and 54%, respectively.
Results of past studies on the quality problems of LPC speech are reviewed. The causes of the quality problems are found to lie within the basic model assumptions as well as in inaccuracies in LPC analysis and errors introduced in pitch and voicing detection and parameter quantization. Experiments discussed here show that LPC synthesis is generally very close to natural speech in the high frequency region and that most of the degradation is in the low frequency region (below approximately 1500 Hz).
Due to the increasing sophistication and decreasing cost of digital hardware, an all-digital speech communication network is being considered for implementation. Subject to bandwidth limitations, present plans call for both 16 kilobit Continuously Variable Slope Delta Modulation (CVSD) and 2.4 kilobit linear predictive coding (LPC) terminals. Hence the conversion of CVSD to LPC and vice versa arises naturally in this environment. This paper focuses on the CVSD to LPC conversion problem. After a brief discussion of the component system environments, a structure for the format conversion system is proposed. Since previous work has shown that pitch and voicing decisions can be made on the CVSD observations, the presentation focuses on the identification of LPC predictor coefficients from the noisily observed CVSD data. The remainder of the paper centers on this and the coefficient correction problem, and a new class of linear estimators is shown to yield conditionally unbiased estimates of the LPC coefficients in both noisy and noiseless environments.
The realization of a speech analyzer plus an LPC synthesizer in a single-chip signal processing microprocessor is described. The chip is able to process both algorithms in real time to create an interactive voice analyzer/response system operating under the control of a microprocessor and with the LPC speech data stored in a ROM. The chip is a 16-bit microprocessor with an architecture designed specifically for signal processing. It features all single-cycle instructions with a 300 ns cycle time, and a 12 × 12 bit parallel multiplier pipelined to operate in a single cycle. It can be programmed to perform a wide variety of signal processing functions, including speech processing.
A composite-Gaussian source model for speech was suggested at ICASSP-79. Based upon this model, a voiced/unvoiced detector is derived. This detector is an approximation to a maximum-a-posteriori sequence estimator and employs the Viterbi algorithm. This paper deals with the performance of this detector with real speech.
A new method of image coding by autoregressive (AR) synthesis is presented. The physics of image formation suggests that an image may be considered as a power spectrum. Using this formulation, a cosine transform of the sampled image is shown to yield a set of autocorrelations. These are used to find an equivalent AR model whose parameters are encoded for transmission. Compared to conventional cosine transform coding, this method is shown to give superior resolution and to suppress the "block effects" present in block-by-block transform coding methods. The distinction between this method and linear predictive coding (LPC) used for speech data compression is drawn. Extensions and examples for two-dimensional images are given.
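The chain described in this abstract (samples treated as a power spectrum, cosine transform to autocorrelations, AR model fit) can be sketched in one dimension as follows. This is an illustrative reading only: the half-sample frequency grid, the function names, and the use of the Levinson-Durbin recursion for the AR fit are assumptions made for the example, and the paper's actual formulation may differ.

```python
import math


def cosine_transform_lags(samples, order):
    """Treat non-negative samples as a power spectrum on the grid
    omega_k = pi * (k + 0.5) / N and return autocorrelation lags r[0..order]."""
    n = len(samples)
    return [
        sum(p * math.cos(m * math.pi * (k + 0.5) / n) for k, p in enumerate(samples)) / n
        for m in range(order + 1)
    ]


def ar_from_lags(r, order):
    """Fit an AR model to the lags with the Levinson-Durbin recursion.

    Returns (a, err) with a[0] = 1 and err the prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k] + [0.0] * (order - m)
        err *= 1.0 - k * k
    return a, err


def ar_spectrum(a, err, omega):
    """Evaluate the AR model spectrum err / |A(e^{j omega})|^2."""
    re = sum(c * math.cos(i * omega) for i, c in enumerate(a))
    im = -sum(c * math.sin(i * omega) for i, c in enumerate(a))
    return err / (re * re + im * im)
```

In such a scheme the decoder would rebuild the samples by evaluating `ar_spectrum` on the same grid from the transmitted `a` and `err`, so only the model parameters need to be encoded.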