A novel linear prediction (LP) model-based coding technique is presented, in which the advantages of multiple non-orthogonal domain representations of the LP coefficients and the residuals are exploited in conjunction with vector quantisation. The proposed technique is applied to speech signals, and the resulting performance improvement is clearly demonstrated in terms of reconstruction quality.
This paper demonstrates the application of an artificial neural network classifier to recognize pest birds in agricultural areas as part of a comprehensive system of protection against vermin. First, the idea of the whole system is outlined. Then the method of recognition is described, the process of artificial neural network design is illustrated, and the classifier is validated using data gathered in the fields. Finally, the results are compared to those of similar works.
This paper describes new techniques for automatic speaker verification using telephone speech. The operation of the system is based on a set of functions of time obtained from acoustic analysis of a fixed, sentence-long utterance. Cepstrum coefficients are extracted by means of LPC analysis successively throughout an utterance to form time functions, and frequency response distortions introduced by transmission systems are removed. The time functions are expanded by orthogonal polynomial representations and, after a feature selection procedure, brought into time registration with stored reference functions to calculate the overall distance. This is accomplished by a new time warping method using a dynamic programming technique. A decision is made to accept or reject an identity claim, based on the overall distance. Reference functions and decision thresholds are updated for each customer. Several sets of experimental utterances were used for the evaluation of the system, including male and female utterances recorded over a conventional telephone connection. Male utterances processed by ADPCM and LPC coding systems were used together with unprocessed utterances. Results of the experiment indicate that a verification error rate of one percent or less can be obtained even if the reference and test utterances are subjected to different transmission conditions.
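The time-registration and accept/reject steps can be illustrated with a minimal sketch. It uses the textbook dynamic time warping recursion, which is an assumption and not necessarily the new warping method of the paper; the function names `dtw_distance` and `verify`, the Euclidean frame distance, and the threshold value are all hypothetical.

```python
def dtw_distance(test, ref):
    """Accumulated distance between two feature sequences after time warping.

    Each sequence is a list of equal-length feature vectors (for example,
    cepstrum coefficients per frame). Textbook DTW, used here as a stand-in.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    # cost[i][j] = minimum accumulated distance aligning test[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (an assumed local measure)
            d = sum((a - b) ** 2 for a, b in zip(test[i - 1], ref[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def verify(test, ref, threshold):
    """Accept the identity claim when the warped distance is within the threshold."""
    return dtw_distance(test, ref) <= threshold


reference = [[0.0], [1.0], [2.0]]
slow_repeat = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same contour, spoken more slowly
print(verify(slow_repeat, reference, threshold=0.5))
```

Because the warping path may repeat frames, the slower utterance aligns with the reference at zero accumulated distance and the claim is accepted. A per-customer threshold, as the abstract describes, would replace the fixed value used here.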
The performance of the Burg method for speech analysis is compared to that of the autocorrelation and covariance methods. The criteria of goodness are the accuracy of the spectral approximation, filter stability, windowing requirements, data frame length, and spectral resolution. A mathematical comparison is presented for a simple first-order signal. Spectral comparisons are presented for a second-order speech-like signal. Real speech synthesized using the analysis results of the autocorrelation and Burg methods is subjectively compared. The results do not find any justification for preferring the computationally more complex Burg method.
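For orientation, the autocorrelation method mentioned above can be sketched with the Levinson-Durbin recursion. This is a generic textbook sketch, not code from the paper; the function names and the sign convention (prediction error e[n] = x[n] + a[1] x[n-1] + ... + a[p] x[n-p]) are assumptions, and the Burg and covariance methods are not shown.

```python
def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of a data frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, ks, err): the error-filter coefficients a[0..order] with
    a[0] = 1, the reflection coefficients, and the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a, ks, err


# Autocorrelation sequence of an ideal first-order process with pole at 0.5
a, ks, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, ks, err)
```

With a valid autocorrelation sequence every reflection coefficient has magnitude below one, which is the filter-stability property listed among the comparison criteria.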
Automatic speech recognition experiments are described in which several popular preprocessing and classification strategies are compared. Preprocessing is done either by linear predictive analysis or by bandpass filtering. The two approaches are shown to produce similar recognition scores. The classifier uses either linear time stretching or dynamic programming to achieve time alignment. It is shown that dynamic programming is of major importance for recognition of polysyllabic words. The speech is either compressed into a quasi-phoneme character string or preserved uncompressed. Best results are obtained with uncompressed data, using nonlinear time registration for multisyllabic words.
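The simpler of the two alignment strategies, linear time stretching, can be sketched as uniform resampling of a feature track to a fixed length. This is an illustrative assumption about how such stretching might be done (linear interpolation on a scalar track); the function name `linear_time_stretch` is hypothetical and not taken from the paper.

```python
def linear_time_stretch(seq, target_len):
    """Resample a scalar feature track to target_len points by linear interpolation."""
    n = len(seq)
    if n == 1 or target_len == 1:
        return [seq[0]] * target_len
    out = []
    for j in range(target_len):
        # Map output index j onto the input time axis with a constant stretch factor
        pos = j * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * seq[lo] + frac * seq[hi])
    return out


print(linear_time_stretch([0.0, 2.0], 3))
```

Linear stretching applies one stretch factor to the whole word, so it cannot absorb syllables that lengthen or shorten independently. That limitation is consistent with the reported advantage of dynamic programming for polysyllabic words.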
Efficient scalar quantization tables for LPC k-parameters were developed using a distortion measure based on just-noticeable-differences (JND's) in formant parameters of the speech spectrum envelope. Forty percent fewer bits were required than the 41/frame used in conventional approaches. An empirical technique was developed for relating perturbations in k-parameters and formant parameters. New estimates were obtained for the values of the formant JND's: they are about four times the steady-state values reported by Flanagan [6] and increase sharply above approximately 1.5 kHz.
A finite-state vector quantizer (FSVQ) is a switched vector quantizer where the sequence of quantizers selected by the encoder can be tracked by the decoder. It can be viewed as an adaptive vector quantizer with backward estimation, a vector generalization of an AQB system. Recently a family of algorithms for the design of FSVQ's for waveform coding applications has been introduced. These algorithms first design an initial set of vector quantizers together with a next-state function giving the rule by which the next quantizer is selected. The codebooks of this initial FSVQ are then iteratively improved by a natural extension of the usual memoryless vector quantizer design algorithm. The next-state function, however, is not modified from its initial form. In this paper we present two extensions of the FSVQ design algorithms. First, the algorithm for FSVQ design for waveform coders is extended to FSVQ design of linear predictive coded (LPC) speech parameter vectors using an Itakura-Saito distortion measure. Second, we introduce a new technique for the iterative improvement of the next-state function based on an algorithm from adaptive stochastic automata theory. The design algorithms are simulated for an LPC FSVQ and the results are compared with each other and to ordinary memoryless vector quantization. Several open problems suggested by the simulation results are presented.
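The tracking property described above, where the decoder follows the encoder's sequence of quantizers without side information, can be shown with a minimal sketch. It covers only encoding and decoding with a fixed next-state table, not the design algorithms. The squared-error distortion stands in for the Itakura-Saito measure, and the codebooks, the table, and all function names are hypothetical.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x in squared error (an assumed distortion)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def fsvq_encode(vectors, codebooks, next_state, state=0):
    """Quantize each vector with the current state's codebook, then switch state."""
    indices = []
    for x in vectors:
        i = nearest(codebooks[state], x)
        indices.append(i)
        # The next state depends only on the current state and the transmitted index
        state = next_state[state][i]
    return indices


def fsvq_decode(indices, codebooks, next_state, state=0):
    """Rebuild the vectors; the decoder repeats the same state transitions."""
    out = []
    for i in indices:
        out.append(codebooks[state][i])
        state = next_state[state][i]
    return out


codebooks = [[[0.0], [1.0]], [[1.0], [2.0]]]  # one small codebook per state
next_state = [[0, 1], [0, 1]]                 # next_state[state][index]
indices = fsvq_encode([[0.1], [0.9], [2.2], [0.8]], codebooks, next_state)
print(indices, fsvq_decode(indices, codebooks, next_state))
```

Because the next state is a function of quantities the decoder already has, only the codeword indices need to be transmitted.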
The principle of minimum cross-entropy (minimum directed divergence, minimum discrimination information, minimum relative entropy) is summarized, discussed, and applied to the classical problem of estimating power spectra given values of the autocorrelation function. This new method differs from previous methods in its explicit inclusion of a prior estimate of the power spectrum, and it reduces to maximum entropy spectral analysis as a special case. The prior estimate can be viewed as a means of shaping the spectral estimator. Cross-entropy minimization yields a family of shaped spectral estimators consistent with known autocorrelations. Results are derived in two equivalent ways: once by minimizing the cross-entropy of underlying probability densities, and once by arguments concerning the cross-entropy between the input and output of linear filters. Several example minimum cross-entropy spectra are included.
Linear prediction is a generally accepted method for obtaining all-pole speech representations. However, in many situations (e.g., nasalization studies) spectral zeros are important and a more general modeling procedure is required. Unfortunately, the need for pitch synchronization has limited the success of available techniques. This paper explores a novel approach to pole-zero analysis, called homomorphic prediction, which seems to avoid the synchronization problem. A minimum-phase estimate of the vocal-tract impulse response is obtained by homomorphic filtering of the speech waveform. Such a signal, by definition, has a known time registration. Linear prediction is applied to this waveform to identify its poles. The LPC "residual" (error signal) is computed by inverse filtering. This signal contains the information about the zeros. Its z transform is then approximated by a polynomial either through a weighted least squares procedure (homomorphic prediction, using Shanks' method of finding zeros), or by spectral inversion followed by a second pass of LPC (homomorphic prediction involving "inverse LPC"). Results of a preliminary evaluation on real and synthetic speech are presented.
This paper describes the acoustic processing in a syntactically guided natural language speech understanding system. A major characteristic of this system is that through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream. The purpose of the acoustic processor is to verify or reject these hypotheses. This verification is done in two stages. First, digital filtering is done to classify each 10-ms segment as one of ten primitive classes. If the proposed word is consistent with the pattern of primitive classes at the corresponding point in the acoustic stream, further analysis is done using linear predictive coding and other digital filters. The results of this analysis are used to segment the acoustic signal and to further classify the voiced segments. Because this segmentation and classification can be tailored for each word, difficult analysis problems caused by coarticulation between adjacent sounds can be successfully solved. When combined with a sophisticated linguistic processor, these acoustical processing methods yield correct understanding of natural language utterances.