Statistical methods for parameter estimation are often based on a Gaussian assumption. In this paper we describe a signal model based on an assumption of uniform distributions. This signal model leads to a simple, non-linear algorithm, called QD, for parameter estimation. The QD algorithm can be applied to adaptive, inverse filtering of speech signals. The technique used is to sequentially estimate the reflection coefficients of a lattice structure. Some of the results of one such application are discussed and compared to those obtained with other methods.
A composite source is an indexed family of random processes (subsources) together with a switch which chooses from among these processes in a stochastic fashion. Such a source has often been proposed as a model for speech and other processes having piece-wise, or quasi, stationary behavior. Until recently, however, very little has been known about such models from either a theoretical or a practical perspective. In this paper, we consider a speaker/isolated word recognition system derived from a composite source model for speech production. In particular, estimates of the underlying subsources are obtained using a modified data compression algorithm. Switch sequences are then derived from these estimates for each utterance. Finally, switch sequences are compared in the time domain (using Levenshtein's metric) and from a statistical point of view (via variation distance). Both modes of comparison are seen to be highly correlated and produce a recognition procedure with very encouraging results.
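The two comparison modes named above can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes a switch sequence is a non-empty list of subsource indices, that Levenshtein's metric uses unit costs, and that the variation distance is taken between first-order empirical symbol distributions with the usual factor of 1/2. The function names are hypothetical.

```python
from collections import Counter

def levenshtein(a, b):
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x by y
        prev = curr
    return prev[-1]

def variation_distance(a, b):
    """Total variation distance between the empirical symbol distributions
    of two non-empty sequences (assumed first-order statistics)."""
    pa, pb = Counter(a), Counter(b)
    symbols = set(pa) | set(pb)
    return 0.5 * sum(abs(pa[s] / len(a) - pb[s] / len(b)) for s in symbols)
```

The first measure is sensitive to the order of the switch states; the second, as sketched here, only to how often each state occurs.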
A high-quality speech synthesizer system has been developed that consists of three LSI chips: a speech synthesizer, a 128-kbit ROM, and a general-purpose microprocessor. The system is based on the recently developed Partial Autocorrelation (PARCOR) voice compression technique and can generate high-quality speech at a data rate of less than 2400 bits per second. Several new techniques are applied to improve the quality of the generated speech, especially for the female voice. The system also offers advantageous features such as speech speed control and external pitch excitation.
With rare exception, all presently available narrow-band speech coding systems implement scalar quantization (independent quantization) of the transmission parameters (such as reflection coefficients or transformed reflection coefficients in LPC systems). This paper presents a new approach called vector quantization. For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with 15 to 20 fewer bits/frame than that required for the optimized scalar quantizing approaches presently in use. The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies. This paper introduces the theory in a nonrigorous form, along with practical results to date and an extensive list of research topics for this new area of speech coding.
An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data. The basic properties of the algorithm are discussed and demonstrated by examples. Quite general distortion measures and long blocklengths are allowed, as exemplified by the design of parameter vector quantizers of ten-dimensional vectors arising in linear predictive coded (LPC) speech compression with a complicated distortion measure arising in LPC analysis that does not depend only on the error vector.
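A training-sequence design of this kind can be sketched as an alternation between nearest-codeword partitioning and centroid updates. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it uses plain squared error instead of the LPC distortion measure mentioned above, picks the initial codebook at random from the training data, and runs a fixed number of iterations. All names are hypothetical.

```python
import random

def sq_dist(x, y):
    """Squared-error distortion (a stand-in for a general distortion measure)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def design_codebook(training, size, iters=20, seed=0):
    """Alternate nearest-codeword partitioning and centroid updates."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(training, size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in training:
            # Assign each training vector to its minimum-distortion codeword.
            k = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            cells[k].append(v)
        for k, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[k] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook

def average_distortion(training, codebook):
    """Mean distortion when every vector is coded by its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in training) / len(training)
```

With a distortion measure other than squared error, the centroid step would have to be replaced by whatever codeword minimizes the average distortion over the cell.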
The performance of the Burg method for speech analysis is compared to that of the autocorrelation and covariance methods. The criteria of goodness are the accuracy of the spectral approximation, filter stability, windowing requirements, data frame length, and spectral resolution. A mathematical comparison is presented for the simple first-order signal. Spectral comparisons are presented for a second-order speech-like signal. Real speech synthesized from the analysis results of the autocorrelation and Burg methods is compared subjectively. The results do not find any justification for preferring the computationally more complex Burg method.
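For the first-order case, the two estimators can be written in closed form. The sketch below is only an illustration under stated assumptions: no window is applied, the signal is real-valued, and the returned value is the predictor coefficient a in x[n] ≈ a·x[n-1] (the negative of the reflection coefficient under the common sign convention). It does not reproduce the paper's own comparison.

```python
def first_order_autocorr(x):
    """Autocorrelation-method estimate: r(1) / r(0) over the whole frame."""
    r0 = sum(v * v for v in x)
    r1 = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    return r1 / r0

def first_order_burg(x):
    """Burg estimate: minimizes the summed forward and backward
    prediction error energies over n = 1 .. N-1."""
    num = 2.0 * sum(x[n] * x[n - 1] for n in range(1, len(x)))
    den = sum(x[n] ** 2 + x[n - 1] ** 2 for n in range(1, len(x)))
    return num / den
```

Both estimates are bounded by 1 in magnitude, which is what guarantees a stable first-order filter in either method.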
The standard first-order preemphasis approach used in linear prediction analysis results in an undesirable low-frequency boost in the synthesis spectrum. A solution is obtained by mismatching the preemphasis and postemphasis factors or by using a particular second-order preemphasis filter.
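The effect of mismatching the two factors can be illustrated with a pair of first-order filters. This is a minimal sketch: the filter forms y[n] = x[n] - mu·x[n-1] and z[n] = y[n] + nu·z[n-1] are the conventional ones, but the factor values used in the usage note are hypothetical, since the abstract does not state which values or which direction of mismatch the authors chose.

```python
def preemphasize(x, mu):
    """First-order preemphasis: y[n] = x[n] - mu * x[n-1]."""
    out, prev = [], 0.0
    for v in x:
        out.append(v - mu * prev)
        prev = v
    return out

def postemphasize(y, nu):
    """First-order postemphasis: z[n] = y[n] + nu * z[n-1]."""
    out, prev = [], 0.0
    for v in y:
        prev = v + nu * prev
        out.append(prev)
    return out
```

With matched factors the cascade is an identity. With, say, mu = 0.9 and nu = 0.8 (illustrative values only), the gain of the cascade at zero frequency becomes (1 - mu) / (1 - nu) = 0.5, so the low-frequency end of the output is attenuated relative to the matched case.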
Kalman backward adaptive predictor coefficient identification is combined with a modified pitch-compensating quantizer (MPCQ) to produce a high-performance adaptive differential pulse code modulation (ADPCM) system for operation at data rates of 12-16 kbits/s. The Kalman/MPCQ system is compared to an ADPCM system using a Kalman algorithm and robust Jayant quantization and to a system with a fixed-tap predictor and MPCQ. The performance indicators are signal-to-quantization noise ratio (SNR), sound spectrogram analyses, and formal subjective listening tests. The SNR comparisons indicate that the Kalman/MPCQ system has the highest SNR, followed by the fixed-tap/MPCQ system, and then the Kalman/robust Jayant system. Subjective listening test results show that the Kalman/MPCQ system is preferred over the fixed-tap/MPCQ system 100 percent of the time and over the Kalman/robust Jayant system 80 percent of the time. Kalman adaptation thus provides an important perceptual effect not evident in the SNRs. The previously catastrophic effects of transmission errors on backward adaptive prediction are eliminated by simple ADPCM system modifications that do not affect the SNR or subjective quality of the output in the absence of errors for the five sentences studied. The problem of tandeming with a linear predictive coder (LPC) is investigated by using LPC processed speech as input to the three ADPCM systems and by using the output of the three ADPCM systems as input to an LPC analysis algorithm. For the LPC to ADPCM connection, the two systems with the MPCQ produce good quality output speech, while the system with robust Jayant quantization exhibits a fading phenomenon. For the ADPCM into LPC analysis, all three systems produce speech of approximately the same quality, with the fixed-tap system being slightly noisier. Using a distance measure proposed by Itakura, the predictor coefficients computed from the three ADPCM system outputs are compared with the predictor coefficien
The log likelihood measure has been widely used in speech research for comparing speech signals. Recently, it has been proposed as a measure for assessing the quality of coded speech. In this paper we present an interpretation of the log likelihood ratio measure within the theoretical framework of a waveform coder distortion model. We then discuss the implications of this interpretation and show how it can be applied to the formulation of better objective measures of waveform coder performance.
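For orientation, one common form of the log likelihood ratio between two LPC coefficient vectors is sketched below. Conventions differ between authors, so the following are assumptions rather than the paper's definition: the coefficient vectors include the leading 1, the autocorrelation sequence r_ref belongs to the reference frame, and the measure is the log of the ratio of the two residual energies. The function names are hypothetical.

```python
import math

def quad_form(a, r):
    """a^T R a for the symmetric Toeplitz matrix R whose first row is r."""
    return sum(a[i] * a[j] * r[abs(i - j)]
               for i in range(len(a)) for j in range(len(a)))

def log_likelihood_ratio(a_ref, a_test, r_ref):
    """Log ratio of residual energies when the reference frame is inverse
    filtered with the test coefficients versus its own coefficients."""
    return math.log(quad_form(a_test, r_ref) / quad_form(a_ref, r_ref))
```

When a_ref is the optimal predictor for r_ref, the denominator is the minimum residual energy, so the measure is non-negative and equals zero when the two coefficient vectors coincide.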
The purpose of this work was to study, experimentally, two windowless LPC analysis algorithms for use in speech digitization. The two algorithms are a circular autocorrelation technique, which utilizes the pseudoperiodic nature of voiced speech, and a reflection coefficient estimation technique suggested by J. P. Burg. Both techniques showed considerable promise in the experimental results.
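A circular autocorrelation treats the analysis frame as one period of a periodic signal, so lags wrap around instead of running off the end of the frame, and no tapering window is needed. The sketch below assumes the frame spans a whole number of pitch periods; how the frame boundaries are chosen is not described in the abstract, and the function name is hypothetical.

```python
def circular_autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[(n + k) mod N], for k = 0 .. max_lag."""
    n_samples = len(frame)
    return [sum(frame[n] * frame[(n + k) % n_samples] for n in range(n_samples))
            for k in range(max_lag + 1)]
```

Every lag is computed from the same number of products, unlike the ordinary short-time autocorrelation, where higher lags use fewer samples.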