Abstract: This paper introduces a reading learning aid which is designed to be a low-cost consumer product. The design of the machine required the marriage of recent advancements in both speech and optical technologies. An inexpensive optical wand is used to read bar-coded allophone strings from the printed book. Natural-sounding speech is then constructed by an 8-bit microcomputer using an LPC synthesizer. The system is self-contained, with the flexibility of producing songs, sound effects, and speech. All the speech-related data are embodied in the printed material along with games and activities to make the reading learning process fun and rewarding.
Very short response time is a critical requirement for automatic discrete utterance recognition. The real-time vocabulary size of most of today's commercially available recognizers is limited to several hundred utterances, primarily because detailed acoustic matching involves considerable computation. The method presented here offers an economical solution to the real-time large-vocabulary recognition problem by carrying out recognition in two stages. In the initial stage, the incoming utterance is linearly matched against the entire vocabulary using only two features: utterance duration and either two or three average spectra for each utterance. While the number of prototypes matched is large, the time required per match is substantially reduced. During this initial stage, a preset number of best-match prototypes is determined for each unknown input. In the second stage, matching is performed for the best-match list based upon more detailed features (e.g., 10-ms log-power spectra), using a more elaborate matching methodology, e.g., dynamic programming (DP). Evaluation experiments were conducted using the 2000 most frequent words in an office-correspondence corpus and three normal adult-male talkers. It was observed that first-stage best-match lists of 30-50 items included the "correct" words between 99.0 and 99.5 percent of the time. Using DP on 10-ms spectral samples for the second stage, recognition accuracy ranged from 86.5 to 94.5 percent. A match-limiter, when used with a 50-64-word, commercially available recognizer for the second stage, makes near-real-time large-vocabulary recognition feasible.
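The two-stage idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration rather than the paper's implementation: the function names, the toy one-dimensional "spectra", the squared-error distances, the duration weight, and the shortlist size of 2 (the abstract reports lists of 30-50 items).

```python
from typing import Dict, List, Tuple

Frames = List[List[float]]  # one spectral vector per analysis frame


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coarse_features(frames: Frames, n_segments: int = 3) -> Tuple[int, List[List[float]]]:
    """Stage-1 features: duration in frames plus n_segments average spectra."""
    n, dim = len(frames), len(frames[0])
    averages = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = max(lo + 1, (s + 1) * n // n_segments)
        seg = frames[lo:hi]
        averages.append([sum(f[d] for f in seg) / len(seg) for d in range(dim)])
    return n, averages


def coarse_distance(fa, fb, duration_weight: float = 1.0) -> float:
    """Cheap linear match: no time alignment, only duration and averages."""
    (dur_a, avg_a), (dur_b, avg_b) = fa, fb
    spectral = sum(sq_dist(x, y) for x, y in zip(avg_a, avg_b))
    return duration_weight * (dur_a - dur_b) ** 2 + spectral


def dtw_distance(a: Frames, b: Frames) -> float:
    """Stage-2 match: dynamic-programming time alignment over all frames."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = sq_dist(a[i - 1], b[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def recognize(unknown: Frames, prototypes: Dict[str, Frames], shortlist_size: int = 2):
    """Rank the whole vocabulary cheaply, then run DP on the shortlist only."""
    target = coarse_features(unknown)
    ranked = sorted(
        prototypes,
        key=lambda w: coarse_distance(target, coarse_features(prototypes[w])),
    )
    shortlist = ranked[:shortlist_size]
    best = min(shortlist, key=lambda w: dtw_distance(unknown, prototypes[w]))
    return best, shortlist


# Toy vocabulary (hypothetical data): each frame is a 1-D "spectrum".
prototypes = {
    "yes": [[1.0], [2.0], [3.0], [2.0]],
    "no": [[3.0], [3.0], [1.0], [1.0]],
    "stop": [[0.0], [5.0], [5.0], [0.0], [0.0], [0.0], [0.0], [0.0]],
}
unknown = [[1.0], [2.0], [2.0], [3.0], [2.0]]  # a time-stretched "yes"
best, shortlist = recognize(unknown, prototypes)
```

In a real system the coarse features of the prototypes would be precomputed once; they are recomputed here only to keep the sketch short.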
This paper presents a tutorial review of lattice structures and their use for adaptive prediction of time series. Lattice filters associated with stationary covariance sequences and their properties are discussed. The least squares prediction problem is defined for the given data case, and it is shown that many of the currently used lattice methods are actually approximations to the stationary least squares solution. The recently developed class of adaptive least squares lattice algorithms is described in detail, in both unnormalized and normalized forms. The performance of the adaptive least squares lattice algorithm is compared to that of some gradient adaptive methods. Lattice forms for ARMA processes, for joint process estimation, and for the sliding-window covariance case are presented. The use of lattice structures for efficient factorization of covariance matrices and solution of Toeplitz sets of equations is briefly discussed.
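As a point of reference for the stationary case mentioned in this abstract, the sketch below uses the classical Levinson-Durbin recursion to turn a covariance sequence into reflection coefficients, then runs a fixed (non-adaptive) lattice prediction-error filter. It is not the adaptive least squares lattice algorithm the paper describes; the function names and the sign convention (prediction x̂[n] = Σ a_i x[n-i], with k_m equal to the last predictor coefficient at order m) are assumptions chosen for illustration.

```python
from typing import List, Tuple


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], List[float], float]:
    """Solve the Toeplitz normal equations for covariance lags r[0..order].

    Returns (predictor coefficients a_1..a_order,
             reflection coefficients k_1..k_order,
             final prediction-error power).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    k = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        km = acc / err
        k.append(km)
        new_a = a[:]
        new_a[m] = km
        for i in range(1, m):
            new_a[i] = a[i] - km * a[m - i]
        a = new_a
        err *= 1.0 - km * km
    return a[1:], k, err


def lattice_forward_error(x: List[float], k: List[float]) -> List[float]:
    """Forward prediction error from a lattice with reflection coefficients k.

    Stage update (zero initial state):
        f_m[n] = f_{m-1}[n] - k_m * b_{m-1}[n-1]
        b_m[n] = b_{m-1}[n-1] - k_m * f_{m-1}[n]
    """
    order = len(k)
    b_prev = [0.0] * order  # b_m[n-1] for m = 0..order-1
    out = []
    for sample in x:
        f = b = sample
        b_now = []
        for m in range(order):
            b_now.append(b)  # keep b_m[n] for the next time step
            f_next = f - k[m] * b_prev[m]
            b = b_prev[m] - k[m] * f
            f = f_next
        b_prev = b_now
        out.append(f)
    return out


def direct_forward_error(x: List[float], a: List[float]) -> List[float]:
    """Same error computed in direct form: e[n] = x[n] - sum_i a_i x[n-i]."""
    return [
        x[n] - sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        for n in range(len(x))
    ]


# Hypothetical covariance lags and input samples, for illustration only.
a, k, err = levinson_durbin([1.0, 0.5, 0.1], order=2)
x = [1.0, -0.5, 2.0, 0.25, -1.0]
lattice_out = lattice_forward_error(x, k)
direct_out = direct_forward_error(x, a)
```

The lattice and direct-form outputs agree sample by sample, which is the sense in which the reflection coefficients are an equivalent parametrization of the predictor.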
The distortion performance of the vector quantization approach for LPC voice coding is examined both analytically and experimentally. Analytically, interpretations of the interparameter coupling effects of a distortion measure and the clustering nature of the algorithm for LPC vector quantization are obtained to show its relationship with the residual minimization process in LPC analysis. Experimentally, a large database of speech is used to compare its performance and properties to scalar quantization. The results lend further insight into the superior performance of vector quantization.
We show by theoretical argument and by experiment with both synthetic and real data that selection of an undriven segment of voiced speech for analysis by linear predictive coding (LPC) gives more accurate estimates of the poles of the vocal-tract model. In the case of voiced nasal phonemes, this technique provides a simple algorithm for separately determining the poles and the zeros in the model and illustrates the desirability of identifying the portions of the speech wave during which there is a significant driving input. A key problem which remains is the development of a practical algorithm for selecting such segments for analysis.
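The claim about undriven segments can be checked on a toy synthetic case. The sketch below is an illustration under stated assumptions, not the paper's experiment: the "vocal tract" is a single two-pole resonator (hypothetical radius 0.9 and angle 0.6 rad), the excitation is a unit impulse every 20 samples, and the predictor is fitted by ordinary least squares over a chosen window. A window that avoids the excitation instants recovers the true coefficients; a window that spans an excitation instant does not.

```python
import math
from typing import List, Tuple

# Hypothetical resonator: poles at radius 0.9, angle +/-0.6 rad.
A1 = 2 * 0.9 * math.cos(0.6)
A2 = -(0.9 ** 2)
PERIOD = 20  # samples between excitation impulses (assumed "pitch period")


def synthesize(a1: float, a2: float, period: int, n_samples: int) -> List[float]:
    """x[n] = a1*x[n-1] + a2*x[n-2] + u[n], with u a unit impulse train."""
    x: List[float] = []
    for n in range(n_samples):
        drive = 1.0 if n % period == 0 else 0.0
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        x.append(a1 * x1 + a2 * x2 + drive)
    return x


def fit_order2(x: List[float], start: int, stop: int) -> Tuple[float, float]:
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2] for start <= n < stop."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for n in range(start, stop):
        p1, p2, y = x[n - 1], x[n - 2], x[n]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        t1 += p1 * y
        t2 += p2 * y
    det = s11 * s22 - s12 * s12
    # Cramer's rule on the 2x2 normal equations.
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


def max_abs_error(estimate: Tuple[float, float]) -> float:
    return max(abs(estimate[0] - A1), abs(estimate[1] - A2))


x = synthesize(A1, A2, PERIOD, 2 * PERIOD)
undriven = fit_order2(x, 2, PERIOD)      # n = 2..19: no impulse enters
driven = fit_order2(x, 2, 2 * PERIOD)    # n = 2..39: includes the impulse at n = 20
undriven_err = max_abs_error(undriven)
driven_err = max_abs_error(driven)
```

On noise-free synthetic data the undriven fit is exact up to rounding, while the driven fit is biased by the unmodeled impulse. As the abstract notes, locating such segments in real speech is the hard part, and this sketch sidesteps it by knowing the impulse positions in advance.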
An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data. The basic properties of the algorithm are discussed and demonstrated by examples. Quite general distortion measures and long blocklengths are allowed, as exemplified by the design of parameter vector quantizers of ten-dimensional vectors arising in linear predictive coded (LPC) speech compression with a complicated distortion measure arising in LPC analysis that does not depend only on the error vector.
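Training-sequence codebook design of this kind is commonly illustrated with a generalized Lloyd iteration: assign each training vector to its nearest codeword, replace each codeword by the centroid of its cell, and repeat until the average distortion stops improving. The sketch below follows that pattern under simplifying assumptions: it uses plain squared error rather than the LPC distortion measure the abstract mentions, takes the initial codebook as given, and uses hypothetical names and toy two-dimensional data.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: List[Vector], v: Vector) -> int:
    """Index of the minimum-distortion codeword for v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def design_codebook(
    training: List[Vector],
    initial: List[Vector],
    max_iters: int = 50,
    tol: float = 1e-9,
) -> Tuple[List[Vector], float]:
    """Alternate nearest-neighbour partitioning and centroid updates.

    Returns the codebook and its average distortion on the training set.
    """
    codebook = [list(c) for c in initial]
    prev = float("inf")
    avg = prev
    for _ in range(max_iters):
        # Partition the training sequence with the current codebook.
        cells: List[List[Vector]] = [[] for _ in codebook]
        total = 0.0
        for v in training:
            i = nearest(codebook, v)
            cells[i].append(v)
            total += sq_dist(codebook[i], v)
        avg = total / len(training)
        if prev - avg <= tol:
            break  # distortion no longer improving
        prev = avg
        # Replace each codeword by the centroid of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its old codeword
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook, avg


# Toy training sequence with two obvious clusters (hypothetical data).
training = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
codebook, distortion = design_codebook(training, initial=[[0.0, 0.0], [1.0, 0.0]])
```

With a distortion measure other than squared error, the `sq_dist` call and the centroid step would both change, since the centroid must minimize the chosen distortion over the cell.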
In this paper we present an efficient procedure to obtain a rational model for a 2-D linear shift-invariant, discrete system using first- and second-order data from it. This procedure is a modification of the nonlinear least-squares approximation, and it generalizes the Padé approximants and the spectral estimation modeling procedures. The parameters of the approximating filter are obtained by solving a system of linear equations by means of an efficient recursive algorithm which is developed using the relation of the approximation problem with the theory of orthogonal polynomials on the unit bidisk. We discuss some of the algebraic properties of the solution and apply them to define cases for which the BIBO stability of the approximating filters is ensured. The proposed procedure finds applications in the design and stabilization of 2-D recursive digital filters and in the autoregressive moving average (ARMA) modeling of stationary random fields.
An 800 bit/s vector quantization linear predictive coding (LPC) vocoder has been developed. The recently developed LPC vector quantization theory is applied to reduce the bit rate for LPC coefficient coding by a factor of four. Branch search techniques and separation of voiced and unvoiced codebooks are applied for better algorithm efficiency. Differential coding is applied to reduce the bit rate for the pitch and gain parameters by one third. Formal subjective evaluation shows that the 800 bit/s vocoder preserves most of the intelligibility of an LPC system. It is also robust under different transmission error and acoustic conditions. Informal listening comparisons show the quality to be acceptable and sometimes very close to 2400 bit/s LPC speech. The computational cost of the 800 bit/s vocoder is equivalent to or even lower than that of the 2400 bit/s LPC-10. Compatibility with any LPC-10 vocoder is guaranteed because the 800 bit/s design differs only in the quantization and encoding algorithms. Further bit rate reduction can be achieved by removing frame-to-frame redundancy in the code.
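The differential coding step mentioned in this abstract can be sketched generically: send the first parameter value in full, then send only a small clamped difference from the decoder's running reconstruction for each later frame. The 4-bit delta width, the function names, and the sample pitch values below are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def encode_differential(values: List[int], delta_bits: int = 4) -> Tuple[int, List[int]]:
    """Return (first value, list of clamped frame-to-frame deltas).

    Each delta fits in a signed delta_bits field. The encoder tracks the
    decoder's reconstruction so that clamping errors do not accumulate
    unnoticed.
    """
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    codes = []
    reconstructed = values[0]  # the first frame is sent in full
    for v in values[1:]:
        delta = max(lo, min(hi, v - reconstructed))
        codes.append(delta)
        reconstructed += delta
    return values[0], codes


def decode_differential(first: int, codes: List[int]) -> List[int]:
    """Rebuild the parameter track by accumulating the deltas."""
    out = [first]
    for d in codes:
        out.append(out[-1] + d)
    return out


# Hypothetical quantized pitch indices for six consecutive frames.
pitch = [40, 42, 41, 45, 60, 62]
first, codes = encode_differential(pitch, delta_bits=4)
decoded = decode_differential(first, codes)
```

Slowly varying frames are reproduced exactly, while the jump from 45 to 60 exceeds the delta range and the reconstruction needs more than one frame to catch up. This trade-off between bit savings and tracking of abrupt changes is inherent to differential schemes of this kind.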
A single CMOS speech synthesis LSI, organized as a special-purpose microcomputer containing program ROM, RAM, 32K of speech data ROM, and a D/A converter, is described in this paper. The chip utilizes new speech synthesis techniques to generate high-quality speech, reproducing the natural inflection and intonation of the speaker, and has been used to produce speech at a bit rate of about 3 kbits/s.
This paper describes a 2400 bit/s vocoder based on spectral envelope estimation, spectral coding to 48 bits, pitch extraction, and decreasing-chirp excitation for voiced synthesis. Several spectral smoothing and coding schemes are described, and their intelligibility test results are compared. This vocoder was implemented on the CSP-30 high-speed digital processor at the RADC/EEV Speech Processing Research and Development Facility at Hanscom AFB, MA. The system yields high performance in a quiet environment and is robust in acoustic noise environments at a data rate of 2400 bit/s.