A robust linear predictive coding (LPC) method that can be used in noisy as well as quiet environments has been studied. In this method, noise autocorrelation coefficients are first obtained and updated during non-speech periods. Then, the effect of additive noise in the input speech is removed by subtracting the values of the noise autocorrelation coefficients from those of the autocorrelation coefficients of the corrupted speech in the course of computing the linear prediction coefficients. When the signal-to-noise ratio of the input speech ranges from 0 to 10 dB, a performance improvement of about 5 dB can be gained by using this method. The proposed method is computationally very efficient and requires a small storage area.
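The subtraction step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `floor` safeguard on the zero-lag term, and the use of the Levinson-Durbin recursion are assumptions, and the noise autocorrelation `noise_r` is assumed to have been estimated elsewhere during non-speech periods.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame (unnormalized sums)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for x[n] ~ sum_j a[j] * x[n-j].

    Returns (a[1..order], residual energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            # The compensated autocorrelation is no longer positive definite;
            # stop and keep the lower-order solution found so far.
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def noise_compensated_lpc(noisy_frame, noise_r, order, floor=1e-3):
    """Subtract the noise autocorrelation, then compute LPC coefficients.

    `floor` is a hypothetical safeguard that keeps the zero-lag term positive.
    """
    r_noisy = autocorr(noisy_frame, order)
    r_clean = [rn - rv for rn, rv in zip(r_noisy, noise_r)]
    r_clean[0] = max(r_clean[0], floor * r_noisy[0])
    return levinson_durbin(r_clean, order)
```

Because the subtraction adds only one pass over `order + 1` values per frame, the sketch is consistent with the abstract's remark that the method is cheap in both computation and storage.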
A method is described that measures the range of frequency spectra produced when a person speaks. Our "minimax" method selects a subset of a given size from a set of LPC vectors such that the maximum RMS log spectral distance from any vector in the superset to its nearest neighbor in the subset is minimized. The minimum inter-vector distance in the subset is, in general, monotonically related to the volume of the region that contains the superset. Passages of English, Japanese, and Spanish were recorded and analyzed using this method. The results indicate that the ranges of spectra characteristic of different individuals' speech differ significantly. There is less difference, if any, between languages in their measured acoustic ranges. The results are discussed with reference to issues in phonology and speech coding.
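The minimax criterion above is an instance of the k-center problem. The abstract does not say how the subset is searched for, so the sketch below swaps in the well-known greedy farthest-point heuristic, which only approximates the minimax optimum. Euclidean distance stands in for the RMS log spectral distance, and all names are illustrative.

```python
import math


def euclidean(u, v):
    """Stand-in distance; the paper uses RMS log spectral distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def farthest_point_subset(vectors, k, dist=euclidean):
    """Greedily pick k vectors; return (chosen indices, covering radius).

    The covering radius is the maximum distance from any vector in the
    superset to its nearest neighbor in the chosen subset.
    """
    chosen = [0]  # arbitrary starting vector
    nearest = [dist(v, vectors[0]) for v in vectors]
    while len(chosen) < k:
        # Add the vector that is currently worst covered by the subset.
        idx = max(range(len(vectors)), key=nearest.__getitem__)
        chosen.append(idx)
        nearest = [min(d, dist(v, vectors[idx])) for d, v in zip(nearest, vectors)]
    return chosen, max(nearest)
```

The returned covering radius is the quantity the minimax method seeks to minimize, so comparing it across speakers for a fixed `k` gives a rough analogue of the range measure described above.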
One of the major drawbacks of the standard pattern recognition approach to isolated word recognition is that poor performance is generally achieved for word vocabularies with acoustically similar words. This poor performance is related to the pattern similarity (distance) algorithms that are generally used, in which a global distance between the test pattern and each reference pattern is computed. Since acoustically similar words are, by definition, globally similar, it is difficult to reliably discriminate such words, and a high error rate is obtained. By modifying the pattern similarity algorithm so that the recognition decision is made in two passes, improvements in discriminability among similar words can be achieved. In particular, on the first pass the recognizer provides a set of global distance scores which are used to decide a class (or a set of possible classes) to which the spoken word is estimated to belong. On the second pass a locally weighted distance is used to provide optimal separation among words in the chosen class (or classes), and the recognition decision is made on the basis of these local distance scores. For a highly complex vocabulary (letters of the alphabet, digits, and 3 command words), recognition improvements of 3 to 7 percent were obtained using the two-pass recognition strategy.
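The two-pass decision can be sketched as below. This is a simplified illustration under several assumptions: patterns are taken to be already time-aligned and of equal length (a real recognizer would align them first), each class has a hypothetical per-frame weight vector that emphasizes the frames where its words differ, and only the single best class from the first pass is kept.

```python
def frame_dist(a, b):
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def global_distance(test, ref):
    """Pass 1 score: unweighted average distance over all frames."""
    return sum(frame_dist(t, r) for t, r in zip(test, ref)) / len(ref)


def weighted_distance(test, ref, weights):
    """Pass 2 score: frames that separate similar words count more."""
    total = sum(w * frame_dist(t, r) for w, t, r in zip(weights, test, ref))
    return total / sum(weights)


def two_pass_recognize(test, references, word_class, class_weights):
    """Pick a class from global scores, then decide within that class."""
    scores = {w: global_distance(test, ref) for w, ref in references.items()}
    best = min(scores, key=scores.get)
    cls = word_class[best]
    candidates = [w for w in references if word_class[w] == cls]
    if len(candidates) == 1:
        return best
    weights = class_weights[cls]
    return min(candidates,
               key=lambda w: weighted_distance(test, references[w], weights))
```

For words such as the letters "B" and "D", which differ mainly in the initial consonant, the class weights would be large on the opening frames and small on the shared vowel, so a mismatch in the vowel no longer dominates the decision.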
The main objective of this work has been to add the model for fricative excitation to the LPC synthesis model. From the LPC model one finds the acoustic tube section with the greatest constriction and adds a modulated noise signal. The results from this model demonstrate that one is able to produce noise bursts at the right time instants that are shorter than the frame length. This gives a more natural sound for certain phonemes, but adds a quantization type of background noise.
The application of integrated circuit technology to Speech Synthesis and Recognition represents an important development in the field. This paper describes a complete Speech Synthesis System on a single chip. The considerations involved in the choice of a compatible algorithm, machine word length and coefficient accuracy are discussed. The device contains a 32 word vocabulary and an innovative implementation of the LPC lattice structure. It operates at a variable bit rate to provide high quality speech with low bit storage requirements. The software supporting the Speech Synthesis System is also described.
Linear prediction has been widely used in speech coding. However, a large amount of calculation is necessary for parameter extraction by the autocorrelation method, so it is not suited to hardware implementation. This paper proposes a new analysis method that uses an iterative method in the lattice circuit for successive K parameter approximations. Among the iterative methods considered, the improved hybrid algorithm was found to be best from the viewpoint of high-speed convergence for speech. For pitch extraction, a partly divided average magnitude difference function (AMDF) method was used. Using these two methods, a vocoder was developed which requires only a small amount of calculation and is suited to hardware implementation (e.g. LSI). Good speech quality was obtained.
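For reference, the plain AMDF pitch estimator can be sketched as follows. The abstract does not describe the "partly divided" variant, so this shows only the standard form; the function names and lag search range are illustrative assumptions.

```python
def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def amdf_pitch_lag(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the deepest AMDF valley."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))
```

The AMDF needs only subtractions and absolute values, with no multiplications, which is what makes this family of methods attractive for the hardware implementation the abstract targets.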
The purpose of this research was to investigate the effectiveness of several clustering algorithms for separation of LPC speech segments. Four thousand frames of speech parameters were divided into thirty clusters using four clustering algorithms and three initial seed point selection methods. A second-order and an eighth-order norm in the area function parametric domain were used as distance measures. For single-pass clustering algorithms, using every 130th frame as a seed point resulted in a substantially lower error than using the first thirty frames as seed points. A noticeable benefit was obtained from using a more complex algorithm to generate the initial seed points. For iterative clustering algorithms, the initial allocation had a negligible effect on the final error and on the number of iterations.
Due to its programmability, the Programmable Digital Signal Processor (PDSP) chip set lends itself to the implementation of a variety of speech synthesis functions. When programmed as an all-pole lattice filter used in LPC synthesis, the PDSP allows the user to exploit quality-enhancement techniques such as mixed sourcing, pitch- or time-synchronous updating, variable frame and filter length, interpolation, various quantizing schemes, and adjustable speaking rate. Since the quantizing algorithm is under user control, the encoded bit rate may range from 1100 to 28,000 bits per second according to speech quality requirements. The paper discusses these techniques and a commercially available LPC speech synthesis circuit board which illustrates a typical PDSP application.
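An all-pole lattice synthesis filter of the kind mentioned above can be sketched in a few lines. This is a generic textbook structure, not the PDSP program itself; the sign convention for the reflection coefficients is an assumption (conventions differ between references), and the coefficients are held fixed rather than updated or interpolated per frame.

```python
def lattice_synthesize(excitation, k):
    """Run an excitation through an all-pole lattice filter.

    k[m-1] is the reflection coefficient of stage m. With the convention
    used here, a single stage realizes y[n] = e[n] - k[0] * y[n-1].
    The filter is stable when every |k[m]| < 1.
    """
    p = len(k)
    delayed_b = [0.0] * p  # delayed_b[m] holds the backward signal b_m[n-1]
    out = []
    for e in excitation:
        f = e  # forward signal entering the top stage
        new_b = [0.0] * (p + 1)
        for m in range(p, 0, -1):
            f = f - k[m - 1] * delayed_b[m - 1]             # forward signal leaving stage m
            new_b[m] = k[m - 1] * f + delayed_b[m - 1]      # backward signal leaving stage m
        new_b[0] = f  # the output feeds the backward path
        delayed_b = new_b[:p]
        out.append(f)
    return out
```

Because each stage depends only on its own coefficient, stability is easy to check and coefficients can be swapped or interpolated between frames, which fits the updating and interpolation techniques the abstract lists.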
Prediction plays a key role in many signal processing applications. Linear prediction, in particular, has been extremely useful in the development of digital speech processing techniques and applications. There is, however, a growing need for improved forms of prediction. In this paper we discuss a form of non-linear prediction, namely, the prediction of the phase of speech signals. This study is conducted within a short-time analysis/synthesis framework and is based upon a new treatment of the classical speech production model. Experimental data are presented confirming the theoretical results. Finally, the application of phase prediction to low-bit-rate, high-quality coding is discussed.
Vector Quantization is applied to modify a 2400 bps LPC vocoder to operate at 800 bps, while retaining acceptable intelligibility and naturalness of quality. The design of this speech compression system is discussed and compared to other very low bit rate vocoders. Advantages of vector quantization over a scalar technique are examined in detail, and several new properties are presented.
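The basic encoding step of vector quantization can be sketched as follows. This is a generic illustration, not the vocoder's actual design: the codebook contents, the squared-error distance, and the function names are assumptions, and the abstract does not give the codebook size or frame rate.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to the vector."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(vector, c))
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i]))


def bits_per_index(codebook_size):
    """Bits needed to transmit one codebook index."""
    return (codebook_size - 1).bit_length()
```

The saving over scalar quantization comes from sending one index for the whole parameter vector instead of a separate code for each coefficient: a 1024-entry codebook, for example, costs 10 bits per frame regardless of how many coefficients each entry holds.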