An LPC cepstrum processor for speech recognition is implemented on a CMOS gate array. The processor we designed contains a 24-bit floating-point MAC unit that rapidly computes correlations, which account for the majority of operations in the algorithm. It also has 22 register files for storing temporary variables, which reduce accesses to external memory. For fast operation, the floating-point MAC uses a three-stage pipeline and a branched postnormalization scheme proposed in this paper. Experimental results show that processing a 20 ms frame takes approximately 266 μs at a 15 MHz clock rate. The processor runs at a maximum clock rate of 16.6 MHz and contains 55,520 transistors.
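To make the workload concrete, here is a floating-point reference sketch of an LPC cepstrum front end in Python. It assumes the common autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion; the abstract does not say which LPC solver the processor uses, and the function names, Hamming window, predictor order 10, and 12 cepstral coefficients are illustrative choices. The `autocorrelation` loop is the multiply-accumulate work that a MAC unit like the one described would accelerate. For scale, 266 μs at 15 MHz is about 3,990 clock cycles per 20 ms frame, roughly 1.3% of real time.

```python
import math


def autocorrelation(frame, order):
    """R[k] = sum_n x[n] x[n+k] for k = 0..order; each lag is a multiply-accumulate loop."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients with x[n] ~ sum_k a[k] x[n-k].

    Returns (coefficients a_1..a_p, final prediction-error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1 / (1 - sum_k a_k z^-k):
    c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_n = 0 for n > p."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def lpc_cepstrum(frame, order=10, n_ceps=12):
    """Hamming-windowed frame -> autocorrelation -> LPC -> cepstrum."""
    N = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (N - 1)))
                for i, x in enumerate(frame)]
    a, _ = levinson_durbin(autocorrelation(windowed, order), order)
    return lpc_to_cepstrum(a, n_ceps)
```

As a sanity check, a first-order model with a_1 = ρ has the closed-form cepstrum c_n = ρ^n / n, which the recursion reproduces.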
We implement a robust speaker-independent speech recognition algorithm on an ATSP2181 fixed-point digital signal processor (DSP). The recognizer is robust against environmental noise such as air-conditioner noise, computer-fan noise, and keyboard-typing noise. We address several implementation issues to keep the truncation errors that are inevitable in a fixed-point implementation within an acceptable level. Preliminary experiments show that our DSP-based speech recognizer attains at least 95% recognition accuracy in a noisy environment.
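The abstract does not detail its truncation fixes. As an illustration only, the sketch below assumes Q15 fractional arithmetic (a common fixed-point format, not confirmed for this recognizer) and compares truncating every product to 16 bits against keeping full-precision products in a wide accumulator and rounding once. All function names are hypothetical. Truncation error is always non-positive and grows with the number of terms, while the wide accumulator stays within half an LSB.

```python
import random

Q = 15
ONE = 1 << Q                 # 1.0 in Q15 (only ONE - 1 is representable)
Q15_MIN, Q15_MAX = -ONE, ONE - 1


def saturate(v):
    """Clamp an integer to the 16-bit Q15 range."""
    return max(Q15_MIN, min(Q15_MAX, v))


def to_q15(x):
    """Round a real number to the nearest Q15 integer, saturating at [-1, 1)."""
    return saturate(round(x * ONE))


def random_q15_vector(n, seed=0):
    """Hypothetical test data: n Q15 values drawn uniformly from [-0.5, 0.5)."""
    rng = random.Random(seed)
    return [to_q15(rng.uniform(-0.5, 0.5)) for _ in range(n)]


def q15_mul_trunc(a, b):
    """Q15 product with the low 15 bits dropped (Python's >> floors, like an arithmetic shift)."""
    return saturate((a * b) >> Q)


def dot_truncate_each(xs, ys):
    """Truncate every product before summing; the per-term errors add up."""
    return sum(q15_mul_trunc(x, y) for x, y in zip(xs, ys))


def dot_wide_accumulator(xs, ys):
    """Sum full-precision products (Python ints do not overflow; real hardware
    needs enough guard bits), then round once to Q15."""
    acc = sum(x * y for x, y in zip(xs, ys))
    return (acc + (1 << (Q - 1))) >> Q


def error_in_lsb(result, xs, ys):
    """Signed error of a Q15 dot-product result versus the exact sum, in Q15 LSBs."""
    exact = sum(x * y for x, y in zip(xs, ys))      # scaled by 2**30
    return (result * ONE - exact) / ONE
```

For a 256-term dot product, per-product truncation typically lands on the order of a hundred LSBs below the exact value, while rounding once after a wide accumulation stays within ±0.5 LSB.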
Linear prediction (LP) analysis is widely used in speech recognition to represent the short-time spectral envelope of speech. The prediction residuals are usually ignored in LP-based speech recognition systems. In this study, a normalized residual error based on LP is introduced, and adding this new feature together with its first- and second-order derivatives further improves the performance of the recognizer. The convergence of the training procedure based on the minimum classification error (MCE) approach is investigated, and experimental results on a city-name recognition task show an 8% string error rate reduction with the extended feature set compared with the conventional feature set.
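A minimal sketch of the extended feature, assuming the normalized residual error is the LP prediction-error energy divided by the frame energy, which the Levinson-Durbin recursion gives as the product of (1 - k_i^2) over the reflection coefficients k_i. The paper's exact normalization may differ. The first- and second-order derivatives use the usual regression formula with a hypothetical window of ±2 frames and replicated edges; all function names are illustrative.

```python
import math


def autocorrelation(frame, order):
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def reflection_coefficients(r, order):
    """Levinson-Durbin recursion; returns reflection coefficients k_1..k_p."""
    a = [0.0] * (order + 1)
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        if err <= 0.0:                    # frame already perfectly predicted
            ks.append(0.0)
            continue
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
        ks.append(k)
    return ks


def normalized_residual_error(frame, order=10):
    """Prediction-error energy over frame energy: E_p / R(0) = prod_i (1 - k_i^2)."""
    r = autocorrelation(frame, order)
    if r[0] <= 0.0:
        return 1.0                        # silent frame: nothing is predictable
    return math.prod(1.0 - k * k for k in reflection_coefficients(r, order))


def deltas(seq, width=2):
    """Regression derivative d_t = sum_n n (x_{t+n} - x_{t-n}) / (2 sum_n n^2)."""
    T = len(seq)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(T):
        acc = 0.0
        for n in range(1, width + 1):
            acc += n * (seq[min(t + n, T - 1)] - seq[max(t - n, 0)])
        out.append(acc / denom)
    return out


def nre_with_derivatives(frames, order=10):
    """Per-frame (NRE, first derivative, second derivative) triples."""
    nre = [normalized_residual_error(f, order) for f in frames]
    d1 = deltas(nre)
    return list(zip(nre, d1, deltas(d1)))
```

The triples would be appended to the conventional per-frame feature vector; a value near 1 means LP explains little of the frame, and a value near 0 means the frame is highly predictable.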
This paper presents a novel and efficient variable-bit-rate LPC quantization approach. The proposed MCVQ framework performs a dynamic-programming-based partitioning and quantization of input LSP vector tracks over time that minimizes quantization distortion. Variable-duration segments of the LSP vector tracks are classified into one of a finite number of language-related events. Specific codebooks, designed optimally for each event type, are then used to vector-quantize the individual LSP vectors of a given segment. "High-quality" LSP quantization is achieved at an average of 700 bits/s, while "transparent" performance is obtained at an average of 800 bits/s.
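The abstract does not give MCVQ's cost function, so this is only a sketch of the general idea: a dynamic program that splits an LSP track into variable-length segments, quantizes each segment with whichever class codebook yields the least squared-error distortion, and charges a fixed per-segment penalty standing in for the bits spent on segment boundaries and class labels. The penalty, maximum segment length, and toy codebooks are hypothetical; without some penalty, the DP would have no reason to prefer longer segments.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two LSP vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def segment_cost(track, s, e, codebooks):
    """Quantize track[s:e] with each class codebook; return (distortion, class) of the best."""
    best = None
    for cls, codebook in codebooks.items():
        d = sum(min(sq_dist(track[t], cw) for cw in codebook) for t in range(s, e))
        if best is None or d < best[0]:
            best = (d, cls)
    return best


def dp_segment(track, codebooks, max_len, seg_penalty):
    """Minimum-cost partition of track into segments of length <= max_len.

    cost[e] = min over s of cost[s] + distortion(track[s:e]) + seg_penalty.
    Returns ([(start, end, class), ...], total_cost).
    """
    T = len(track)
    cost = [float("inf")] * (T + 1)
    back = [None] * (T + 1)
    cost[0] = 0.0
    for e in range(1, T + 1):
        for s in range(max(0, e - max_len), e):
            d, cls = segment_cost(track, s, e, codebooks)
            total = cost[s] + d + seg_penalty
            if total < cost[e]:
                cost[e], back[e] = total, (s, cls)
    segments, e = [], T
    while e > 0:                          # backtrack from the end of the track
        s, cls = back[e]
        segments.append((s, e, cls))
        e = s
    segments.reverse()
    return segments, cost[T]
```

Raising `seg_penalty` yields fewer, longer segments (a lower rate at higher distortion), which is one way a variable-rate trade-off can arise from this kind of partitioning.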
Voice conversion has recently emerged as an interesting branch of speech processing that deals with modifying a speaker's perceived identity. The technology has applications in speech recognition and in the entertainment and security industries. This paper briefly introduces current voice conversion approaches and describes the development of the PASS system, a parametric voice conversion algorithm based on static speaker characteristics. The system is easy to implement, requires no phonetic transcription of the speech data, and is shown to be valuable when very little training data is available. Particular attention is given to the pitch extraction subsystem, which uses a novel pitch determination algorithm to ensure robust estimation of pitch statistics.
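The abstract does not say how PASS maps pitch between speakers. A widely used static-statistics approach, shown here as an assumption rather than the PASS method, normalizes each voiced frame's log-F0 by the source speaker's mean and standard deviation and rescales it with the target speaker's. Unvoiced frames are marked with F0 = 0 and left unchanged; the function names are hypothetical.

```python
import math
import statistics


def log_f0_stats(f0_track):
    """Mean and (population) standard deviation of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_track if f > 0]
    return statistics.fmean(logs), statistics.pstdev(logs)


def convert_f0(f0_track, src_stats, tgt_stats):
    """Map each voiced frame's log-F0 from the source to the target distribution."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = []
    for f in f0_track:
        if f <= 0:
            out.append(0.0)                   # unvoiced frames stay unvoiced
            continue
        z = (math.log(f) - mu_s) / sd_s if sd_s > 0 else 0.0
        out.append(math.exp(mu_t + z * sd_t))
    return out
```

Because the mapping is affine in the log domain, the converted voiced frames have exactly the target's log-F0 mean and standard deviation, which is why robust estimates of those statistics matter.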
This paper presents a new system for recognizing indicative features of Arabic that uses a set of simplified sub-neural-networks (SNNs). Speech is analyzed with the perceptual linear predictive (PLP) technique. The system was tested in experiments using stimuli uttered by 6 native Algerian speakers, and the identification results were compared with those obtained by the SARPH knowledge-based system. We focus on particularities of Arabic such as geminate and emphatic consonants and duration. The results show that the SNNs perform well in pure identification, whereas the knowledge-based system performs better on phonological duration.
A new method is proposed for two-band approximation of the excitation signal in an LPC model, to improve speech naturalness in very-low-rate coding. Based on a simplified multiband excitation model, the method accurately determines the degree of periodicity using instantaneous frequency (IF) estimation in the frequency domain. Within individual bands, the harmonic structure of the LPC residual spectrum is identified using the flatness of the IF as the criterion for pitch and voicing detection. The excitation is then modeled by combining a predefined periodic signal in the lower band with a random signal in the higher band. This is shown to considerably improve the naturalness of reconstructed speech in very-low-rate coding compared with traditional binary excitation. Performance is also reported for temporal decomposition (TD)-based coding at 800 b/s.
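The excitation model itself (periodic below a band edge, random above it) can be sketched as follows. This assumes the band edge and f0 are already known; the IF-flatness analysis that determines them in the paper is not reproduced. The noise band is built from random-phase sinusoids on the DFT grid, one simple way to generate a band-limited random signal. Function names and parameter values are hypothetical.

```python
import math
import random


def two_band_excitation(n_samples, fs, f0, cutoff_hz, seed=0):
    """Return (low, high): harmonics of f0 below cutoff_hz, and random-phase
    sinusoids on the DFT grid from cutoff_hz up to (not including) fs/2."""
    rng = random.Random(seed)
    low = [0.0] * n_samples
    h = 1
    while h * f0 < cutoff_hz:                 # periodic (voiced) lower band
        w = 2.0 * math.pi * h * f0 / fs
        for n in range(n_samples):
            low[n] += math.cos(w * n)
        h += 1
    high = [0.0] * n_samples
    bin_hz = fs / n_samples
    b = math.ceil(cutoff_hz / bin_hz)
    while b * bin_hz < fs / 2:                # random (unvoiced) upper band
        w = 2.0 * math.pi * b / n_samples
        phase = rng.uniform(0.0, 2.0 * math.pi)
        for n in range(n_samples):
            high[n] += math.cos(w * n + phase)
        b += 1
    return low, high


def dft_bin(x, k):
    """Magnitude of DFT bin k, used to check which band a component occupies."""
    N = len(x)
    re = sum(v * math.cos(2 * math.pi * k * n / N) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * n / N) for n, v in enumerate(x))
    return math.hypot(re, im)
```

In a coder, the two parts would be scaled by gains derived from the LPC residual before being summed and passed through the synthesis filter; those gains are not modeled here.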
In our work on designing a spoken language translator between South African languages, we found the project inflexible and much larger than first anticipated. The first part of the project was to design a system that performs language identification. This paper investigates a language identification (LID) task using three South African languages. The problem is investigated solely from a signal processing perspective, using linear predictive coding (LPC)-based and parameterized discrete Fourier transform (DFT)-based feature sets. Six minutes of speech data were collected from one talker in all three languages. The LID system uses five-second speech samples in the three languages to perform identification. The results show that the parameterized DFT-based feature set significantly lowered the error rate, with the lowest error rate obtained at a spectral compression lower than that of the mel scale.
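The abstract does not define its parameterized DFT features. One common way to get a DFT-based cepstrum with tunable spectral compression, assumed here, is first-order all-pass (bilinear) frequency warping controlled by a parameter `alpha`: `alpha = 0` leaves the frequency axis linear, and larger positive values give low frequencies more resolution, in the manner of the mel scale. Which `alpha` best approximates mel depends on the sampling rate, so none is claimed; function names and sizes are hypothetical.

```python
import cmath
import math


def warp(omega, alpha):
    """First-order all-pass (bilinear) frequency warping of omega in [0, pi]."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega), 1.0 - alpha * math.cos(omega))


def unwarp(omega_w, alpha):
    """Inverse warping: the all-pass map with -alpha undoes the map with alpha."""
    return warp(omega_w, -alpha)


def warped_cepstrum(frame, alpha, n_points=32, n_ceps=12):
    """DFT log-magnitude, resampled uniformly on the warped axis, then DCT-II."""
    N = len(frame)
    half = N // 2
    logmag = []
    for k in range(half + 1):             # log |X(k)| on the DFT grid from 0 to pi
        X = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        logmag.append(math.log(abs(X) + 1e-10))
    warped = []
    for i in range(n_points):             # uniform grid in the warped domain
        pos = unwarp(math.pi * (i + 0.5) / n_points, alpha) / math.pi * half
        lo = min(int(pos), half - 1)
        frac = pos - lo                   # linear interpolation between DFT bins
        warped.append((1 - frac) * logmag[lo] + frac * logmag[lo + 1])
    return [sum(warped[i] * math.cos(math.pi * q * (i + 0.5) / n_points)
                for i in range(n_points)) for q in range(n_ceps)]
```

Sweeping `alpha` and measuring LID error would be one way to locate a best compression setting relative to mel.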
In analysis-by-synthesis linear predictive coding (AbS-LPC), an LPC synthesis filter is combined with an analysis-by-synthesis search of the excitation signal. The synthesis filter acts as an estimator of the speech signal given the excitation. However, in most AbS-LPC algorithms this estimator has no explicit model of the quantization noise present in the excitation signal. This paper describes quantization noise modeling in a vector AbS-LPC algorithm, considering methods based on recursive Bayesian filtering and Kalman filtering. Simulations show improved signal-to-noise ratios due to quantization noise modeling.
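As a minimal illustration of the modeling idea (not the paper's vector AbS-LPC formulation), the sketch below treats quantization error as additive white noise with variance step²/12 and runs a scalar Kalman filter on a first-order autoregressive signal to estimate it from its quantized samples. The AR(1) model, step size, and function names are assumptions chosen so the effect is easy to check: the filtered estimate should have a lower mean-squared error than the quantized signal itself.

```python
import random


def simulate_ar1(n, phi, q, seed=0):
    """x[t] = phi * x[t-1] + w[t], with w[t] ~ N(0, q)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, q ** 0.5)
        x.append(prev)
    return x


def quantize(value, step):
    """Uniform mid-tread quantizer."""
    return step * round(value / step)


def kalman_ar1(observations, phi, q, r):
    """Scalar Kalman filter for y[t] = x[t] + v[t], where v models quantization
    error as white noise with variance r (step**2 / 12 for a uniform quantizer)."""
    x_hat, p = 0.0, q / (1.0 - phi * phi)        # stationary prior
    estimates = []
    for y in observations:
        x_pred, p_pred = phi * x_hat, phi * phi * p + q     # predict
        gain = p_pred / (p_pred + r)                         # update
        x_hat = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x_hat)
    return estimates
```

For `phi = 0.95`, `q = 1`, and a step of 2, the steady-state Riccati solution predicts an error variance of about 0.26 against the quantizer's 0.33, i.e. a modest SNR gain from modeling the noise explicitly.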
A new model is introduced for the spectral samples obtained in the multiband excitation (MBE) speech coder. Objective and subjective tests show that it compares favorably with the classical linear prediction (LP) model, especially for high-pitched speakers. Strategies for efficiently quantizing the model parameters, suitable for low-bit-rate implementations of the MBE coder, are also addressed.
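For reference, one common way to fit the classical LP baseline to MBE spectral samples, assumed here, is to treat the harmonic amplitudes A_l at multiples of the fundamental ω0 as a line spectrum, form R(k) = Σ_l A_l² cos(k·l·ω0), and run the Levinson-Durbin recursion; the gain is then set so that the model matches the total harmonic energy. This sketches the baseline only, since the abstract does not describe the new model, and the function names are hypothetical. Only about π/ω0 harmonics feed the autocorrelation, so a higher pitch gives fewer spectral samples to fit.

```python
import math


def harmonic_autocorrelation(amps, omega0, order):
    """R(k) = sum_l A_l^2 cos(k * l * omega0), treating harmonics as a line spectrum."""
    return [sum(a * a * math.cos(k * l * omega0) for l, a in enumerate(amps, start=1))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p for A(z) = 1 - sum_k a_k z^-k, plus error energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lp_magnitude(a, omega):
    """|1 / A(e^{jw})| with A(z) = 1 - sum_k a_k z^-k."""
    re = 1.0 - sum(ak * math.cos((k + 1) * omega) for k, ak in enumerate(a))
    im = sum(ak * math.sin((k + 1) * omega) for k, ak in enumerate(a))
    return 1.0 / math.hypot(re, im)


def fit_lp_to_harmonics(amps, omega0, order):
    """Return (coefficients, gain, modeled amplitudes at each harmonic)."""
    a, _ = levinson_durbin(harmonic_autocorrelation(amps, omega0, order), order)
    shape = [lp_magnitude(a, l * omega0) for l in range(1, len(amps) + 1)]
    gain = math.sqrt(sum(x * x for x in amps) / sum(s * s for s in shape))
    return a, gain, [gain * s for s in shape]
```

With 39 harmonics sampled from a known first-order envelope (a_1 = 0.5), this fit recovers a_1 to within a few hundredths; repeating it with far fewer harmonics shows how sparse sampling limits the baseline.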