ISBN (print): 0780382927
Spectral dynamics have long attracted the attention of researchers in speech recognition. As part of the speech feature vector they have proven useful, and they are therefore included in almost every feature extraction algorithm for speech recognition. However, the usual cepstral dynamics do not directly reflect the dynamics of the speech spectrum, because they are extracted from cepstral parameters. In this paper we show that dynamic parameters obtained directly from the speech spectrum can outperform conventional dynamic cepstral parameters under low-SNR noisy speech conditions. Results are reported on a compact subset of the Aurora task.
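The abstract does not say how the spectral dynamics are computed. The following is a minimal sketch in Python with NumPy. It assumes that the common regression formula for dynamic (delta) features is applied directly to log power spectra rather than to cepstra. The function names, frame length, hop size and regression width are hypothetical choices for illustration.

```python
import numpy as np

def log_spectrum(signal, frame_len=256, hop=128):
    """Hamming-windowed frames -> log power spectrum, shape (frames, bins).
    Frame length and hop are hypothetical illustration values."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)  # small floor avoids log(0)

def delta(frames, width=2):
    """Regression-based dynamic features along time:
    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2).
    Edge frames are handled by repeating the first/last frame."""
    T = frames.shape[0]
    padded = np.pad(frames, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(frames.shape, dtype=float)
    for n in range(1, width + 1):
        out += n * (padded[width + n: width + n + T] - padded[width - n: width - n + T])
    return out / denom
```

Applying the same `delta` function to cepstral frames gives the conventional dynamic cepstral parameters. This makes the two variants easy to compare side by side.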
ISBN (print): 0780381777
Speech feature extraction is one of the most important stages in the speech recognition process. In this paper, we propose a new neural network architecture called cooperative modular neural predictive coding (CMNPC). It is based on the interaction of discriminant experts, DFE-NPC (discriminant feature extraction), optimized for macro-classification with the help of a criterion called the modelisation error ratio (MER). We propose a theoretical validation of this model by linking the MER with a likelihood ratio. The performance of this architecture is evaluated on a phoneme recognition task, with phonemes extracted from the DARPA TIMIT speech database. Comparisons with coding methods (LPC, MFCC, PLP) are presented and show an improvement in recognition rates.
ISBN (print): 0780381149
Feature extraction from the speech representation is one of the processes in speech recognition. Parametric modeling is a dominant approach to modeling speech signals. Within a localized interval, the speech representation is equivalent to the output of an all-pole system driven by noise, which can be estimated using linear prediction. Besides the characteristics of speech itself, the temporal variability of the speech signal model also stems from the computation of the linear prediction coefficients. An alternative representation based on Gabor coefficients is therefore proposed. In this paper, the Gabor coefficients are compared with the linear prediction coefficients to show the consistency of the parameters generated for use in the speech recognition system.
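The all-pole model estimated by linear prediction can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a generic illustration, not the paper's implementation. The function name `lpc`, the Hamming window and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are all assumptions.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err) where a = [1, a1, ..., ap] defines A(z) so that the
    all-pole model is 1/A(z), and err is the final prediction error energy.
    The Hamming window is an assumed, conventional choice."""
    x = frame * np.hamming(len(frame))
    # Autocorrelation lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient k_i = -(r[i] + sum_{j<i} a_j r[i-j]) / E_{i-1}
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update a_j <- a_j + k * a_{i-j} for j = 1..i-1, then set a_i = k
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```

The prediction error e[n] = sum_k a[k] x[n-k] is the "noise" driving the all-pole system. Frame-to-frame variation in `a` is the kind of temporal variability the abstract attributes to the coefficient computation.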
ISBN (print): 0780378504
The Digital Waveguide Mesh is a technique used in the modelling of room acoustics and musical instruments. This paper details a project that applies the theory of waveguide mesh acoustic modelling to the production of human-like vowel sounds. A 2D software mesh model is created that approximates the shape of the vocal tract in different vowel positions, and a glottal flow input is applied. The resulting signal has resonant frequencies (formants) similar to those of recorded speech. Recommendations are made for extending the model to include some of the more complex features of the mouth. This could lead to an acoustic model of the human vocal tract capable of producing more natural speech sounds.
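The abstract does not give the mesh equations. For a 2D rectilinear mesh, the scattering junctions are known to be equivalent to a finite-difference scheme. In that scheme, each junction pressure is half the sum of its four neighbours at the previous step, minus its own value two steps back. The sketch below assumes that formulation and a boolean mask for the tract shape. It also assumes zero-pressure (inverting) cells outside the mask, a deliberate simplification: a more faithful vocal-tract model would use near-rigid walls and a radiating lip opening. The names `simulate_mesh`, `src` and `out` are hypothetical.

```python
import numpy as np

def simulate_mesh(mask, excitation, src, out):
    """2D rectilinear waveguide mesh in its finite-difference form:
    p[n+1] = 0.5 * (sum of 4 neighbours at n) - p[n-1].

    mask: bool array, True where air (the tract) is; it must have a False
    border. Cells outside the mask are held at zero pressure (simplified
    boundary). src/out: (row, col) tuples for input and pickup points."""
    p_prev = np.zeros(mask.shape)
    p_curr = np.zeros(mask.shape)
    output = np.zeros(len(excitation))
    for n, u in enumerate(excitation):
        neighbours = (np.roll(p_curr, 1, axis=0) + np.roll(p_curr, -1, axis=0)
                      + np.roll(p_curr, 1, axis=1) + np.roll(p_curr, -1, axis=1))
        p_next = 0.5 * neighbours - p_prev
        p_next[src] += u            # inject the (glottal) input sample
        p_next[~mask] = 0.0         # zero-pressure cells outside the tract
        p_prev, p_curr = p_curr, p_next
        output[n] = p_curr[out]     # pressure at the "lip" pickup point
    return output
```

A glottal flow waveform can be passed as `excitation`, and the formants can be read from the spectrum of the returned signal. Changing `mask` changes the approximated vowel shape.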
ISBN (print): 0780378504
Typically, room equalization techniques do not focus on designing filters that equalize the room responses at perceptually relevant frequencies. By performing Bark warping of the room responses and using lower-order spectral models, it is possible to design low-order, psychoacoustically motivated equalization filters. In this paper, we experimentally compare the performance of the traditional RMS averaging filter (with and without warping to the Bark scale) with that of our pattern-recognition-based multiple-listener equalization filter with warping [2]. It is shown that our pattern recognition filter, using warping, outperforms the RMS averaging filter both with and without warping to the Bark scale.
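The sketch below illustrates the baseline in this comparison. It RMS-averages magnitude responses measured at several listener positions and resamples the result onto a uniform Bark grid using the Zwicker-Terhardt approximation. Resampling onto a Bark grid is a simple stand-in for the all-pass frequency warping typically used in warped filter design. The paper's pattern-recognition filter is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def hz_to_bark(f):
    """Zwicker-Terhardt approximation of the Bark scale."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def rms_average(mags):
    """RMS average over listener positions.
    mags: (n_listeners, n_freqs) magnitude responses."""
    return np.sqrt(np.mean(np.abs(mags) ** 2, axis=0))

def bark_resample(freqs, mag, n_points=64):
    """Resample a magnitude response onto a uniform Bark grid
    (freqs must be increasing; hz_to_bark is monotonic for f >= 0)."""
    z = hz_to_bark(freqs)
    grid = np.linspace(z[0], z[-1], n_points)
    return grid, np.interp(grid, z, mag)

def eq_magnitude(avg, floor=1e-3):
    """Inverse of the averaged response, with a floor to avoid huge boosts
    at deep notches (the floor value is an arbitrary assumption)."""
    return 1.0 / np.maximum(avg, floor)
```

Equalizing on the Bark grid allocates more of the filter's resolution to low frequencies, where the Bark scale is finer. This is the motivation for the low-order, psychoacoustically motivated filters described above.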
ISBN (print): 0780377893
This paper describes a new method for extracting Click-Evoked Otoacoustic Emissions (CEOAE), in which the stimulus artifact is eliminated using linear predictive coding (LPC). In this method, the prediction coefficients are computed over the first samples of the click response, which consist mainly of passive oscillations, and the unpredicted part of the remaining response is taken as the CEOAE signal. Preliminary tests were made with fifteen signals collected from normal-hearing adults whose responses contained stimulus artifacts. Results show that the method eliminates most of the stimulus artifact while preserving a better signal-to-noise ratio than the standard nonlinear stimulus cancellation method.
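The following is a minimal sketch of the idea, under stated assumptions. First, a least-squares (covariance-method) predictor is fitted to the first `n_fit` samples of the click response. Next, the predictor is run forward on its own output to extrapolate the passive ringing. Finally, the difference between the recorded response and this extrapolation is taken as the CEOAE estimate. The abstract does not say whether the paper extrapolates in this free-running way or uses a one-step prediction error. The function names and the predictor order are hypothetical.

```python
import numpy as np

def fit_predictor(x, order):
    """Least-squares predictor x[n] ~ sum_k c[k] * x[n-1-k].
    Each row holds [x[n-1], x[n-2], ..., x[n-order]]; target is x[n]."""
    rows = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    c, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
    return c

def extract_ceoae(response, n_fit, order=10):
    """Fit the predictor on the artifact-dominated start of the response,
    extrapolate it (free-running) over the rest, and return the
    unpredicted part. Requires n_fit > order."""
    response = np.asarray(response, dtype=float)
    c = fit_predictor(response[:n_fit], order)
    predicted = np.empty_like(response)
    predicted[:n_fit] = response[:n_fit]        # estimate is zero here
    for n in range(n_fit, len(response)):
        # Predict from previously predicted samples (artifact continuation)
        predicted[n] = np.dot(c, predicted[n - order:n][::-1])
    return response - predicted
```

A decaying passive oscillation is well described by a low-order all-pole model, so its continuation is predictable. A later-arriving emission is not captured by a predictor fitted before it starts.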
A high-performance speech processing integrated circuit (SPIC) based on linear predictive coding (LPC) techniques is presented. Both the system and the technological aspects of the SPIC design are covered in detail. The SPIC synthesizer chip will normally be used in a minimum three-chip system configuration comprising the synthesizer, a microcomputer and an external vocabulary ROM. Speech quality can be tailored to the user's requirements by varying the bit rate between the vocabulary ROM and the microcomputer from 1.1 to 8.5 kbit/s. Specific features of the SPIC include pitch-synchronous synthesis, speech parameter interpolation, and silence and power-down modes. Moreover, the digital filter output is interpolated to a high sampling rate (32 kHz) to avoid the need for off-chip filtering. An 8-bit PCM output (A-law) and a 16-bit linear-coded output are provided. The SPIC can be delivered in two different bonding configurations, either for small-system applications (three-chip system) or for larger system configurations.
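The 8-bit A-law output mentioned above follows the A-law companding characteristic. The sketch below implements the continuous A-law curve with the standard parameter A = 87.6 and a simple 8-bit quantizer of the companded value. ITU-T G.711 actually uses a segmented piecewise-linear approximation with its own bit layout, so this approximates the idea rather than being a bit-exact codec. The function names are hypothetical.

```python
import numpy as np

A = 87.6  # standard A-law parameter

def a_law_compress(x):
    """Continuous A-law curve for x in [-1, 1]:
    |x| < 1/A : A|x| / (1 + ln A)
    otherwise : (1 + ln(A|x|)) / (1 + ln A)"""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    ax = np.abs(x)
    # np.maximum keeps log() away from 0 in the branch np.where discards
    y = np.where(ax < 1.0 / A, A * ax, 1.0 + np.log(np.maximum(A * ax, 1.0)))
    return np.sign(x) * y / (1.0 + np.log(A))

def a_law_expand(y):
    """Inverse of a_law_compress."""
    y = np.asarray(y, dtype=float)
    ay = np.abs(y) * (1.0 + np.log(A))
    x = np.where(ay < 1.0, ay / A, np.exp(ay - 1.0) / A)
    return np.sign(y) * x

def to_8bit(y):
    """Uniformly quantize a companded value to a signed 8-bit code
    (simplified; not the G.711 bit layout)."""
    return np.round(np.clip(y, -1.0, 1.0) * 127).astype(np.int8)

def from_8bit(code):
    return np.asarray(code, dtype=float) / 127.0
```

Companding spends more of the 8 bits on small amplitudes. This is why an 8-bit A-law output can stand in for a wider linear code such as the chip's 16-bit output.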
Voice processing has made considerable progress in the last 10 years. Interaction with computer systems using spoken language is becoming common in consumer products, office systems and telecommunications applications. The article focuses on speech technology for computer systems. We briefly review voice technology, its current status, and specific aspects of automatic speech recognition, speech synthesis and applications.
The objective of this work is to investigate whether joint optimization of the short-term and long-term predictors offers significant advantages over sequential optimization in speech coding. We propose a new joint optimization method based on Wiener filtering. The proposed analysis model resolves the pitch-bias problem of classical LPC analysis by taking into account the contribution of the long-term predictor while optimizing the short-term predictor. Our approach to joint optimization is based on analysis-by-synthesis and guarantees the stability of the synthesis filter. Applying the proposed joint optimization to CELP coding yields better objective and subjective performance than CELP coding with sequential optimization. To provide voice quality equivalent to that of sequentially optimized CELP, the jointly optimized coder needs fewer fixed-codebook (FCB) pulses and a reduced bit budget for LPC quantization. Our listening tests suggest that the JCELP coder at 4.25 kbps is equivalent in quality to G.729 at 8 kbps.
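In the sequential baseline, the long-term (pitch) predictor is determined after the short-term predictor. The sketch below shows that second step, assuming a single-tap open-loop search over the short-term residual. The joint Wiener-filter optimization proposed in the paper is not reproduced. The lag range of 20 to 143 samples (at 8 kHz) and the function name are hypothetical choices.

```python
import numpy as np

def pitch_predictor(residual, start, length, t_min=20, t_max=143):
    """Open-loop single-tap long-term predictor search.

    Minimises sum_n (e[n] - b * e[n - T])^2 over the subframe
    n in [start, start + length). For each lag T the optimal gain is
    b = <e, e_T> / <e_T, e_T>, leaving error <e, e> - <e, e_T>^2 / <e_T, e_T>.
    Returns (lag, gain, error); lag is None if no lag reduces the error."""
    target = residual[start:start + length]
    energy0 = np.dot(target, target)
    best = (None, 0.0, energy0)
    for T in range(t_min, t_max + 1):
        if start - T < 0:
            break  # not enough history for this lag
        # For T < length this overlaps the current subframe (open-loop use)
        past = residual[start - T:start - T + length]
        energy = np.dot(past, past)
        if energy <= 0.0:
            continue
        corr = np.dot(target, past)
        err = energy0 - corr * corr / energy
        if err < best[2]:
            best = (T, corr / energy, err)
    return best
```

Because the short-term predictor is fixed before this search, any pitch structure it absorbed, the pitch bias mentioned above, is not revisited. Joint optimization is intended to address this.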
A generic approach is described for estimating the bit error rate (BER) of single- and multi-branch coherent reception of antipodal binary, multi-band transmitted voice signals over independent fading channels. Recent voice coders are candidates for producing good-quality voice at rates from 2.4 to 16 kb/s. The issues involved in transmitting the different bit rates of these recent coders over generalized fading mobile radio channels are discussed. The performance results demonstrate the severe signal-to-noise ratio (SNR) penalty that must be paid as a consequence of the fading characteristics of the received signal. The BER results are obtained without channel coding or error-correcting codes.
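The sketch below illustrates the SNR penalty caused by fading. It compares the BER of coherent antipodal (BPSK) signalling in AWGN with the textbook closed form for i.i.d. Rayleigh fading with L-branch maximal-ratio combining. Rayleigh fading and maximal-ratio combining are assumptions, chosen as a well-known special case. The abstract does not specify the paper's generalized fading model or its combining scheme. The SNR argument is taken as the average SNR per bit per branch.

```python
import math

def ber_awgn(snr_db):
    """Coherent antipodal (BPSK) BER in AWGN: Q(sqrt(2*g)) = 0.5*erfc(sqrt(g))."""
    g = 10 ** (snr_db / 10)
    return 0.5 * math.erfc(math.sqrt(g))

def ber_rayleigh_mrc(snr_db, branches=1):
    """BPSK BER with `branches` i.i.d. Rayleigh branches and maximal-ratio
    combining (assumed scheme):
    P = ((1-mu)/2)^L * sum_{k=0}^{L-1} C(L-1+k, k) * ((1+mu)/2)^k,
    mu = sqrt(g / (1 + g)), g = average SNR per branch."""
    g = 10 ** (snr_db / 10)
    mu = math.sqrt(g / (1 + g))
    total = sum(math.comb(branches - 1 + k, k) * ((1 + mu) / 2) ** k
                for k in range(branches))
    return ((1 - mu) / 2) ** branches * total
```

At an average SNR of 10 dB, the AWGN expression gives roughly 3.9 × 10^-6, while single-branch Rayleigh fading gives roughly 2.3 × 10^-2. Adding branches narrows that gap, which is the motivation for multi-branch reception.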