We present a method to improve recognition performance under stressful talking conditions. The method preprocesses speech signals produced under stressful talking conditions once the optimum number of poles of the vocal tract transfer function has been determined. Our results show that the optimum number of poles under stressful talking conditions is 14, and that using this value improves the recognition rate considerably.
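As an illustration of what a 14-pole vocal tract model means in practice, the following is a minimal sketch of all-pole (LPC) analysis of one speech frame with the autocorrelation method and the Levinson-Durbin recursion. It is not the preprocessing described in the abstract; the function names, the Hamming window, and the sign convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p are assumptions made for the example.

```python
import math

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) where a[0] == 1 and A(z) = 1 + sum_k a[k] z^-k,
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def lpc(frame, order=14):
    """Hamming-window a frame and fit an all-pole model with `order` poles.

    order=14 mirrors the pole count reported in the abstract; the window
    choice is an assumption of this sketch.
    """
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    r = autocorrelation(windowed, order)
    return levinson_durbin(r, order)
```

Calling `lpc(frame, order=14)` returns 15 coefficients (the leading 1 plus 14 predictor terms); varying `order` is how one would search for the pole count that gives the best recognition rate.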
Line spectrum frequencies (LSFs), an important class of linear predictive coding parameters, are widely used in speech coding and speech recognition systems. This paper describes an efficient method that further simplifies the computation of LSFs. The method first locates, by successive bisection and interpolation, intervals that each contain one root of P(x) and one root of Q(x), and then computes the LSFs within those intervals, again by successive bisection and interpolation. Simulation results show that the method greatly reduces the computation.
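The general idea of locating LSFs by bracketing and bisection can be sketched as follows. This is not the paper's algorithm (which works on polynomials P(x) and Q(x) and adds interpolation); it is a plain grid scan plus bisection in the frequency variable, assuming an even LPC order and a minimum-phase A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p. All function names and the grid and iteration counts are assumptions of the example.

```python
import math

def _roots_in_0_pi(c, n_grid=256, n_bisect=40):
    """Zeros in (0, pi) of a symmetric polynomial evaluated on the unit circle.

    For symmetric coefficients c[0..M] the spectrum reduces to the real
    function f(w) = sum_k c[k] * cos((M/2 - k) * w).
    """
    half = (len(c) - 1) / 2.0

    def f(w):
        return sum(ck * math.cos((half - k) * w) for k, ck in enumerate(c))

    roots = []
    w_lo, f_lo = 0.0, f(0.0)
    for i in range(1, n_grid + 1):
        w_hi = math.pi * i / n_grid
        f_hi = f(w_hi)
        if f_lo * f_hi < 0.0:               # sign change: one root bracketed
            a, b, fa = w_lo, w_hi, f_lo
            for _ in range(n_bisect):
                m = 0.5 * (a + b)
                fm = f(m)
                if fa * fm <= 0.0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append(0.5 * (a + b))
        w_lo, f_lo = w_hi, f_hi
    return roots

def lpc_to_lsf(a):
    """LSFs in radians, ascending, for LPC coefficients a with a[0] == 1."""
    p = len(a) - 1                          # assumed even
    ext = list(a) + [0.0]
    big_p = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]   # symmetric
    big_q = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]   # antisymmetric
    # Divide out the trivial roots: z = -1 for P, z = +1 for Q.
    pr, qr = [big_p[0]], [big_q[0]]
    for k in range(1, p + 1):
        pr.append(big_p[k] - pr[k - 1])
        qr.append(big_q[k] + qr[k - 1])
    return sorted(_roots_in_0_pi(pr) + _roots_in_0_pi(qr))
```

For a second-order filter with poles at radius 0.9 and angle π/3, `lpc_to_lsf([1.0, -0.9, 0.81])` returns two frequencies that bracket the pole angle, one from each polynomial. The fixed grid is the weak point of this sketch: two roots closer than one grid step would be missed, which is one reason dedicated methods refine the bracketing step.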
To meet the needs of wireless digital communication, this paper proposes a new 1.2 kb/s speech coding algorithm based on mixed excitation linear prediction (MELP). Compared with the US Federal Standard 2.4 kb/s MELP algorithm, the new algorithm improves several aspects, including the model structure, line spectrum frequency (LSF) quantization, and decoding. The algorithm has been implemented on a Texas Instruments TMS320LC548 DSP. Informal listening tests show that the intelligibility, naturalness, and noise robustness of the 1.2 kb/s algorithm are comparable to those of the 2.4 kb/s MELP algorithm.
ISBN: 0780365429 (print)
We develop a new speech processing program that identifies the pitch period and extracts the formant frequencies of Arabic speech. Its purpose is to enhance voice recognition and to simplify complex processing. The database is composed of phonetically balanced Arabic sentences pronounced by several speakers (male, female, children). After acquisition, conversion, and segmentation, we make the voiced/unvoiced (V/UV) decision by analysing the evolution of the zero-crossing rate. We then compute the fundamental frequency and the first four formants of the raw speech. The results are compared with other pitch detection algorithms (PDAs) such as SIFT, AMDF, cepstral, and LPC methods by computing the detection error and the V/UV decision error.
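A zero-crossing based V/UV decision of the kind mentioned above can be sketched in a few lines. The threshold value and function names are assumptions for illustration; the abstract does not state how the zero-crossing evolution is thresholded.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for x0, x1 in zip(frame, frame[1:])
                    if (x0 >= 0) != (x1 >= 0))
    return crossings / (len(frame) - 1)

def is_voiced(frame, zcr_threshold=0.1):
    """Label a frame as voiced when its zero-crossing rate is low.

    Voiced speech is dominated by low-frequency energy and crosses zero
    rarely; unvoiced (noise-like) speech crosses zero often. The default
    threshold is a placeholder, not a value from the paper.
    """
    return zero_crossing_rate(frame) < zcr_threshold
```

In practice the rate is usually tracked frame by frame and combined with short-time energy, since silence can also show a low or erratic zero-crossing rate.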
The continuing trend towards smaller geometries calls for the analysis of both short- and narrow-channel effects. Although narrow channels are of high interest in low-power/low-voltage applications, relatively few and rather contrasting results have been reported in the past (Kuo et al., 1995; Fung et al., 1998; Wang et al., 2000; Cristoloveanu and Li, 1995; Zhao and Ioannou, 1999). The narrow-channel effects depend on the isolation technology (MESA, LOCOS, STI), wafer origin (SIMOX, Unibond, etc.), device architecture (fully or partially depleted MOSFETs), and film thickness. In this paper, we attempt to elucidate the narrow-channel effects in fully depleted, LOCOS-isolated n-MOSFETs, as well as their relationship with other key dimensional effects (short channels and ultra-thin films).
We propose in this paper a general solution for combined speech and audio coding. Particularly, we describe a speech/music discrimination procedure for multi-mode wideband coding. The speech/music decision is updated only when a low-energy frame is detected, and kept unchanged otherwise. The signal is classified using second-order statistics of discriminant parameters. An experimental CELP/transform coder operating at 16 kbit/s is demonstrated. Results show improved performance when compared to single-mode encoding.
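The update rule described above (change the speech/music decision only at low-energy frames, hold it otherwise) can be sketched as a small state machine. The per-frame raw labels would come from the classifier, which is not reproduced here; the function name, the threshold parameter, and the choice to start from the first raw label are assumptions.

```python
def hold_decisions(raw_labels, energies, energy_threshold):
    """Return per-frame mode decisions that switch only on low-energy frames.

    raw_labels: the classifier's per-frame guess, e.g. "speech" or "music".
    energies:   per-frame energy values, same length as raw_labels.
    """
    current = raw_labels[0]          # assumed initial mode
    decisions = []
    for label, energy in zip(raw_labels, energies):
        if energy < energy_threshold:
            current = label          # safe point: adopt the new decision
        decisions.append(current)
    return decisions
```

Switching only at low-energy frames keeps coder mode changes away from loud passages, where a change of coding mode would be most audible.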
This paper presents the issues associated with the real-time implementation of the 2.4 kbps MELP speech codec on a TI fixed-point DSP. It briefly reviews the MELP algorithm and the procedure used to port the fixed-point MELP C code to TMS320C54x assembly code. Factors such as memory, speed, and compatibility are also discussed.
This paper presents a new LPC parameter quantization method, SubBand Synthesized LPC Vector Quantization (SBS-LPC-VQ). In the subband synthesis process, the relationships between the subband spectra and the whole-band LPC spectrum are established, so that the vector-quantized subband LPC parameters can be mapped to the whole-band LPC parameters. The SBS-LPC-VQ method overcomes the high complexity of vector quantization of LPC parameters and confines the distortion to each subband during the VQ process. It also provides flexibility in assigning the bits used for each subband, choosing the order of the LPC filter, and determining the number of bands for the subband classification. The Critical band Weighting Spectral Distortion Measure (CW-SDM), a perceptually motivated objective measure based on a critical-band weighting function, is used to measure the distortion of the quantized LPC spectrum. Under this distortion measure, the experiments show that SBS-LPC-VQ codes the full set of 16th-order LPC parameters at 24 bits/frame with about 1 dB average spectral distortion. For comparison, results obtained with the conventional spectral distortion measure (SDM) are also presented in the paper.
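For reference, the conventional (unweighted) spectral distortion between an original and a quantized LPC spectrum can be sketched as the root-mean-square difference of the two log power spectra in dB. This sketch does not implement the critical-band weighting of CW-SDM, whose weighting function is not given in the abstract; the function names, the unit-gain assumption, and the 256-point frequency grid are choices made for the example.

```python
import cmath
import math

def lpc_log_spectrum_db(a, w):
    """10*log10 of the all-pole power spectrum 1/|A(e^{jw})|^2 (unit gain)."""
    big_a = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
    return -20.0 * math.log10(abs(big_a))

def spectral_distortion_db(a_ref, a_quant, n_points=256):
    """RMS log-spectral difference in dB over a uniform grid on (0, pi)."""
    acc = 0.0
    for i in range(n_points):
        w = math.pi * (i + 0.5) / n_points
        d = lpc_log_spectrum_db(a_ref, w) - lpc_log_spectrum_db(a_quant, w)
        acc += d * d
    return math.sqrt(acc / n_points)
```

A perceptually weighted variant would multiply each squared difference by a frequency-dependent weight before averaging, which is the role the critical-band weighting function plays in CW-SDM.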
Within the context of automatic speech recognition (ASR) applications for telephony, we investigate the acoustic preprocessing issues that are at stake in going from the fixed line to the cellular network. Because the spectral representation used in enhanced full rate GSM is linear prediction, we investigate the relative advantages and drawbacks of conventional mel-frequency cepstral coefficient (MFCC) parameters derived from a non-parametric fast Fourier transform (FFT) and MFCC parameters derived from a linear predictive coding (LPC) spectral estimate. Robust formant parameters, also derived from an LPC description of the spectrum, are studied as an alternative to MFCCs. Within the framework of connected digit recognition based on hidden Markov models, ASR performance was measured for clean conditions, as well as for three different additive noise conditions. In addition, the performance of a conventional recognition procedure was compared with the performance of an ASR system based on our acoustic backing-off implementation of missing feature theory (MFT).