A new reconstruction method for frame erasures in speech transmission is presented which is based on parameterization of the speech signal by means of linear prediction (LPC) and voicing analysis. The problem of generating partially voiced substitute speech signals is solved by performing separate voicing decisions in sub-bands. The method yields considerable improvements compared with silence substitution for frame erasure ratios of up to 10% or even 20%. The combination of the reconstruction method with adaptive speech coders showed virtually the same good results for forward adaptation, whereas a higher degradation is caused by backward-adaptive coders.
The authors propose an efficient vector quantization scheme and a novel linear predictive coding (LPC) analysis scheme, both of which exploit interframe correlation in the successive spectral envelopes of speech signals. The first is a multistage vector quantization of line spectrum pair (LSP) parameters with a partially adaptive codebook (MSVQ-AC). The second is an LPC analysis scheme with closed-loop adaptive prefiltering (LPC-PF), which realizes a temporarily higher-order analysis than standard LPC at the cost of a few additional transmission bits. A combined system of the LPC-PF and a two-split, two-stage VQ with the adaptive codebook can quantize tenth-order LSP parameters at around 23 bits/frame, with sufficient quality and reasonable complexity.
Various speech enhancement schemes are analyzed in terms of the conflicting real-time requirements of computational delay, robustness and accuracy. A spectral subtraction scheme is found to be implementable in real time on the available digital processing board. Tasks such as computing the spectra of the noisy speech and of the noise, smoothing the estimates, and applying the enhancement filter are carried out in the frequency domain using the FFT, for computational speed and robustness. A variant of the spectral subtraction scheme is implemented in real time on a DSP board and its performance is evaluated.
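The frequency-domain steps named in this abstract can be illustrated with a minimal sketch of basic magnitude spectral subtraction. This is not the authors' variant: the function names, the `floor` parameter and the naive DFT (used here only to keep the example free of external libraries) are all assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(frame, noise_mag, floor=0.02):
    """Subtract a noise magnitude estimate from each bin of one frame.

    noise_mag: per-bin noise magnitude estimate (hypothetical input, e.g.
    averaged over speech-free frames). The noisy phase is kept unchanged, and
    each bin is clamped to floor * noisy magnitude to avoid negative values.
    """
    enhanced = []
    for s, n_mag in zip(dft(frame), noise_mag):
        mag = abs(s)
        clean_mag = max(mag - n_mag, floor * mag)
        unit_phase = s / mag if mag > 0 else 0
        enhanced.append(clean_mag * unit_phase)
    return idft(enhanced)
```

A real-time implementation would add windowing, overlap-add and smoothing of the noise estimate across frames, none of which are shown here.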
ISBN: (print) 0780367200
Efficient quantization of linear predictive coding (LPC) filter coefficients plays an essential role in very-low-bit-rate speech coding systems. This paper examines a new suboptimal matrix quantization scheme for LPC parameters, called multi-stage matrix quantization (MSMQ), which operates at bit rates between 400 and 800 bit/s. With the new matrix quantization method, using a 22.5 ms LPC analysis frame, a spectral distortion of about 1 dB is achieved at 800 bit/s. In the proposed coder, line spectral frequency (LSF) parameters of multiple consecutive frames are grouped into a superframe and jointly quantized. The new residual LSF vector quantization scheme gives a bit rate reduction in MSMQ without any additional complexity or storage. The new MSMQ leads to several schemes with differing computational complexity and storage characteristics.
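The "spectral distortion" figure quoted above is commonly measured as the RMS difference, in dB, between the log power spectra of the unquantized and quantized LPC filters. The sketch below shows that common definition for a single frame. The abstract does not state the exact formula, frequency range or averaging used in the paper, so the function names and the uniform grid over 0 to π are assumptions.

```python
import cmath
import math


def lpc_envelope_db(a, n_points=256):
    """Log power spectrum (dB) of the all-pole filter 1/A(z).

    a = [a0, a1, ..., ap] are the coefficients of A(z) = sum_k a[k] z^-k,
    evaluated on n_points frequencies spread uniformly over [0, pi).
    """
    envelope = []
    for i in range(n_points):
        w = math.pi * i / n_points
        a_of_w = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        envelope.append(-20.0 * math.log10(abs(a_of_w)))
    return envelope


def spectral_distortion_db(a_ref, a_quant, n_points=256):
    """RMS log-spectral difference (dB) between two LPC filters, one frame."""
    ref = lpc_envelope_db(a_ref, n_points)
    quant = lpc_envelope_db(a_quant, n_points)
    mean_sq = sum((r - q) ** 2 for r, q in zip(ref, quant)) / n_points
    return math.sqrt(mean_sq)
```

In practice the per-frame values are averaged over a test set, and a figure of about 1 dB is conventionally associated with transparent quantization.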
It is difficult to produce natural-sounding speech using LPC voice coding techniques (vocoder). The main problem is the simplistic model of the excitation source in terms of the pitch pulses and white noise used in the synthesis. The LPC vocoder requires robust and accurate pitch determination, which is a problem that still has not been completely solved. Recently, a multi-pulse approach for modeling the excitation source has been proposed by Atal. However, this approach requires a substantial increase in the bit rate over conventional LPC. We describe in this paper a novel pulse source model for improving the quality of LPC speech. The binary pulse source excitation is obtained from the speech signal using a nonlinear filtering procedure. Our approach is able to maintain good speech quality even at bit rates of 4.8 kbps or lower.
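For context, the conventional excitation model that this abstract calls simplistic can be sketched as follows: a pitch-pulse train for voiced frames, or white noise for unvoiced frames, drives an all-pole LPC synthesis filter. This shows only the baseline model, not the binary pulse source the authors propose. All names and default values are illustrative assumptions.

```python
import random


def make_excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Classic two-state LPC excitation for one frame.

    Voiced: unit impulses every pitch_period samples.
    Unvoiced: Gaussian white noise (seeded so the example is repeatable).
    """
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def lpc_synthesize(a, excitation):
    """Filter the excitation through 1/A(z), with a = [1, a1, ..., ap].

    Implements y[n] = e[n] - sum_{k=1..p} a[k] * y[n-k].
    """
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y
```

The hard voiced/unvoiced switch and the dependence on an accurate `pitch_period` are the weaknesses the abstract refers to.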
Communication devices which perform distributed speech recognition (DSR) tasks currently transmit standardized coded parameters of speech signals. Recognition features are extracted from signals reconstructed using these on a remote server. Since reconstruction losses degrade recognition performance, proposals are being considered to standardize DSR-codecs which derive recognition features, to be transmitted and used directly for recognition. However, such a codec must be embedded on the transmitting device, along with its current standard codec. Performing recognition using codec bitstreams avoids these complications: no additional feature-extraction mechanism is required on the device, and there are no reconstruction losses on the server. We propose an LDA-based method for extracting optimal feature sets from codec bitstreams and demonstrate that features so derived result in improved recognition performance for the LPC, GSM and CELP codecs. For GSM and CELP, we show that the performance is comparable to that with uncoded speech and standard DSR-codec features.
ISBN: (print) 9775031680
The main goal of automatic speech recognition (ASR) is to produce a machine that accurately recognizes normal human speech from any speaker. A recognition system may be classified as speaker-dependent or speaker-independent, and as isolated-word or connected-word. There are three approaches to research in ASR: the acoustic-phonetic approach, the pattern recognition approach, and the database statistical approach. Two approaches of this kind, the hidden Markov model (HMM) and the artificial neural network (ANN), are presented in this paper.
This paper reports the results of an investigation of a computable Quality Comparison Measure (QCM) for linear predictive systems. The measure is easily obtained by a synthesis-analysis procedure. It is a weighted combination of differences between the input and output speech parameters for a series of spoken sentences. Results are presented that demonstrate a high correlation between QCM and listener preference scores. The QCM offers an alternative to costly and time-consuming formal listening procedures.
For a code excited linear prediction (CELP)-based coder, coarse quantization usually results in the degradation of speech quality. Spectrum and gain were found to have the least sensitivity to quantization errors and were efficiently quantized. The frame period and number of subframes per frame were also appropriately selected for the purpose of bit-rate reduction. The performance of four possible excitations, namely stochastic excitation, binary pulse excitation, self-excitation and repeat excitation, was investigated. Using signal-to-noise ratio as a quality measure, the coder at 3076 b/s was found to have a performance very close to that of the federal standard at 4800 b/s.
A novel audio/speech coding algorithm, hybrid audio coding (HAC), is described. New features of the algorithm include window switching with a generalized MDCT, an improved quantization scheme for the MDCT coefficients, and waveform normalization in the time domain. HAC provides good quality at bit rates of 8 to 16 kbps, and the algorithm is shown to be effective for both audio and speech signals.