Within the context of automatic speech recognition (ASR) applications for telephony, we investigate the acoustic preprocessing issues that are at stake in going from the fixed line to the cellular network. Because the spectral representation used in enhanced full rate GSM is linear prediction, we investigate the relative advantages and drawbacks of conventional mel-frequency cepstral coefficient (MFCC) parameters derived from a non-parametric fast Fourier transform (FFT) and MFCC parameters derived from a linear predictive coding (LPC) spectral estimate. Robust formant parameters, also derived from an LPC description of the spectrum, are studied as an alternative to MFCCs. Within the framework of connected digit recognition based on hidden Markov models, ASR performance was measured for clean conditions, as well as for three different additive noise conditions. In addition, the performance of a conventional recognition procedure was compared with the performance of an ASR system based on our acoustic backing-off implementation of missing feature theory (MFT).
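To make the FFT-versus-LPC comparison concrete, here is a minimal NumPy sketch of the two MFCC front ends. It is not the authors' implementation. The following are illustrative assumptions:

- an 8 kHz sampling rate;
- a 256-sample Hamming-windowed frame;
- 10th-order autocorrelation LPC, solved with the Levinson-Durbin recursion;
- 20 triangular mel filters and 13 cepstra.

Every function name is hypothetical. The two variants differ only in the power spectrum passed to the mel filterbank: either the non-parametric periodogram or the all-pole envelope err/|A(e^jw)|^2.

```python
import numpy as np

def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of a (windowed) frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns [1, a1, ..., ap] and the final prediction-error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient of stage i
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_power_spectrum(a, err, nfft):
    """Parametric estimate err / |A(e^jw)|^2 on nfft//2 + 1 bins."""
    return err / np.abs(np.fft.rfft(a, nfft)) ** 2

def fft_power_spectrum(frame, nfft):
    """Non-parametric periodogram |X(e^jw)|^2 on nfft//2 + 1 bins."""
    return np.abs(np.fft.rfft(frame, nfft)) ** 2

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, nfft, fs):
    """Triangular filters equally spaced on the mel scale from 0 Hz to fs/2."""
    mels = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):
            fb[m - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fb[m - 1, k] = (hi - k) / max(hi - mid, 1)
    return fb

def mfcc_from_power(power, fb, n_ceps):
    """Log mel-band energies followed by a DCT-II."""
    log_e = np.log(fb @ power + 1e-12)
    n = len(log_e)
    basis = np.cos(np.pi / n * np.outer(np.arange(n_ceps), np.arange(n) + 0.5))
    return basis @ log_e

fs, nfft, order = 8000, 512, 10
rng = np.random.default_rng(0)
t = np.arange(256) / fs
frame = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1800 * t)
         + 0.05 * rng.standard_normal(t.size)) * np.hamming(t.size)

fb = mel_filterbank(20, nfft, fs)
a, err = levinson_durbin(autocorrelation(frame, order), order)
mfcc_fft = mfcc_from_power(fft_power_spectrum(frame, nfft), fb, 13)
mfcc_lpc = mfcc_from_power(lpc_power_spectrum(a, err, nfft), fb, 13)
```

A low-order LPC envelope is smooth, so the LPC-derived cepstra omit harmonic fine structure that the periodogram retains. Whether that helps or hurts in noise is what the study measures; the sketch makes no claim either way.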
An important step toward a high-quality 4 kb/s speech codec is reducing the coding rate of the stochastic codebook component to about 2 kb/s. Such low-rate quantization increases the reconstruction error in the residual, which motivates the search for techniques that make the errors in the reconstructed signal less perceptible. Pitch-synchronous estimation of the linear-prediction filter and pitch-synchronous updating of the adaptive codebook reduce the coefficient-estimation error. They also increase the relative contribution of the adaptive codebook component to the synthesized signal, thereby reducing audible noise. However, pitch-synchronous analysis normally results in a variable-rate coder. To obtain a fixed-rate representation, we introduce an efficient representation of the stochastic codebook component: a pulse density of one pulse per 2 ms, with signed magnitudes specified by 2 bits per pulse pair. The resulting reconstructions are evaluated for CELP coders based on classical and generalized pitch-predictor designs. In both cases, speech quality comparable to that of 8 kb/s G.729 is achieved.
Concatenative synthesis can produce high-quality speech, but it is limited to the allophonic variations and voice types captured in the database. It would be desirable to modify speech units to remove formant discontinuities and to create new speaking styles, such as hypo- or hyper-articulated speech. Unfortunately, manipulating the spectral structure often degrades speech quality. We investigate two speech modification strategies, one based on inverse filtering and the other on sinusoidal modeling, and explain their merits and shortcomings for changing the spectral envelope of speech. We then propose a method that uses sinusoidal modeling and represents the complex sinusoidal amplitudes by an all-pole model. The all-pole model approximates the sinusoidal spectrum well in both the amplitude and the phase domain. We use the sinusoidal+all-pole model to control the spectral envelope of recorded speech, and high-quality modified speech is generated from the model using sinusoidal synthesis. A perceptual test showed that the model was effective at changing vowel identities and was preferred to residual-excited LPC.
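The following sketch illustrates what it means to represent complex sinusoidal amplitudes with an all-pole model. Each harmonic's magnitude and phase are read off H(e^jw) = G / A(e^jw), and the envelope is edited by moving poles. The sketch assumes a synthetic two-formant A(z) built from hypothetical formant frequencies and bandwidths, with f0 = 100 Hz and fs = 8 kHz. The paper's procedure for fitting the all-pole model to measured sinusoidal amplitudes is not reproduced, and all names are illustrative.

```python
import numpy as np

def resonator_poly(formants_hz, bandwidths_hz, fs):
    """A(z) coefficients [1, a1, ..., ap] with one conjugate pole pair per formant."""
    poles = []
    for f, b in zip(formants_hz, bandwidths_hz):
        p = np.exp(-np.pi * b / fs) * np.exp(2j * np.pi * f / fs)
        poles += [p, np.conj(p)]
    return np.real(np.poly(poles))

def allpole_response(a, gain, omegas):
    """H(e^jw) = gain / A(e^jw), where A(z) = sum_i a[i] z^-i."""
    powers = np.arange(len(a))
    return gain / (np.exp(-1j * np.outer(omegas, powers)) @ a)

def harmonic_amplitudes(a, gain, f0, fs):
    """Complex amplitudes (magnitude and phase) of harmonics k*f0 up to Nyquist."""
    n_harm = int((fs / 2) // f0)
    omegas = 2 * np.pi * f0 * np.arange(1, n_harm + 1) / fs
    return omegas, allpole_response(a, gain, omegas)

def sinusoidal_synthesis(omegas, amps, n_samples):
    """s[n] = sum_k |A_k| cos(w_k n + arg A_k)."""
    n = np.arange(n_samples)
    return np.real(np.exp(1j * np.outer(n, omegas)) @ amps)

fs, f0 = 8000, 100
original = resonator_poly([500, 1500], [80, 80], fs)
modified = resonator_poly([700, 1500], [80, 80], fs)   # raise F1 by 200 Hz

omegas, amps_orig = harmonic_amplitudes(original, 1.0, f0, fs)
_, amps_mod = harmonic_amplitudes(modified, 1.0, f0, fs)
speech_mod = sinusoidal_synthesis(omegas, amps_mod, 800)
```

In this example, raising F1 from 500 to 700 Hz moves the strongest harmonic from the 5th to the 7th, while f0 and the harmonic frequencies stay fixed. Because the whole envelope lives in `a`, a single pole edit changes the amplitude and phase of every harmonic consistently.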
ISBN: 0780365429 (print)
This paper describes a new algorithm for electrocardiogram (ECG) compression, based on compressing the linearly predicted residuals of the wavelet coefficients. The main goal of the algorithm is to reduce the bit rate while keeping the distortion of the reconstructed signal at a clinically acceptable level. The input signal is divided into blocks, each block undergoes a discrete wavelet transform, and the resulting wavelet coefficients are linearly predicted. This yields a set of uncorrelated transform-domain signals, which are compressed using modified run-length and Huffman coding techniques. The predictor is obtained by minimizing the error between the wavelet coefficients and the predicted coefficients. The method is assessed using the percent residual difference (PRD) and visual inspection. It achieves a small PRD at a high compression ratio with low implementation complexity.
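The processing chain in the abstract (block DWT, linear prediction of the wavelet coefficients, coding of the residuals, PRD evaluation) can be sketched as below. Several choices are assumptions rather than the paper's:

- a one-level orthonormal Haar transform;
- a first-order least-squares predictor per subband;
- a uniform quantizer with closed-loop (DPCM) prediction, so that the decoder can track the encoder;
- a plain zero-run-length code standing in for the paper's modified run-length and Huffman coders.

PRD uses the common definition 100 * sqrt(sum (x - x_rec)^2 / sum x^2). The test block is a synthetic signal, not ECG data, and all names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar DWT of an even-length block."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def ls_predictor(c):
    """First-order coefficient p minimizing sum_n (c[n] - p * c[n-1])^2."""
    denom = np.dot(c[:-1], c[:-1])
    return float(np.dot(c[1:], c[:-1]) / denom) if denom > 0 else 0.0

def dpcm_encode(c, p, step):
    """Quantize prediction residuals in closed loop so the decoder can track them."""
    q = np.zeros(len(c), dtype=int)
    prev = 0.0
    for n, value in enumerate(c):
        pred = p * prev
        q[n] = int(np.round((value - pred) / step))
        prev = pred + q[n] * step
    return q

def dpcm_decode(q, p, step):
    out = np.zeros(len(q))
    prev = 0.0
    for n, idx in enumerate(q):
        prev = p * prev + idx * step
        out[n] = prev
    return out

def run_length_zeros(q):
    """(zeros_before, value) pairs; a trailing zero run is emitted as (run, 0)."""
    pairs, run = [], 0
    for v in q:
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs

def prd(x, x_rec):
    """PRD in percent: 100 * sqrt(sum (x - x_rec)^2 / sum x^2)."""
    x = np.asarray(x, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - x_rec) ** 2) / np.sum(x ** 2))

def compress_block(x, step):
    """Haar DWT -> per-subband LS prediction -> quantized residuals -> run lengths."""
    rec_bands, streams = [], []
    for band in haar_dwt(x):
        p = ls_predictor(band)
        q = dpcm_encode(band, p, step)
        streams.append((p, run_length_zeros(q)))
        rec_bands.append(dpcm_decode(q, p, step))
    return streams, haar_idwt(*rec_bands)

t = np.arange(256)
block = np.sin(2 * np.pi * t / 64) + 0.5 * np.exp(-0.5 * ((t % 64 - 20) / 2.0) ** 2)
streams, block_rec = compress_block(block, step=0.05)
print(f"PRD = {prd(block, block_rec):.3f} %")
```

A real coder would also have to quantize and transmit the predictor coefficient `p`; here it is kept as a float for brevity. With an orthonormal transform and closed-loop quantization, each coefficient error is bounded by `step / 2`, which directly bounds the PRD.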
We propose a recursive coding scheme for spectral parameters. The coder uses a Gaussian mixture model describing the PDF of pairs of source vectors. From this model we can, at each coding instant, calculate a conditional PDF for the vector to be coded given the previously quantized vector. Knowing this conditional PDF, we can use high-rate theory to design an optimal codebook. To reduce complexity, we employ a classified vector quantizer structure: the mixture components with the highest probability around the incoming vector are detected, and the corresponding codebooks are searched. A pair of indices, representing the chosen component and codevector, is transmitted. Experiments show a small performance degradation due to the imposed structural constraint. Compared with standard predictive vector quantization (PVQ), the scheme performs slightly better both objectively and subjectively. Furthermore, near-transparent spectral coding is achieved at 16 bits per frame when the scheme is employed in a sinusoidal vocoder.
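The key computation in the abstract is turning a GMM of vector pairs into a conditional PDF given the previously quantized vector. This follows from standard Gaussian conditioning:

- each component weight is rescaled by how well that component explains the previous vector;
- each mean is shifted by S_cp inv(S_pp) (y - mu_p);
- each covariance shrinks to S_cc - S_cp inv(S_pp) S_pc.

The sketch assumes an already trained full-covariance GMM and uses 1-D vectors for brevity. `top_components` is a hypothetical stand-in for the classification step, and the high-rate codebook design is not shown.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    diff = np.asarray(x, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mean) * np.log(2 * np.pi) + logdet + maha)

def conditional_gmm(weights, means, covs, y, d):
    """Condition a GMM on z = [x_prev; x_cur] (x_prev = first d entries) on x_prev = y.

    Returns component weights, means and covariances of p(x_cur | x_prev = y).
    """
    log_w, c_means, c_covs = [], [], []
    for w, mu, S in zip(weights, means, covs):
        S_pp, S_pc, S_cc = S[:d, :d], S[:d, d:], S[d:, d:]
        gain = np.linalg.solve(S_pp, S_pc).T          # S_cp inv(S_pp), S symmetric
        c_means.append(mu[d:] + gain @ (y - mu[:d]))
        c_covs.append(S_cc - gain @ S_pc)
        log_w.append(np.log(w) + gaussian_logpdf(y, mu[:d], S_pp))
    log_w = np.array(log_w)
    post = np.exp(log_w - log_w.max())                # normalize in the log domain
    return post / post.sum(), c_means, c_covs

def top_components(x, c_weights, c_means, c_covs, k=1):
    """Indices of the k components with the highest w_m N(x; mu_m, S_m)."""
    scores = [np.log(max(w, 1e-300)) + gaussian_logpdf(x, m, S)
              for w, m, S in zip(c_weights, c_means, c_covs)]
    return [int(i) for i in np.argsort(scores)[::-1][:k]]

S = np.array([[1.0, 0.9], [0.9, 1.0]])
weights = [0.5, 0.5]
means = [np.array([-3.0, -3.0]), np.array([3.0, 3.0])]
covs = [S, S]

x_prev_hat = np.array([2.5])   # previously quantized vector (1-D for brevity)
x_now = np.array([2.2])        # vector to be coded
c_w, c_mu, c_cov = conditional_gmm(weights, means, covs, x_prev_hat, d=1)
print(c_w, top_components(x_now, c_w, c_mu, c_cov, k=1))
```

The conditional covariances are smaller than the marginal ones whenever consecutive vectors are correlated. This reduction is what lets a codebook designed for the conditional PDF outperform one designed for the marginal PDF.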
This paper presents a scalable audio format, called "multi-layer scalable LPC audio format", that addresses functionalities similar to those of MPEG-4. The format offers different levels of data rate and audio quality. It meets the most important requirements for transmission and storage, such as robustness to channel errors and cell loss, low delay, and playback control. It operates in four modes. The first mode is based on a modified version of the LD-CELP algorithm in which every 6 samples are represented by a single byte. To improve the signal-to-noise ratio (SNR), additional enhancement layers are embedded in the bitstream to allow higher quality at higher bit rates; the resulting bit rates are integer multiples of 10.67 kbps. The other three modes use QMF splitting into two, four, and eight subbands. These modes allow efficient representation of wideband audio and speech signals and offer extension layers of 5.33 and 2.66 kbps. A simple and efficient header structure is embedded in the bitstream so that decoding remains possible under channel errors. Decoding also remains possible when the bitstream has been down-scaled somewhere during transmission without the decoder being notified. The format is compared with MPEG and ITU standards.
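The 10.67 kbps figure follows from the 6-samples-per-byte packing if the input is sampled at 8 kHz. This is an assumption, since the abstract does not state the sampling rate. The small sketch of the arithmetic below uses hypothetical names and assumes that each enhancement layer adds one base-rate increment.

```python
from fractions import Fraction

FS_HZ = 8000          # assumed narrowband sampling rate; not stated in the abstract
SAMPLES_PER_BYTE = 6  # from the abstract: every 6 samples -> one byte

def base_rate_bps(fs_hz=FS_HZ, samples_per_byte=SAMPLES_PER_BYTE):
    """Exact base-layer rate: (fs / samples_per_byte) bytes/s * 8 bits/byte."""
    return Fraction(fs_hz, samples_per_byte) * 8

def layered_rates_kbps(n_layers):
    """Cumulative rates (kbps, 2 decimals) if each layer adds one base-rate increment."""
    base = base_rate_bps()
    return [round(float(base * k) / 1000, 2) for k in range(1, n_layers + 1)]

print(float(base_rate_bps()))   # 64000 / 6 b/s
print(layered_rates_kbps(4))
```

Numerically, the 5.33 kbps extension layer is half of this base rate and 2.66 kbps is roughly a quarter of it. The abstract does not say how the subband layer rates are derived.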
Summary form only given. The article gives an overview of text-to-speech (TTS) technology and describes some issues of potential interest to speech coding experts. After motivating the use of TTS technology, it describes the general architecture of a TTS system, with particular emphasis on the speech synthesis component. Both formant synthesis and concatenative synthesis are presented; they offer different degrees of flexibility and quality. Several well-known speech coding techniques have been used in speech synthesis, including LPC vocoders, waveform interpolation, harmonic coding, and layered coding. The article explains how they have been applied, along with their advantages and limitations in that setting. The main goal is to increase cooperation between the speech coding and TTS communities. In particular, the article aims to motivate the need for speech coding algorithms that meet the requirements of next-generation speech synthesis technology.
In the MELP coder, the LPC coefficients are quantized with 25 bits/frame by multistage vector quantization (MSVQ). Since a MELP frame totals 54 bits, more than 45% of the required bandwidth is spent on transmitting LPC coefficients. This paper proposes a novel variable-rate MELP coder based on adaptive forward-backward LPC quantization. Linear prediction is performed using either the current speech frame (forward LPC) or a previously decoded speech frame (backward LPC). When the backward LPC scheme is applied, the LPC coefficients computed from the optimal previously decoded speech frame are used to encode the current frame. Only the time delay is then transmitted to the decoder, so the average number of LPC bits per frame is reduced. Computer simulations show a significant reduction in the average overall bit rate without compromising decoded speech quality.
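The bandwidth argument can be checked with a few lines of arithmetic. The 25-bit and 54-bit figures come from the abstract. The 1-bit mode flag and 3-bit delay index are hypothetical side-information sizes chosen only for illustration, since the abstract does not give them.

```python
LPC_BITS = 25     # MSVQ bits per frame for LPC (from the abstract)
FRAME_BITS = 54   # total MELP bits per frame (from the abstract)

def lpc_share(lpc_bits=LPC_BITS, frame_bits=FRAME_BITS):
    """Fraction of each MELP frame spent on LPC coefficients."""
    return lpc_bits / frame_bits

def average_frame_bits(p_backward, delay_bits, mode_bits=1):
    """Expected bits per frame for a forward/backward LPC switch.

    p_backward: fraction of frames coded with backward LPC.
    delay_bits, mode_bits: HYPOTHETICAL side-information sizes; the abstract
    only says that the time delay is sent in backward mode.
    """
    other = FRAME_BITS - LPC_BITS                 # non-LPC parameters, unchanged
    forward = other + mode_bits + LPC_BITS        # full MSVQ index sent
    backward = other + mode_bits + delay_bits     # only the delay is sent
    return p_backward * backward + (1.0 - p_backward) * forward

print(f"LPC share: {lpc_share():.1%}")
for p in (0.0, 0.3, 0.6):
    print(p, average_frame_bits(p, delay_bits=3))
```

Under these assumed sizes, each backward-coded frame saves 22 bits. The average rate therefore falls linearly with the fraction of frames for which the backward LPC is good enough.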
This paper presents a novel method for improving the low-frequency performance of a microphone array. It uses a twin codebook consisting of the critical-band energies and LPC distributions of speech. A modified Wiener filter built from these codebooks eliminates residual noise at low frequencies, which may result from the limited array size or from the beamforming algorithm. Preliminary simulation tests show that the proposed system is effective. It may perform well for microphone array systems installed on a computer in an office or in a moving vehicle.
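A codebook-constrained Wiener filter can be sketched as follows. A clean-speech codeword of critical-band energies is selected to match a spectral-subtraction estimate. Per-band gains S / (S + N) are then computed from that codeword rather than from the noisy estimate. This is a generic illustration, not the paper's modified filter. Only the critical-band-energy half of the twin codebook is modeled, because the abstract does not say how the LPC codebook is paired with it. The codebook, band count, and noise values are hypothetical.

```python
import numpy as np

def select_codeword(noisy_energy, noise_energy, codebook):
    """Index of the clean-speech codeword closest, in the log domain, to a
    spectral-subtraction estimate of the clean critical-band energies."""
    estimate = np.maximum(noisy_energy - noise_energy, 1e-10)
    dist = np.sum((np.log(codebook) - np.log(estimate)) ** 2, axis=1)
    return int(np.argmin(dist))

def wiener_gains(speech_energy, noise_energy):
    """Per-band Wiener gain S / (S + N)."""
    return speech_energy / (speech_energy + noise_energy)

# Hypothetical 2-entry codebook over 4 critical bands (low to high frequency).
codebook = np.array([[8.0, 4.0, 1.0, 0.5],
                     [1.0, 2.0, 6.0, 3.0]])
noise = np.array([2.0, 1.0, 0.2, 0.2])   # residual noise concentrated at low frequencies
noisy = np.array([10.2, 5.1, 1.1, 0.7])

idx = select_codeword(noisy, noise, codebook)
gains = wiener_gains(codebook[idx], noise)
enhanced = gains * noisy
```

Deriving S from a trained codebook keeps the gains tied to plausible speech spectra. This matters most in the low bands, where the noisy estimate is least reliable.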
ISBN: 0780365429 (print)
This paper presents an efficient low-delay CELP speech coder based on a structure given by Chen et al. (1992). The proposed coder can operate at 8 kb/s. Its arithmetic complexity is 20% lower than that of the Chen et al. CELP coder, at the cost of an acceptable increase in delay. Tests show that the proposed coder provides good-quality speech.
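The abstract does not say where the 20% complexity saving comes from. For orientation, the sketch below shows the generic exhaustive gain-shape excitation search used in CELP coders, which often accounts for a large share of encoder arithmetic. Each codevector is passed through the weighted synthesis filter and scored by (t.y)^2 / (y.y). The 5-sample vector length is borrowed from ITU-T G.728 as an assumption. The codebook is random and the impulse response hypothetical.

```python
import numpy as np

def zero_state_response(c, h):
    """Filter a codevector through the weighted synthesis filter (truncated impulse response h)."""
    return np.convolve(c, h)[: len(c)]

def search_codebook(target, codebook, h):
    """Exhaustive gain-shape search: maximize (t.y)^2 / (y.y); optimal gain = t.y / y.y."""
    best_idx, best_gain, best_score = -1, 0.0, -np.inf
    for i, c in enumerate(codebook):
        y = zero_state_response(c, h)
        energy = np.dot(y, y)
        if energy <= 0.0:
            continue
        corr = np.dot(target, y)
        score = corr * corr / energy   # equals |t|^2 - |t - g y|^2 at the optimal gain
        if score > best_score:
            best_idx, best_gain, best_score = i, corr / energy, score
    return best_idx, best_gain

rng = np.random.default_rng(1)
VECTOR_DIM = 5                                  # G.728's vector length, assumed here
codebook = rng.standard_normal((128, VECTOR_DIM))
h = 0.8 ** np.arange(VECTOR_DIM)                # hypothetical impulse response
target = 1.7 * zero_state_response(codebook[42], h)
idx, gain = search_codebook(target, codebook, h)
```

The cost of this search grows with codebook size times the filtering work per codevector. Reduced-complexity CELP variants therefore typically restructure either the codebook or the filtering step.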