We propose a model for the generation of speech signals based on the stochastic properties of the speech signal. We show that the speech signal can be modeled as the product of a Gaussian random process (RP) and a slowly time-varying Rayleigh RP. This assumption is justified since it results in a spherically invariant random process (SIRP) with a Gaussian distribution over short intervals and a Laplacian distribution over long intervals. The result is supported by studying the probability density function (PDF) of the estimated power spectral density (PSD) of the speech signal, obtained using linear predictive coding (LPC), for several segmentation lengths. Our experiments show that the PDF of the estimated PSD is well approximated by a Rayleigh distribution around the formant frequencies and by a Gaussian distribution at frequencies far from the formant frequencies.
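The Gaussian-times-Rayleigh claim can be checked numerically. The following is a minimal sketch, not the authors' experiment: it assumes the "slowly time-varying" envelope can be approximated by one Rayleigh draw held constant per segment, and the names `simulate_sirp` and `kurtosis`, the segment length, and the seed are all illustrative choices. Kurtosis is used as a simple shape indicator (3 for a Gaussian, 6 for a Laplacian).

```python
import numpy as np


def simulate_sirp(n_segments=20000, seg_len=100, sigma=1.0, seed=0):
    """Gaussian RP multiplied by a Rayleigh envelope held constant per segment."""
    rng = np.random.default_rng(seed)
    # Assumption: one Rayleigh draw per segment stands in for the
    # "slowly time-varying" envelope described in the abstract.
    envelope = rng.rayleigh(scale=sigma, size=(n_segments, 1))
    gaussian = rng.standard_normal((n_segments, seg_len))
    return envelope * gaussian


def kurtosis(x, axis=None):
    """Non-excess kurtosis m4 / m2**2 (3 for a Gaussian, 6 for a Laplacian)."""
    centered = x - x.mean(axis=axis, keepdims=True)
    m2 = (centered ** 2).mean(axis=axis)
    m4 = (centered ** 4).mean(axis=axis)
    return m4 / m2 ** 2


if __name__ == "__main__":
    x = simulate_sirp()
    # Within a segment the envelope is fixed, so samples look Gaussian.
    print("mean short-interval kurtosis:", kurtosis(x, axis=1).mean())
    # Pooled over all segments, the Rayleigh scale mixture looks Laplacian.
    print("long-interval kurtosis:", kurtosis(x))
```

The long-interval behaviour follows because the squared Rayleigh envelope is exponentially distributed, and a zero-mean Gaussian whose variance is exponentially distributed has a Laplacian marginal.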
In this paper, we present a Gaussian mixture model-based block quantiser for coding line spectral frequencies that uses multiple frames and mean squared error as the quantiser selection criterion. The efficiency gained from jointly coding multiple frames permits the use of the mean squared error (MSE) distortion criterion rather than the computationally expensive spectral distortion. The proposed coder improves both distortion performance and complexity, with transparency achieved at 23 bits per frame when coding two frames jointly or 21 bits per frame when coding three frames.
We address attenuation correction for the 3D attenuated ray transform with a parallel geometry. We suppose that the attenuation function is available but not registered with the data. We use the sum on each slice of the 2D data consistency conditions of the attenuated Radon transform to register the attenuation function with the data. We then correct for the attenuation using the Novikov formula. We show numerical experiments indicating the feasibility of the approach and propose a scheme, including the diffusion correction, for registering CT to SPECT in order to improve SPECT imaging.
In this work, fine granularity scalability (FGS) is introduced into MP-CELP speech coding with the HPDR technique by adjusting the amount of transmitted fixed excitation information. The FGS feature aims at changing the bit rate of the conventional coding more finely and more smoothly. Through performance analysis and computer simulation, the scalability quality of the MP-CELP coding is shown to improve on that of conventional scalable MP-CELP. The HPDR technique is also applied to the MP-CELP for use with tonal languages, and the coder supports core coding rates of 4.2, 5.5, and 7.5 kbps as well as additional scaled bit rates.
There has been tremendous progress in fractal compression since the pioneering work of Barnsley and Jacquin in the late 1980s. As the encoding time complexity issues are gradually being solved, there is a steady growth of applications of fractals, especially in hybrid systems. However, such fractal hybrid systems tend to be rather difficult to analyze, and part of that difficulty lies in the quantization of the scaling and luminance offset parameters adopted in most fractal compression schemes. In this paper, we present theoretical and empirical justification for a well-known but underused alternative parametrization for the fractal affine transform. In particular, we present a detailed analysis of a hybrid fractal-LPC (linear predictive coding) compression scheme using this alternative set of affine transform parameters. (C) 2003 Elsevier Science B.V. All rights reserved.
This paper proposes a novel approach, bandwidth-adjusted linear predictive coding (BLPC) analysis, for robust speech recognition. We estimate and adjust the dispersion of formant bandwidths according to the maximum likelihood criterion. Our preliminary results show that the proposed BLPC performs better than traditional linear predictive coding in noisy environments. (C) 2002 Elsevier Science B.V. All rights reserved.
ISBN: 0780382927 (print)
The inconsistencies inherent in packet-switched network delivery can be seriously detrimental to the quality of real-time speech transmission. This paper emphasizes the short-term prediction (STP) filter parameters, as these are perceptually important to intelligible speech. We introduce several novel schemes for the recovery of lost STP parameters, represented as line spectral frequencies (LSFs), based on extrapolation and interpolation techniques. The inclusion of a number of past and/or future frames further distinguishes this work. Methods that outperform traditional frame repetition and linear interpolation in terms of accuracy are presented and evaluated.
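For reference, the two traditional baselines named above, frame repetition and linear interpolation, can be sketched as follows. This illustrates only the baselines, not the paper's proposed schemes; the function names and the array layout (one LSF vector per frame) are assumptions made for the example.

```python
import numpy as np


def conceal_by_repetition(prev_lsf, n_lost):
    """Baseline 1: repeat the last correctly received LSF vector."""
    prev_lsf = np.asarray(prev_lsf, dtype=float)
    return np.tile(prev_lsf, (n_lost, 1))


def conceal_by_interpolation(prev_lsf, next_lsf, n_lost):
    """Baseline 2: interpolate linearly between the frames on either side
    of the loss (requires the next received frame, i.e. added delay)."""
    prev_lsf = np.asarray(prev_lsf, dtype=float)
    next_lsf = np.asarray(next_lsf, dtype=float)
    # Weights 1/(n+1), ..., n/(n+1) place the lost frames evenly in between.
    weights = np.arange(1, n_lost + 1) / (n_lost + 1)
    return (1.0 - weights)[:, None] * prev_lsf + weights[:, None] * next_lsf


if __name__ == "__main__":
    prev_frame = [0.2, 0.6]  # hypothetical LSFs in radians, ascending
    next_frame = [0.4, 1.0]
    print(conceal_by_repetition(prev_frame, 2))
    print(conceal_by_interpolation(prev_frame, next_frame, 3))
```

Because each interpolated vector is a convex combination of two ascending LSF vectors, it remains ascending, so the recovered STP filter keeps the ordering property of the received frames.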
ISBN: 0780382927 (print)
We have already proposed ELS-based time-varying complex AR (TV-CAR) speech analysis based on forward LP, as well as on forward and backward LP, in which the equation error is modeled by an AR model to whiten the error. The methods are based on an equation error method and can estimate an unbiased speech spectrum because the equation error is whitened. These speech analysis methods may be suitable as a front end for robust speech recognition and for packet loss concealment in VoIP. This paper presents an output-error-based ELS TV-CAR speech analysis algorithm and compares its performance with that of the equation-error-based method.
We present an architecture called the modular neural predictive coding architecture (Modular NPC). The Modular NPC is used for discriminative feature extraction (DFE). It provides an architecture based on phonetic knowledge, applied to phoneme recognition. The phonemes are extracted from the DARPA TIMIT speech database. Comparisons with other coding methods (LPC, MFCC, PLP) are presented; they show a clear improvement in recognition rates.
ISBN: 0780379799 (print)
This paper presents our work on continuous speech recognition for a telephone-number voice-dialing task using statistical modeling. The speech is parameterized using computationally inexpensive linear predictive coding (LPC) to obtain the LPC coefficients, the LPC cepstral coefficients, and the reflection coefficients. In our tests, the recognizer based on hidden Markov models (HMMs) performs better with the LPC cepstral coefficients than with the reflection coefficients or the LPC coefficients.
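The three parameter sets mentioned above are closely related and can be derived from one another. The sketch below is a generic textbook formulation, not the paper's front end: it assumes the error-filter convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, a reflection-coefficient sign that matches that convention (other texts flip it), and illustrative function names. It uses the Levinson-Durbin recursion to get LPC and reflection coefficients from autocorrelation values, then the standard recursion for the cepstrum of 1/A(z).

```python
import numpy as np


def levinson_durbin(r, order):
    """Return (a, refl): LPC coefficients a[0..order] with a[0] = 1 and
    reflection coefficients refl[0..order-1], from autocorrelation r."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    refl = np.zeros(order)
    err = r[0]  # prediction error energy at order 0
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        refl[i - 1] = k
        prev = a.copy()
        # Order update: a_j <- a_j + k * a_{i-j} for j = 1..i-1, then a_i = k.
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, refl


def lpc_to_cepstrum(a):
    """Cepstral coefficients c[1..p] of the all-pole model 1/A(z)."""
    p = len(a) - 1
    c = np.zeros(p + 1)
    for n in range(1, p + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(1, n))
        c[n] = -a[n] - acc / n
    return c[1:]


if __name__ == "__main__":
    # Autocorrelation of a first-order AR process with pole at 0.5.
    a, refl = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print("LPC:", a)
    print("reflection:", refl)
    print("LPC cepstrum:", lpc_to_cepstrum(a))
```

For the first-order example, 1/A(z) = 1/(1 - 0.5 z^-1), whose cepstrum is 0.5^n / n, which gives a quick sanity check on the recursion.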