In this paper, we introduce an auto-regressive moving average (ARMA) lattice model for speech modeling. Speech characteristics are modeled and expressed as lattice reflection coefficients for classification. A self-organizing map (SOM) is used to build codebooks for classifying and recognizing the lattice reflection coefficients. Experimental results on an isolated-word speech database of 10 words/names indicate that the ARMA lattice model achieves better recognition performance than the conventional auto-regressive (AR) model.
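For orientation, the conventional AR baseline mentioned above obtains its reflection coefficients from the Levinson-Durbin recursion. The sketch below illustrates only that AR case, not the ARMA lattice of the paper; the function name and the sign convention (prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p) are assumptions made for this example.

```python
def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, k, err): predictor coefficients a[0..order] with a[0] = 1,
    reflection coefficients k[0..order-1], and the final prediction error.
    Assumed convention: A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        ki = -acc / err
        k.append(ki)
        # Order update of the predictor coefficients.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki
    return a, k, err


# Autocorrelation of a first-order AR process with correlation 0.5.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the second reflection coefficient comes out as zero, which reflects the fact that a first-order model already explains the data.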
This paper describes a method for warping the frequency axis of cepstrum coefficients in a way analogous to the preprocessing performed by the human ear. The equations are derived, and the historical background relating to different warping scales is discussed. The calculation is a two-step procedure. In the first step, the bilinear transform is used to represent the LPC coefficients on a warped frequency scale, with a warping constant determining the degree of transformation. This results in an ARMA representation of the filter transfer function. The second step recursively determines the cepstrum coefficients corresponding to this ARMA transfer function.
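The two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's derivation: the substitution z^-1 -> (z^-1 - alpha) / (1 - alpha z^-1) is one common form of the bilinear (all-pass) warping and its sign convention is assumed, the gain term c_0 is ignored, and all function names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def poly_pow(p, n):
    """Raise a polynomial to a non-negative integer power."""
    out = [1.0]
    for _ in range(n):
        out = poly_mul(out, p)
    return out


def warp_lpc(a, alpha):
    """Step 1: warp 1/A(z) into an ARMA pair (num, den).

    a = [1, a1, ..., ap] in powers of z^-1. Substituting the all-pass
    section and clearing denominators gives
    num = (1 - alpha z^-1)^p, den = sum_k a_k (z^-1 - alpha)^k (1 - alpha z^-1)^(p-k).
    """
    p = len(a) - 1
    den = [0.0] * (p + 1)
    for k, ak in enumerate(a):
        term = poly_mul(poly_pow([-alpha, 1.0], k), poly_pow([1.0, -alpha], p - k))
        for i, t in enumerate(term):
            den[i] += ak * t
    return poly_pow([1.0, -alpha], p), den


def poly_cepstrum(p, n_ceps):
    """Cepstrum c[1..n_ceps] of a minimum-phase polynomial (scaled so p[0] = 1).

    Uses the recursion c_n = p_n - (1/n) * sum_{k=1}^{n-1} k c_k p_{n-k}.
    """
    q = [x / p[0] for x in p]
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = q[n] if n < len(q) else 0.0
        for k in range(1, n):
            if n - k < len(q):
                acc -= k * c[k] * q[n - k] / n
        c[n] = acc
    return c[1:]


def warped_cepstrum(a, alpha, n_ceps):
    """Step 2: cepstrum of the ARMA function num/den (numerator minus denominator)."""
    num, den = warp_lpc(a, alpha)
    cn = poly_cepstrum(num, n_ceps)
    cd = poly_cepstrum(den, n_ceps)
    return [x - y for x, y in zip(cn, cd)]
```

With alpha = 0 the warping is the identity, so `warped_cepstrum` reduces to the ordinary LPC cepstrum, which gives a simple sanity check.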
In this paper, a new very low bit-rate speech coder operating at 1.2 kbit/s is proposed. Like the LPC vocoder, it requires only gain, pitch, and spectral information, but its quality is far superior. The synthesis method is a form of harmonic coding that uses sinusoids whose frequencies are multiples of the fundamental frequency; the amplitudes of the sinusoids are adaptively modulated using gammatone filters as a perceptual weighting filter. The sinusoids' phases are also adjusted so as to maximize the perceptual quality. In order to reduce the total bit rate to 1.2 kbit/s, a new segment coder for spectral information (LSP coefficients) using DP matching is also proposed. The quality of the synthesized speech improved by 0.45 in mean opinion score (MOS) compared with that of a simple LPC vocoder operating at the same rate, and it was comparable to that of a 2.4 kbit/s MELP coder.
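The core of harmonic synthesis, summing sinusoids at integer multiples of the fundamental frequency, can be sketched as below. The gammatone-based amplitude modulation and the perceptual phase adjustment described in the abstract are not reproduced; amplitudes and phases are simply passed in, and the function name and parameters are hypothetical.

```python
import math


def synthesize_harmonics(f0, amplitudes, phases, sample_rate, n_samples):
    """Sum sinusoids at h * f0 for h = 1, 2, ... (one per amplitude/phase pair).

    Harmonics at or above the Nyquist frequency are skipped to avoid aliasing.
    """
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for h, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            freq = h * f0
            if freq >= sample_rate / 2:
                break
            sample += amp * math.cos(2.0 * math.pi * freq * t + phase)
        out.append(sample)
    return out


# One 10 ms frame at 8 kHz with a 100 Hz fundamental and three harmonics.
frame = synthesize_harmonics(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 8000, 80)
```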
An end-point detector for LPC speech using squared prediction error look-ahead and automatic/manual threshold determination is described. The detector is algorithmically simple, computationally efficient, and uses only one decision parameter. Preliminary tests indicate that it is relatively immune to transient pulses and various low-level noises, yet preserves low-level speech sounds such as weak fricatives to a significant extent under moderate noise conditions. Tests indicate that 93.8% of automatically determined endpoints agree to within two frames of manually determined endpoints. The detector is especially suitable for use in vector-quantization-based LPC systems, where the squared prediction error is readily available.
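The abstract does not give the decision rule, so the following is only one plausible reading of "look-ahead" thresholding: a frame counts as speech only if its squared prediction error and that of the next few frames all exceed the threshold, which rejects isolated transient pulses. The automatic threshold (a multiple of the mean error over assumed leading noise frames), the function names, and all default values are assumptions for illustration.

```python
def auto_threshold(errors, n_noise_frames=5, factor=3.0):
    """Assumed rule: threshold = factor * mean error of the leading noise frames."""
    noise = errors[:n_noise_frames]
    return factor * sum(noise) / len(noise)


def find_endpoints(errors, threshold, lookahead=3):
    """Return (start, end) frame indices of the speech region, or None.

    errors holds one squared prediction error per frame. A frame i is
    accepted only if frames i .. i + lookahead all exceed the threshold.
    """
    def sustained(i):
        window = errors[i:i + lookahead + 1]
        return len(window) == lookahead + 1 and all(e > threshold for e in window)

    hits = [i for i in range(len(errors)) if sustained(i)]
    if not hits:
        return None
    # The region ends at the last frame covered by the last accepted window.
    return hits[0], hits[-1] + lookahead


# Frame 5 is an isolated transient; frames 8 to 12 are sustained speech.
errors = [1, 1, 1, 1, 1, 50, 1, 1, 20, 30, 40, 30, 20, 1, 1, 1]
endpoints = find_endpoints(errors, auto_threshold(errors))
```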
The partial trigonometric moment problem is shown to provide a unifying framework for several speech modelling techniques, such as the classical LPC autoregressive model, the line spectral pairs and composite sinusoidal waves models, and the Toeplitz eigenvector model for formant extraction. From a mathematical viewpoint, this moment problem can be identified with an extension problem in the class of impedance functions or, equivalently, in the class of nonnegative definite Toeplitz matrices.
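For readers unfamiliar with the terminology, the moment problem referred to above can be stated as follows. The notation (moments c_k, measure mu, matrix T_n) is chosen here for illustration and is not taken from the paper; the solvability condition is the classical Carathéodory-Toeplitz result.

```latex
% Partial trigonometric moment problem: given c_0, c_1, \dots, c_n
% (with c_{-k} = \overline{c_k}), find a nonnegative measure \mu on
% [-\pi, \pi] whose first n+1 Fourier coefficients match them:
\[
  c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ik\theta} \, d\mu(\theta),
  \qquad k = 0, 1, \dots, n .
\]
% A solution exists if and only if the Toeplitz matrix of moments is
% nonnegative definite:
\[
  T_n = \bigl[ c_{i-j} \bigr]_{i,j=0}^{n} \succeq 0 .
\]
```

In speech analysis the c_k play the role of autocorrelation lags, so each model named in the abstract corresponds to a particular choice of measure consistent with the same finite set of lags.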
Multiple pulse excited linear predictive coding (MPLPC) has recently received a great deal of attention in the literature as an attractive means of speech coding at data rates below 10 kbit/s. Existing approaches to MPLPC analysis obtain the parameters of an all-pole model by minimizing the mean squared modeling error before attempting to find a set of pulses to excite the model. The strategy proposed here instead selects the all-pole parameters so as to concentrate the model excitation in a finite number of locations. The goal is to produce a maximally pulse-like residual as a result of the all-pole parameter estimation.
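The second stage of the existing two-stage approach, finding pulses for an already fixed all-pole model, is often done with a greedy search. The sketch below shows that conventional search only, not the strategy proposed in the abstract; it omits perceptual weighting and joint amplitude re-optimization, and the function name and arguments are hypothetical. Here `h` stands for the truncated impulse response of the all-pole synthesis filter.

```python
def multipulse_search(target, h, n_pulses):
    """Greedy pulse placement: returns ([(position, amplitude), ...], residual).

    At each step, choose the position whose scaled, shifted copy of h removes
    the most energy from the residual, then subtract that contribution.
    """
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(n_pulses):
        best = None
        for m in range(n):
            seg = h[: n - m]  # part of h that fits inside the frame
            energy = sum(v * v for v in seg)
            if energy == 0.0:
                continue
            corr = sum(residual[m + i] * v for i, v in enumerate(seg))
            gain = corr * corr / energy  # error energy removed by a pulse at m
            if best is None or gain > best[0]:
                best = (gain, m, corr / energy)
        if best is None:
            break
        _, m, amp = best
        pulses.append((m, amp))
        for i, v in enumerate(h[: n - m]):
            residual[m + i] -= amp * v
    return pulses, residual
```

Because each pulse is chosen with the earlier ones held fixed, the result is generally suboptimal when pulse responses overlap, which is one motivation for treating the filter parameters and the excitation together.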
An algorithm for LPC (linear predictive coding) parameter optimization in multipulse (MP)-LPC based speech coders is presented. It is shown that, by taking the nature of the MP excitation signal into account in the LPC parameter computation, it is possible to improve the effectiveness of the LPC model. This results in a better quality of the reconstructed signal in terms of both objective and subjective criteria. The implementation details of the algorithm are discussed and experimental results are presented. In particular, a comparison with standard MP-LPC techniques is given.
A new method for coding generic audio signals at 64 kbit/s in the 20-15000 Hz bandwidth with a low delay is presented. It combines sub-band coding, the low-delay CELP (LD-CELP) algorithm, and cascaded filterbanks. We show how the different parameters of LD-CELP can be adapted to achieve better quality. Perceptual coding techniques are integrated into the encoder to allocate bits to each sub-band.