ISBN: (print) 0780386787
The paper presents a high-quality harmonic excitation linear predictive (HE-LPC) speech coder operating at 2 kb/s, based on a harmonic excitation model with two bands. The system incorporates novel features such as combined pitch detection, voicing determination by residual harmonic matching, and extraction and interpolation of residual harmonic magnitudes. Subjective listening tests indicate that this coder has the same quality as the Federal Standard MELP (mixed excitation linear prediction) coder at 2.4 kb/s, whether the training database is Chinese or English.
This paper presents a set of descriptors for On-line signature writer identification. These descriptors are intended to be used in e-business and e-government to detect signature forgery where it is hard to identify t...
The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity have proven difficult to define and standard measures do not accurately reflect human perception of spectral discontinuity in concatenated speech. Previous studies on spectral join costs have focused predominantly on static spectral measures extracted from the unit boundary. In this paper spectral dynamic behaviour is investigated as a source of discontinuity in concatenated speech. A number of measures representing spectral dynamics are tested for the task of detecting discontinuities. The spectral dynamic measures tested contain information correlating with human perception of discontinuities, suggesting that spectral dynamics are a source of discontinuity in concatenated speech. A strategy to effectively combine dynamic and static measures is proposed using principal component analysis (PCA).
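As a rough illustration of combining several join-cost measures with PCA, the sketch below standardizes a matrix of measures (one row per candidate join, one column per static or dynamic measure) and projects it onto the leading principal components. The function name `pca_combine` and the data layout are assumptions made for illustration; the abstract does not specify how the PCA step is set up.

```python
import numpy as np

def pca_combine(features, n_components=1):
    """Project standardized measures onto their leading principal components.

    features: array of shape (n_joins, n_measures); hypothetical layout.
    Returns (scores, eigenvalues) for the retained components.
    """
    X = np.asarray(features, dtype=float)
    # Standardize each measure so that no single scale dominates the analysis.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigen-decomposition of the covariance matrix of the standardized measures.
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; keep the largest ones first.
    order = np.argsort(eigvals)[::-1][:n_components]
    return X @ eigvecs[:, order], eigvals[order]
```

The first column of the returned scores could then serve as a single combined discontinuity measure, under the assumption that the leading component captures the shared perceptually relevant variation.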
In this paper, a vector quantization-block constrained trellis coded quantization (VQ-BCTCQ) is presented to quantize line spectrum frequency (LSF) parameters of the wideband speech codec. Both the predictive structure and safety-net concept are combined into VQ-BCTCQ to develop the predictive VQ-BCTCQ. The performance of this quantization is compared with that of the linear predictive coding (LPC) vector quantizer used in the AMR-WB codec, and reductions in spectral distortion (SD) and encoding complexity are demonstrated.
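Spectral distortion is commonly computed as the root-mean-square difference, in dB, between the log power spectra of the original and quantized LPC envelopes. The sketch below assumes coefficient vectors of the form [1, a_1, ..., a_p] for A(z) = 1 + sum a_k z^-k and averages over the full band; the function name and the FFT size are illustrative choices, and published evaluations often restrict the frequency range.

```python
import numpy as np

def spectral_distortion(a_ref, a_quant, n_fft=512):
    """RMS log-spectral difference in dB between two all-pole envelopes 1/|A|^2."""
    # Evaluate A(e^jw) on a uniform grid from 0 to pi via a zero-padded FFT.
    A_ref = np.fft.rfft(a_ref, n_fft)
    A_q = np.fft.rfft(a_quant, n_fft)
    # 10*log10(1/|A|^2) = -20*log10|A|
    log_ref = -20.0 * np.log10(np.abs(A_ref))
    log_q = -20.0 * np.log10(np.abs(A_q))
    return float(np.sqrt(np.mean((log_ref - log_q) ** 2)))
```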
Multichannel audio refers to a widespread technology that enables audio rendering through multiple channels. Audio reproduction with multiple channels has the advantage of recreating the acoustic scene with unprecedented fidelity and of immersing the listener in an acoustic environment that is virtually indistinguishable from reality. However, one of the greatest challenges of multichannel audio is its high storage and transmission requirements, especially since accurate rendering through as many channels as possible is the main purpose. Audio resynthesis addresses this issue by recreating a set of channels at the receiver end from a single transmitted source channel. We propose a new, enhanced approach to multichannel audio resynthesis, which involves a novel residual processing technique and a feature alignment method that significantly increase the resynthesis accuracy. Our results show that this latest method leads to higher audio quality and allows for the robust treatment of any type of multichannel signal set.
We propose a new weighting function which is computationally simple and an approximation to the theoretically derived optimum weighting function shown in the literature. The proposed weighting function is perceptually motivated and provides improved vector quantization performance compared to several weighting functions proposed so far, for line spectrum frequency (LSF) parameter quantization of both clean and noisy speech data.
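To make the role of a weighting function in LSF quantization concrete, the sketch below uses a simple inverse-spacing weighting, in which closely spaced LSFs (which tend to mark spectral peaks) receive larger weights. This is a stand-in chosen for illustration, not the weighting function proposed in the abstract; the function names, the radian scale, and the boundary values 0 and pi are assumptions.

```python
import numpy as np

def inverse_spacing_weights(lsf):
    """Weight each LSF by the inverse distances to its two neighbours."""
    # Pad with the band edges 0 and pi (LSFs assumed in radians, ascending).
    f = np.concatenate(([0.0], np.asarray(lsf, dtype=float), [np.pi]))
    gaps = np.diff(f)
    return 1.0 / gaps[:-1] + 1.0 / gaps[1:]

def weighted_lsf_distance(lsf, lsf_q):
    """Weighted squared Euclidean distance used to pick a codebook entry."""
    w = inverse_spacing_weights(lsf)
    d = np.asarray(lsf, dtype=float) - np.asarray(lsf_q, dtype=float)
    return float(np.sum(w * d ** 2))
```

In a vector quantizer, the codebook entry minimizing `weighted_lsf_distance` would be selected in place of the plain Euclidean nearest neighbour.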
This paper concentrates on the extraction of parameters from the vocal tract transfer function of Chinese whispered vowels. As there is no fundamental frequency in whispered speech, these parameters become more prominent in speech analysis and synthesis. The proposed formant estimation algorithm is shown to be effective, and the gain of the vocal tract transfer function can be utilized for tone analysis. The comparison of these parameters between Chinese whispered vowels and voiced ones is the basis for whisper recognition and conversion. The ratios of formant excursion, bandwidth movement, gain and energy variation are calculated as scalar weight coefficients for voice personality transformation.
This paper proposes an ASIC implementation of the LPC-cepstrum (LPCC) for speech recognition. The proposed LPCC ASIC reduces the computational load on the processor in a speech recognition system. In addition, a resource sharing method is adopted in the design to reduce the chip size. The emphasis is therefore not on sophistication but on a high-performance, low-cost solution. Finally, experiments comparing the design with other DSP and ASIC designs show that the proposed LPCC ASIC efficiently reduces the computation load.
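For reference, the computation that such hardware would accelerate is the standard recursion from LPC coefficients to cepstral coefficients. The software sketch below assumes predictor coefficients a_1..a_p with A(z) = 1 - sum a_k z^-k and omits the gain term c_0; the function name is hypothetical, and the sketch says nothing about how the ASIC itself is organized.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p into cepstral coefficients c_1..c_n.

    Assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k).
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_{n-k}, with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a single-pole model with a_1 = 0.5, the recursion reproduces the closed form c_n = 0.5^n / n, which is a convenient sanity check.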
The linear prediction coefficients estimated from noisy speech strongly affect the quality of the enhanced speech in speech enhancement algorithms based on the Kalman smoother. Exploiting the slowly varying nature of the vocal tract, this paper proposes a novel Kalman smoothing algorithm for speech enhancement based on smoothing the vocal tract parameters. First, the linear prediction coefficients are converted into line spectrum frequency parameters. Then, these parameters are smoothed across adjacent frames before they are transformed into the state transition matrix. Experimental results indicate that the proposed algorithm suppresses sudden changes in residual noise energy and improves the quality of the enhanced speech. The quality of the enhanced speech is evaluated by means of segmental SNR and ITU-PESQ scores, and the proposed algorithm achieves clear improvements over the conventional Wiener filter.
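The segmental SNR mentioned above is usually computed as the average of per-frame SNR values in dB, with each frame clamped to a fixed range so that silent or perfectly reconstructed frames do not dominate. The sketch below is a minimal version under assumed settings: the frame length of 256 samples, the clamping range of -10 to 35 dB, and the function name are illustrative choices, not values taken from the abstract.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi]."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        err = s - enhanced[i * frame_len:(i + 1) * frame_len]
        # eps guards against division by zero and log of zero.
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(err ** 2) + eps))
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))
```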