This paper considers the problem of comparing two sets of linear predictive coding (LPC) coefficients or, more generally, that of comparing two short segments of speech via LPC techniques. It is shown that Itakura's prediction-residual ratio is intuitively unsatisfactory and theoretically misleading as a distance measure. Two slower but more accurate statistical means of comparison are suggested, and these are supported by evidence from a simulation study.
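For orientation, the prediction-residual ratio discussed in this abstract can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes the common log form of the ratio, autocorrelation-method LPC solved by the Levinson-Durbin recursion, and a hypothetical helper `ar1_frame` that generates synthetic test frames. The function names are illustrative only.

```python
import math
import random

def autocorr(x, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns the inverse-filter coefficients a = [1, a1, ..., ap]
    and the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def residual_energy(a, r):
    """Quadratic form a^T R a, with R the Toeplitz matrix built from r."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))

def itakura_ratio(x_test, a_ref, order):
    """Log ratio of the residual energy obtained by filtering the test frame
    with the reference filter to that of the frame's own optimal filter."""
    r = autocorr(x_test, order)
    a_test, _ = levinson(r, order)
    return math.log(residual_energy(a_ref, r) / residual_energy(a_test, r))

def ar1_frame(coef, n, seed):
    """Hypothetical test signal: x[n] = coef * x[n-1] + white Gaussian noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = coef * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x
```

Because the frame's own filter minimizes the quadratic form, the ratio is never negative and is zero when the two coefficient sets coincide. It is not symmetric in its two arguments, which is one reason it can be awkward as a distance.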
ISBN: 9783642274428 (print)
Speech compression, enhancement and recognition in noisy, reverberant conditions are challenging tasks. This paper presents a new approach to this problem, developed in the framework of probabilistic random modeling. Speech coding techniques are commonly used in low-bit-rate analysis and synthesis; coding algorithms seek to minimize the bit rate in the digital representation of a signal without an objectionable loss of signal quality. Speech enhancement aims to improve speech quality by using various algorithms. This paper deals with a multistage vector quantization technique for coding narrowband speech signals. The parameters used for coding are the line spectral frequencies, which ensure filter stability after quantization. The new approach incorporates information about the statistical random nature of the uncompressed speech signal: the codebooks used for quantization are generated with the Linde, Buzo and Gray (LBG) algorithm. The speech model is characterized by LPC coefficients and parameterized by the coefficients of the reverberation filters. The results of the multistage vector quantizer are compared with those of unconstrained vector quantization. Quantization performance is measured in terms of spectral distortion (in dB), computational complexity (in kflops) and memory requirements (in floats). The results indicate that multistage vector quantization gives better spectral distortion performance with lower computational complexity and memory requirements than unconstrained vector quantization. With the parameters estimated from the data, the proposed approach yields significantly better performance in both signal-to-noise ratio and subjective filter methods.
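The codebook training and multistage structure described in this abstract can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses squared Euclidean distance, an additive splitting perturbation `eps`, a fixed number of Lloyd iterations, and codebook sizes that are powers of two. All function names are hypothetical.

```python
def nearest(v, codebook):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(train, size, eps=1e-3, iters=20):
    """LBG training by binary splitting; size is assumed to be a power of two."""
    codebook = [centroid(train)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s * eps for c in cw] for cw in codebook for s in (1, -1)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in train:
                cells[nearest(v, codebook)].append(v)
            # Keep the old codeword when its cell is empty.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook

def train_msvq(train, sizes):
    """Train one codebook per stage; each stage quantizes the previous residual."""
    stages, resid = [], train
    for size in sizes:
        cb = lbg(resid, size)
        stages.append(cb)
        resid = [[a - b for a, b in zip(v, cb[nearest(v, cb)])] for v in resid]
    return stages

def encode(v, stages):
    """Sequential search: one index per stage."""
    indices, resid = [], list(v)
    for cb in stages:
        i = nearest(resid, cb)
        indices.append(i)
        resid = [a - b for a, b in zip(resid, cb[i])]
    return indices

def decode(indices, stages):
    """Reconstruction is the sum of the selected codewords."""
    out = [0.0] * len(stages[0][0])
    for i, cb in zip(indices, stages):
        out = [a + b for a, b in zip(out, cb[i])]
    return out
```

With `sizes=(4, 4)` the quantizer stores 8 codewords and searches 8 candidates per vector while addressing 16 reconstruction points, whereas an unconstrained 16-entry codebook stores and searches all 16. This is the kind of memory and complexity trade-off the abstract measures.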
ISBN: 0780344286 (print)
This paper presents two time-scale pitch-scale modification techniques to be used in speech synthesis systems. They have been applied to Microsoft's Whistler system, which is based on concatenative synthesis. Both methods are based on a source-filter model, one of them using LPC parameters and the other one using cepstral parameters. The proposed methods achieve high quality prosody modification, retain the characteristics of the donor speaker, allow for spectral manipulation (to reduce spectral discontinuities at unit boundaries), yield compact acoustic inventories and improved voiced fricatives.
ISBN: 9781509066315 (print)
Scalability and efficiency are desired in neural speech codecs, which support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. CQ does not simply shoehorn LPC into a neural network, but bridges the computational capacity of advanced neural network models and traditional, yet efficient and domain-specific digital signal processing methods in an integrated manner. We demonstrate that CQ achieves much higher quality than its predecessor at 9 kbps with even lower model complexity. We also show that CQ can scale up to 24 kbps, where it outperforms AMR-WB and Opus. As a neural waveform codec, CQ has fewer than 1 million parameters, significantly fewer than many other generative models.
ISBN: 0780364163 (print)
This paper uses a method of incorporating simultaneous masking into the calculation of a linear predictive filter (SMLPC) as the front end to a 2 kbps waveform interpolation (WI) speech coder. A modification to the masking threshold calculation used in SMLPC is proposed. This modification improves the performance of SMLPC in noise-like sections by placing greater emphasis on strongly voiced speech. MOS test results reveal that the modified SMLPC improved the perceptual quality of the WI coder. The improvement is significant for female speakers, whilst the quality for male speech is virtually unchanged. This result conflicts with previous results reported for SMLPC, where only male speech was improved. The change is attributed to the modification of the masking threshold and confirms that adapting the masking threshold according to the pitch of the speech will allow SMLPC to remove more perceptually important information from all input speech than standard LPC.
ISBN: 0780382927 (print)
The inconsistencies inherent in packet-switched network delivery can be seriously detrimental to the quality of a real-time speech transmission. This paper places its emphasis on the importance of the short-term prediction (STP) filter parameters, as these are perceptually important to intelligible speech. We introduce several novel schemes for the recovery of lost STP parameters represented as line spectral frequencies (LSFs), based on extrapolation and interpolation techniques. The unique inclusion of a number of past and/or future frames further commends this work. Methods that outperform traditional frame repetition and linear interpolation in terms of accuracy are presented and evaluated.
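The two traditional baselines named in this abstract, frame repetition and linear interpolation, can be sketched as below. The paper's own recovery schemes are not specified here and are not reproduced. The function names, the interpolation weight `w`, and the assumption that LSFs are given in radians in (0, pi) are illustrative choices.

```python
import math

def repeat_last(prev_lsf):
    """Frame repetition: reuse the last correctly received LSF vector."""
    return list(prev_lsf)

def interpolate(prev_lsf, next_lsf, w=0.5):
    """Linear interpolation between the frames before and after the loss.

    w is the position of the lost frame between the two received frames
    (0.5 for a single lost frame midway between them).
    """
    return [(1.0 - w) * p + w * n for p, n in zip(prev_lsf, next_lsf)]

def is_ordered(lsf):
    """Check 0 < lsf[0] < ... < lsf[-1] < pi, the ordering that a stable
    synthesis filter requires of its LSFs."""
    bounds = [0.0] + list(lsf) + [math.pi]
    return all(lo < hi for lo, hi in zip(bounds, bounds[1:]))
```

Interpolation needs the future frame and therefore adds delay, while repetition does not. A weighted average with `w` in [0, 1] of two ordered LSF vectors is itself ordered, so neither baseline needs a separate stability repair.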
ISBN: 0780382927 (print)
We have already proposed ELS-based time-varying complex AR (TV-CAR) speech analysis based on forward LP, as well as on forward and backward LP, in which the equation error is modeled by an AR model to whiten the error. These methods are based on an equation error method and can estimate an unbiased speech spectrum owing to the whitened equation error. These speech analysis methods may be suitable as a front end for robust speech recognition and for packet loss concealment in VoIP. This paper presents an output-error-based ELS TV-CAR speech analysis algorithm and compares its performance with that of the equation-error-based method.
ISBN: 9780780397361 (print)
The Line Spectrum Pairs (LSP) representation of linear predictive coding coefficients is widely used in speech coding, speech recognition and other domains due to its desirable interpolation and quantization properties. Several methods proposed for calculating LSP parameters are hampered by high computational complexity. This paper proposes an effective and efficient algorithm, APF, that uses the Aitken iterative method and polynomial synthetic division. LSP parameters are estimated by first obtaining a root of the N-th order nonlinear equation with the Aitken iterative method, then reducing the degree by polynomial synthetic division, and finally solving the resulting quartic equation with Ferrari's solution. Theoretical analysis and experimental results show that the proposed algorithm has not only high precision but also low computational complexity.
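The first two steps of the procedure in this abstract, finding one root iteratively and then deflating by synthetic division, can be sketched as below. This is not the APF algorithm itself. The abstract does not state which fixed-point map is accelerated, so the sketch assumes Newton's map with Aitken's delta-squared extrapolation applied to it. The test polynomial is hypothetical, and the closing Ferrari step for the quartic is omitted.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given in descending-power order."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def poly_from_roots(roots):
    """Monic polynomial (descending powers) with the given roots."""
    coeffs = [1.0]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [a - r * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

def aitken_root(coeffs, x0, tol=1e-12, max_iter=50):
    """One real root via Aitken delta-squared acceleration.

    Assumption: the underlying fixed-point map is Newton's map
    g(x) = x - p(x) / p'(x).
    """
    n = len(coeffs) - 1
    deriv = [c * (n - i) for i, c in enumerate(coeffs[:-1])]
    g = lambda x: x - horner(coeffs, x) / horner(deriv, x)
    x = x0
    for _ in range(max_iter):
        x1 = g(x)
        x2 = g(x1)
        denom = x2 - 2.0 * x1 + x
        if abs(x2 - x1) < tol or denom == 0.0:
            return x2
        x = x - (x1 - x) ** 2 / denom   # Aitken extrapolation
    return x

def synthetic_division(coeffs, root):
    """Divide by (x - root); returns (quotient coefficients, remainder)."""
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + root * out[-1])
    return out[:-1], out[-1]
```

For a tenth-order LPC filter, each of the two LSP polynomials reduces to a fifth-degree polynomial in x = cos(omega), so a single deflation leaves a quartic that has a closed-form solution. That is consistent with the order of steps the abstract describes.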
ISBN: 9781424414833 (print)
Modifications to IP-based packet network protocols are examined that would make the network tolerant of bit errors in packet payloads or headers. These modifications are tested with communication-quality MELP voice traffic. As measured by PESQ score, improvements in the perceptual quality of the speech are noted, and these are maximized when error checking is disabled for the entire packet.
ISBN: 0780305329 (print)
A novel excitation model called the multicategory vector excitation (MCVE) model for a linear predictive coding (LPC) vocoder at 2.4 kb/s is proposed. In this model, the speech signal is classified into four categories: unvoiced, voiced, onset, and offset. An excitation codebook is available for every category of speech, and the different excitation codebooks have different characteristics. The analysis-by-synthesis procedure is used to select the excitation vectors. Computer simulations have been carried out, and the results show that the vocoder with the new excitation model is capable of synthesizing more intelligible and more natural speech at 2.4 kb/s.