A novel scheme for generating the codebook for vector quantisation is presented. Starting from an initial codebook produced by a K-d tree splitting procedure based on the greatest coordinate variance, a proposed partial GLA is used to improve the codevectors. The resulting VQ outperforms both the VQ designed by the standard GLA with the same initialisation and the splitting-initialised LBG algorithm. However, the improvement in performance is accompanied by an increase in the computational complexity of the design stage.
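The two stages named in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not describe the "partial GLA", so the standard GLA (Lloyd iteration) is substituted for the refinement stage. The rule of splitting the most populous cell at its median is also an assumption, and all function names are hypothetical.

```python
from statistics import fmean, pvariance

def centroid(cell):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*cell)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codevector closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def kd_split_codebook(vectors, size):
    """Initial codebook from K-d tree style splitting.

    Assumption: the most populous cell is split next, at the median of
    the coordinate with the greatest variance.
    """
    cells = [list(vectors)]
    while len(cells) < size:
        cells.sort(key=len, reverse=True)
        cell = cells.pop(0)
        if len(cell) < 2:          # nothing left to split
            cells.append(cell)
            break
        variances = [pvariance(col) for col in zip(*cell)]
        axis = variances.index(max(variances))
        cell.sort(key=lambda v: v[axis])
        mid = len(cell) // 2
        cells += [cell[:mid], cell[mid:]]
    return [centroid(c) for c in cells]

def gla(vectors, codebook, iterations=10):
    """Standard GLA (not the paper's partial GLA): assign, then re-centre."""
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        # An empty cell keeps its old codevector.
        codebook = [centroid(c) if c else cw for c, cw in zip(cells, codebook)]
    return codebook

def distortion(vectors, codebook):
    """Mean squared error of nearest-codevector quantisation."""
    return fmean(sq_dist(v, codebook[nearest(codebook, v)]) for v in vectors)

# Toy training set (made up for illustration).
training = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9],
            [0.1, 5.0], [0.0, 5.2], [5.1, 0.0], [4.9, 0.2]]
initial = kd_split_codebook(training, 4)
refined = gla(training, initial)
```

Each GLA iteration cannot increase the training distortion, so `distortion(training, refined)` is never larger than `distortion(training, initial)`.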
An efficient method for quantising line spectrum pairs (LSP), combining good performance with very low complexity and memory requirements, is proposed. The ordering property of the LSP parameters is utilised in the DPCM scheme. The new scalar quantisation algorithm requires 32 bit/frame to achieve 1 dB² average spectral distortion. The quantisation performance has also been shown to be robust across databases and different speakers.
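One way the ordering property can be exploited in a DPCM loop is sketched below. This is an illustration under stated assumptions, not the paper's quantiser: the uniform step size, the choice of the previous decoded LSP as predictor, and the function names are all hypothetical, and no bit allocation is modelled.

```python
def encode_lsp(lsp, step=0.01):
    """Closed-loop DPCM: quantise each LSP's gap from the previous decoded LSP."""
    indices, prev = [], 0.0
    for w in lsp:
        # Ordering property: each LSP exceeds the previous one, so the gap
        # is positive. Clamping the index to >= 1 keeps the decoded
        # sequence strictly increasing.
        idx = max(1, round((w - prev) / step))
        indices.append(idx)
        prev += idx * step      # track the decoder's reconstruction
    return indices

def decode_lsp(indices, step=0.01):
    """Rebuild the LSP frequencies by accumulating the quantised gaps."""
    out, prev = [], 0.0
    for idx in indices:
        prev += idx * step
        out.append(prev)
    return out

# Made-up 10th-order LSP frame in radians (0 < w < pi), for illustration.
lsp = [0.25, 0.42, 0.71, 1.03, 1.37, 1.66, 2.01, 2.33, 2.61, 2.90]
codes = encode_lsp(lsp)
decoded = decode_lsp(codes)
```

Because the encoder predicts from the decoded value rather than the original, quantisation errors do not accumulate along the frame, and the decoded frequencies stay ordered by construction. A real coder would additionally limit each index to a fixed number of bits.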
Linear prediction is formulated in a vector space by means of an orthogonal transformation, with which the L1 criterion can be easily incorporated to yield an efficient iterative algorithm. An improvement over the covariance method is verified in experiments with a high-pitch synthetic vowel.
This correspondence proposes a new CELP coding method which embeds speech classification in the adaptive codebook search. This approach can retain the synthesized speech quality at bit rates below 4 kb/s. A pitch analyzer is designed to classify each frame by its periodicity, and a finite-state machine then selects one of four states. The adaptive codebook search scheme is switched according to the state. Simulation results show that higher SEGSNR and lower computational complexity can be achieved, and that the pitch contour of the synthesized speech is smoother than that produced by conventional CELP coders.
Research has been conducted in the area of voice processing for over six decades but it has only been in the past few years that the impact of the years of research is starting to be seen in modern telecommunications systems. Virtually every area of voice processing, including speech coding, speech synthesis, speech recognition, and even, to a small extent, speaker verification, has left the research laboratory and now appears in a product or service that is in daily use out in the marketplace, often by millions of customers per day. This revolution in voice processing in telecommunications is fueled by algorithmic advances (which improve the quality of the voice processing systems), hardware advances (which provide high processing power and memory at low cost), and networking advances (which provide high bandwidth pipes to the home, office, and throughout the telecommunications network). In this paper we illustrate the impact of voice processing on modern telecommunications by showing the diverse ways in which speech coding, speech synthesis, speech recognition and speaker verification have become embodied in new products and services.
This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, using the TI-105 isolated word database. MFB recognition error rates ranged from 0.5 to 26.9% in noise, depending on the SNR, and auditory models provided error rates as much as four percentage points lower. With speech degraded by linear filtering, MFB error rates ranged from 0.5 to 3.1%, and the reduction in error rates provided by auditory models was less than 0.5 percentage points. Some earlier studies that demonstrated considerably more improvement with auditory models used linear predictive coding (LPC) based control front ends. This paper shows that MFB cepstra significantly outperform LPC cepstra under noisy conditions. Techniques using an optimal linear combination of features for data reduction were also evaluated.
Summary form only given. A compression algorithm for high-quality speech signals using predictive coding techniques is developed. Code-excited linear predictive coding (CELPC) is one of the key techniques for compressing speech signals to a bit rate of around 4.8 kbps. However, CELPC has a heavy computational requirement, and speech signals can usually be divided into two portions, namely the base-band and the high-band frequency ranges. A hybrid CELPC and voice-excited linear predictive coding (VELPC) scheme is therefore developed for speech coding to reduce the complexity of the original CELPC. In the algorithm, a speech signal is first divided in the frequency domain into two portions, the base band and the high band; the base-band portion is then coded with CELPC and the high-band portion with VELPC. The test experiments showed that this new coder can produce synthesized speech of good quality at a lower bit rate than the original CELPC. When using these coding methods for the base-band and high-band signals, we must decide how to divide the speech signal into two portions. In choosing the bandwidth of the base-band signal, there is a trade-off between the coding quality and the bit rate. In our experiment, the bandwidth of the base-band signal is chosen as one fourth of that of the original speech. Subjective evaluation experiments were conducted to test the performance of the hybrid CELPC and VELPC technique. For speech signals sampled at 8 kHz, a bit rate of 4.0 kbps can be achieved with frame intervals of 23 ms. The experimental results showed that the quality of the synthesized speech using the hybrid coding technique at 4.0 kbps was almost the same as that of CELPC at 4.8 kbps.
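The figures quoted in this abstract imply a per-frame bit budget, which the short calculation below works out. It assumes that "the bandwidth of the original speech" means the 4 kHz Nyquist band of the 8 kHz sampled signal; the variable names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000   # from the abstract
FRAME_MS = 23           # from the abstract
HYBRID_BPS = 4000       # hybrid CELPC/VELPC coder, 4.0 kbps
CELPC_BPS = 4800        # reference CELPC coder, 4.8 kbps

# Samples and bits available in one 23 ms frame of the hybrid coder.
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS / 1000
bits_per_frame = HYBRID_BPS * FRAME_MS / 1000
bits_per_sample = HYBRID_BPS / SAMPLE_RATE_HZ

# Assumption: the full band is the Nyquist band, so one fourth of it
# gives the base-band width.
base_band_hz = (SAMPLE_RATE_HZ / 2) / 4

# Relative bit-rate saving of the hybrid coder over CELPC.
rate_saving = (CELPC_BPS - HYBRID_BPS) / CELPC_BPS

print(samples_per_frame, bits_per_frame, bits_per_sample, base_band_hz)
print(f"{rate_saving:.1%}")
```

Under these assumptions each frame carries 92 bits for 184 samples (0.5 bit per sample), the base band is 1 kHz wide, and the hybrid coder uses about one sixth fewer bits than the 4.8 kbps CELPC.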
An important problem in speech coding is the quantization of linear predictive coefficients (LPC) with the smallest possible number of bits while maintaining robustness to a large variety of speech material and transm...
A hybrid approach to determining the excitation vector in a low-delay code-excited linear predictive coder is proposed. By a judicious division of the composite excitation vector into long-term and short-term components, and the use of switched quantisation, substantial improvement in coding quality is obtained.
Some of the work on speech processing has focused on modeling speech as an AM-FM signal. The success of the AM-FM model motivated us to investigate a similar nonlinear model and examine its application in speaker identification. Tests are carried out to compare the performance of the novel cyclic correlation based method with popular speaker identification methods based on cepstra. These studies show that the performance of the proposed method is comparable to the cepstrum based approach at high signal-to-noise ratio, but the former outperforms the latter under noisy conditions.