Speech coding at very low bit rates has many applications, such as answering machines, IP telephony, mobile communications and military communications. Most low bit rate coders operate at around 2.4 kb/s, as the speech quality degrades too much below this bit rate. We describe a frequency domain speech coder that can operate at both 2.4 and 1.2 kb/s and produces good quality synthesised speech. Both rates use the same analysis and synthesis building blocks over 20 ms frames, but the 1.2 kb/s coder jointly quantises three sets of parameters every 60 ms to reduce the bit rate while maintaining speech quality. We also describe the quantisation methods used to lower the bit rate from 2.4 kb/s to 1.2 kb/s while retaining most of the quality of the higher bit rate version.
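The bit budget implied by these figures can be worked out directly. The sketch below is only arithmetic on the numbers in the abstract, taking 2.4 kb/s as the higher rate (as in the abstract's final sentence); the function name `bits_per_block` is illustrative and does not come from the paper, which does not give its bit allocation here.

```python
def bits_per_block(bit_rate_bps: float, block_ms: float) -> float:
    """Bits available for one block of block_ms milliseconds at bit_rate_bps."""
    return bit_rate_bps * block_ms / 1000.0


# Higher-rate coder: one set of parameters per 20 ms frame.
bits_24 = bits_per_block(2400, 20)

# 1.2 kb/s coder: three sets of parameters quantised jointly every 60 ms.
bits_12 = bits_per_block(1200, 60)

# Average budget per 20 ms frame at 1.2 kb/s.
bits_12_per_frame = bits_12 / 3

print(bits_24, bits_12, bits_12_per_frame)  # 48.0 72.0 24.0
```

Joint quantisation over 60 ms therefore has 72 bits to spend on three frames, where independent coding at the higher rate would spend 48 bits on each.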
This paper investigates the influence of GSM speech coding on text-independent speaker recognition performance. The three existing GSM speech coder standards were considered. The whole TIMIT database was passed through these coders, yielding three transcoded databases. In a first experiment, it was found that GSM coding significantly degrades identification and verification performance, with performance following the perceptual speech quality of each coder. In a second experiment, the features for the speaker recognition system were calculated directly from the information available in the encoded bit stream. It was found that a low LPC order in GSM coding is responsible for most of the performance degradation. By extracting the features directly from the encoded bit stream, we also obtained a speaker recognition system equivalent in performance to the original one, which decodes and reanalyzes speech before performing recognition.
ISBN: 0780354826 (print)
In this paper, we propose a low bit rate speech vocoder and its corresponding VLSI implementation. The vocoder exploits the interpolation property so that good quality synthesized speech is obtained even at a bit rate as low as 1.6 kbps. Two novel methods suited to VLSI implementation, one for pitch detection and one for LSP decoding, are also proposed. The heuristic pitch detection algorithm avoids the heavy computational load of the traditional normalized autocorrelation method. The new LSP decoding process removes the need for memory that stores trigonometric function values. The chip is designed to be area-efficient and is suitable for stand-alone applications.
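For context, the traditional normalized autocorrelation pitch search that the abstract refers to can be sketched as follows. This is a generic textbook form, not the heuristic algorithm proposed in the paper; the function name, the lag range and the synthetic test frame are all assumptions chosen for illustration.

```python
import math


def pitch_lag_nac(x, min_lag, max_lag):
    """Return (lag, score) for the lag in [min_lag, max_lag] that
    maximises the normalised autocorrelation of the frame x."""
    best_lag, best_score = min_lag, -1.0
    n = len(x)
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        num = sum(x[i] * x[i + lag] for i in range(n - lag))
        e0 = sum(x[i] ** 2 for i in range(n - lag))
        e1 = sum(x[i + lag] ** 2 for i in range(n - lag))
        denom = math.sqrt(e0 * e1)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic voiced frame (assumed values): 100 Hz sine sampled at 8 kHz.
fs = 8000
f0 = 100
frame = [math.sin(2 * math.pi * f0 * t / fs) for t in range(400)]

lag, score = pitch_lag_nac(frame, 20, 147)
print(lag, fs / lag)  # lag of 80 samples, i.e. a 100 Hz pitch estimate
```

Every candidate lag costs three inner products plus a square root and a division, which illustrates the computational load the abstract says a heuristic detector is meant to avoid in hardware.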
ISBN: 0780364651 (print)
The efficiency of cochlear stimulation is closely related to the signal processing of the input speech. A variety of techniques have been proposed and used for cochlear prosthesis speech analysers. Different as they may seem, these techniques are based on two major approaches. The first stimulates the cochlea according to extracted speech features, and the second is based on wide-band speech characteristics. Clinically, it is well known that in their initial rehabilitation phase, new cochlear prosthesis users have difficulty distinguishing what they hear. This is because the set of possible stimulations is unbounded, reflecting the unlimited variety of input speech. To overcome this problem and minimise the rehabilitation period, a new stimulation algorithm based on vector quantization is proposed.
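The core idea of vector quantization, mapping an unbounded range of input vectors onto a finite set of codewords, can be sketched as a nearest-neighbour search. The abstract does not describe the paper's stimulation algorithm, so this is a generic illustration only; the codebook entries and the feature vector are made-up values.

```python
def nearest_codeword(vector, codebook):
    """Return (index, codeword) of the codebook entry closest to
    `vector` under squared Euclidean distance."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: dist2(vector, codebook[k]))
    return index, codebook[index]


# Hypothetical 4-entry codebook of 3-dimensional feature vectors.
codebook = [
    [0.0, 0.0, 0.0],
    [1.0, 0.5, 0.1],
    [0.2, 0.9, 0.4],
    [0.1, 0.2, 1.0],
]

features = [0.9, 0.6, 0.0]  # arbitrary analysed frame
index, codeword = nearest_codeword(features, codebook)
print(index, codeword)  # 1 [1.0, 0.5, 0.1]
```

Whatever the input, the output is one of a fixed number of patterns, which is the property the abstract relies on to limit the stimulation set.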
ISBN: 0780365429 (print)
In this paper, a novel linear predictive method (SELP) for speech spectra modelling is proposed. This method allows one to develop an all-pole filter which combines p+1 consecutive even preceding samples of the speech signal x(n) into p pairs for linear extrapolation. In addition, a selective weighting scheme is employed to obtain a high signal-to-error ratio (SER). Compared with traditional LPC modelling, the order of the proposed filter is raised to 2p+2 when both filters use p normal equations. To demonstrate the usefulness of the proposed method, the new model is simulated on speech sampled at 22.05 kHz. Experimental results show an SER improvement of at least 10 dB over LPC modelling with p=5.
This paper presents a new shaped fixed codebook (FCB) search technique for code excited linear predictive (CELP) coding. State-of-the-art CELP coding techniques operate at rates above 4.0 kbps, as it gets harder to build a good FCB contribution with a minimal bit budget. The shaped FCB search is presented to ease this problem and to achieve a better FCB contribution with a reduced bit budget. The shaped FCB search is integrated into a 4 kbps CELP coder, and subjective performance results are reported which show that the coder is significantly better than the IS-127 half rate coder at 4 kbps.
In this article, a low bit-rate, low-complexity block-based trellis quantization (BTQ) scheme is proposed for the quantization of line spectral frequencies (LSFs) in speech coding applications. The branches in the trellis diagram correspond to the LSF difference code-words, and the states correspond to quantized LSF parameters. An efficient algorithm for index generation (finding the index of a path in the trellis) is introduced. The proposed BTQ achieves transparent coding quality at 23 bits/frame (1150 b/s) and offers a gain of 3 bits/frame (150 b/s) and a significant reduction in complexity compared to the IS-641 split-VQ. An interframe BTQ scheme is also presented to exploit the redundancies between adjacent frames. The interframe scheme, which is based on adaptive block-based trellis quantization of the prediction residues, achieves an additional 50 b/s reduction in bit rate.
A novel audio/speech coding algorithm, hybrid audio coding (HAC), is described. New features of the algorithm include window switching with generalized MDCT, an improved quantization scheme for the MDCT coefficients, and waveform normalization in the time domain. HAC provides good quality at bit rates of 8 to 16 kbps, and the algorithm is shown to be effective for both audio and speech signals.
Distributed speech recognition services (DSRSs) provide an anytime, anywhere and any-device speech recognition environment that is intelligent enough to interact with users in a more natural manner. The primary goal is to provide users with the ability to dictate commands and/or documents among other potential services. The system coordinates the efforts of applications running in a distributed environment. For example, a user is able to dictate a document using their local word processor and a DSRS's remotely located speech engine. DSRSs encourage cooperation among individual programs in order to combine the efforts of individual applications to fulfill a user's request.