ISBN: (print) 0780331923
This paper presents a novel wideband speech coding algorithm called transform predictive coding (TPC). The main emphasis is on low complexity. TPC uses short-term and long-term prediction to remove the redundancy in speech. The prediction residual is quantized in the frequency domain based on a calculated noise masking threshold. In its simplest form, the TPC coder uses only open-loop quantization and therefore has a low complexity. A 16 kb/s full-duplex, open-loop TPC coder takes only 22% of the CPU load on a 150 MHz SGI Indy workstation and about 34% on a 90 MHz Pentium PC. The speech quality of TPC is almost transparent at 32 kb/s, very good at 24 kb/s, and acceptable at 16 kb/s. In the second half of the paper, we report our recent progress in using closed-loop quantization techniques to improve TPC output speech quality.
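The short-term prediction mentioned in the abstract above is conventionally obtained by linear prediction. The following is a minimal sketch, assuming the autocorrelation method with the Levinson-Durbin recursion; the abstract does not say how TPC computes its predictor, and the function names `autocorrelation` and `levinson_durbin` are illustrative only.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sample sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a linear predictor.

    Returns (a, err) where the prediction is
    x_hat[n] = sum(a[k-1] * x[n-k] for k in 1..order)
    and err is the final prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # degenerate input (e.g. all zeros); stop early
        # Correlation not yet explained by the order-i predictor.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Autocorrelation of an ideal first-order process with correlation 0.5:
# the second coefficient should vanish.
coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In a coder of this kind, the residual `x[n] - x_hat[n]` is what would then be passed on to quantization.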
ISBN: (print) 0780329120
In September 1992, Recommendation G.728, a 16 kbps LD-CELP speech coder submitted by AT&T, was standardized by *** the process of ratification test [1], the coder's performance was equivalent to or better than that of 32 kbps ADPCM for all conditions *** paper, which is based on a G.728 encoding-decoding system simulated in software, studies and tests different parts of the algorithm, especially the postfilter.
ISBN: (print) 7543909405
An ASIC for multi-speaker speech recognition is designed in this paper. LPC-derived cepstral coefficients are chosen as the speech features, and templates are trained with the K-means clustering algorithm. A two-stage recognition system not only improves recognition accuracy but also reduces the delay. The first stage of the recognition system uses the speech spectrum difference (SSD) algorithm; the second stage uses DTW. The whole recognition system is designed as an ASIC at a high level in VHDL and simulated in Powerview.
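The DTW stage named in the abstract above can be illustrated in software. This is a minimal sketch, assuming each frame is a feature vector (for example, cepstral coefficients), a Euclidean local distance, and the basic symmetric step pattern without path constraints; the paper's hardware design may differ on all of these points, and `dtw_distance` is an illustrative name.

```python
import math


def euclidean(u, v):
    """Local distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(template, utterance):
    """Accumulated cost of the best time alignment of two frame sequences."""
    n, m = len(template), len(utterance)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i template
    # frames with the first j utterance frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(template[i - 1], utterance[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # template frame repeated
                                     cost[i][j - 1],      # utterance frame repeated
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A time-stretched copy of the template aligns at zero cost.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
score = dtw_distance(template, stretched)
```

A recognizer of this type would compute the score against every stored template and pick the smallest, which is why a cheaper first stage that prunes candidates can reduce delay.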
ISBN: (print) 7543909405
A speech coding/decoding algorithm that combines the MBE and LPC speech models is proposed. In this model, the spectral envelope is represented using linear prediction coefficients, which are coded as line spectrum frequencies (LSFs). It can operate at 2.4 kbps with much higher synthesized speech quality than LPC-10e and lower computational complexity than CELP, VSELP, and similar coders. It is therefore particularly attractive for VLSI implementation.
In this correspondence, a two-stage approach based on the Karhunen-Loeve transform and 2-D prediction is proposed for efficient quantization of the line spectrum pair (LSP) parameters of speech. In addition, a switched classifier is incorporated into this approach to reduce the outlier frames (spectral distortion greater than 2 dB) to about 0.27% and to eliminate frames with spectral distortion greater than 4 dB, at an average bit rate below 19 b/frame.
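The 2 dB and 4 dB outlier thresholds above refer to spectral distortion, commonly defined as the root-mean-square difference in dB between the original and quantized LPC power spectra. The sketch below assumes that common definition, evaluated over the full band from LPC coefficients rather than LSPs; the correspondence may use a restricted band or a different sampling grid, and all function names are illustrative.

```python
import cmath
import math


def lpc_log_spectrum_db(a, n_points=256):
    """Log power spectrum 10*log10(1/|A(e^jw)|^2) on [0, pi).

    a = [a1, ..., ap] for A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    Assumes A(z) has no zeros on the unit circle.
    """
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        A = 1.0 + sum(ak * cmath.exp(-1j * w * (k + 1))
                      for k, ak in enumerate(a))
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum


def spectral_distortion_db(a_ref, a_quant, n_points=256):
    """RMS difference in dB between two LPC log spectra."""
    s_ref = lpc_log_spectrum_db(a_ref, n_points)
    s_q = lpc_log_spectrum_db(a_quant, n_points)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s_ref, s_q)) / n_points)


def outlier_fraction(sd_values, threshold_db=2.0):
    """Fraction of frames whose distortion exceeds the threshold."""
    if not sd_values:
        return 0.0
    return sum(1 for sd in sd_values if sd > threshold_db) / len(sd_values)


# Hypothetical first-order filters, used only to exercise the functions.
sd = spectral_distortion_db([-0.9], [-0.8])
```

Running `outlier_fraction` over the per-frame distortions of a test set with thresholds of 2 dB and 4 dB gives the two outlier statistics quoted in the abstract.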
The authors consider signals originating from a sequence of sources. More specifically, the problems of segmenting such signals and relating the segments to their sources are addressed. This issue has wide applications in many fields. The report describes a resolution method that is based on an ergodic hidden Markov model (HMM), in which each HMM state corresponds to a signal source. The signal source sequence can be determined by using a decoding procedure (Viterbi algorithm or forward algorithm) over the observed sequence. Baum-Welch training is used to estimate HMM parameters from the training material. As an example of the multiple signal source classification problem, an experiment is performed on unknown speaker classification. The results show a classification rate of 79% for 4 male speakers. The results also indicate that the model is sensitive to the initial values of the ergodic HMM and that employing the long-distance LPC cepstrum is effective for signal preprocessing.
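The decoding step described above, in which each state of an ergodic HMM stands for one source, can be sketched with the Viterbi algorithm in the log domain. This is a minimal illustration, assuming the per-frame log-likelihoods under each state have already been computed; the probabilities in the example are invented for demonstration and are not taken from the report.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most likely state (source) sequence for an ergodic HMM.

    log_init[s]     : log prior of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_emit[t][s]  : log-likelihood of observation t under state s
    """
    if not log_emit:
        return []
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Two hypothetical sources with "sticky" transitions: the first three
# frames fit source 0 better, the last three fit source 1 better.
log = math.log
init = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
emit = [[log(0.9), log(0.1)]] * 3 + [[log(0.1), log(0.9)]] * 3
segmentation = viterbi(init, trans, emit)  # [0, 0, 0, 1, 1, 1]
```

Runs of identical labels in the returned path give the segments, and the label itself identifies the source, which covers both problems the abstract names.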
This paper proposes two novel techniques for the twinVQ (transform domain weighted interleave VQ) high-quality audio coding scheme at rates lower than 64 kbit/s. One is an extension of the weighted interleave technique to the time and input channel domains as well as the frequency domain. The other is an efficient representation scheme for the spectral envelope by means of an interpolated square root LPC (linear predictive coding) spectrum.
This paper describes models of an oxy-helium speech corrector, speech coding, and a mask corrector applicable to underwater voice communication systems. All three problems are solved using linear predictive coding (LPC).
The minimum classification error (MCE) has been shown to be effective in improving the performance of a speaker identification system. However, there are still problems to solve, such as the variability of the voice characteristics of a particular speaker through time. In this paper, we analyze the degradation of a Gaussian mixture model (GMM) based text-independent speaker identification system when using test data recorded over six months after the training session, and, in an attempt to avoid this degradation, we study the use of supervised adaptation based on maximum a posteriori (MAP) estimation and MCE. These techniques have been shown to provide good results for speaker adaptation in speech recognition. The major result we have obtained is that, starting from GMM models trained only on speech from session 1, similar identification results can be obtained for all the other sessions with incremental adaptation that uses only 2.5 seconds of speech per speaker and session as data for the MCE training adaptation procedure. We have also found that, in our extreme experimental setup, MAP becomes unhelpful when combined with MCE adaptation.
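As an illustration of the MAP adaptation mentioned above, the sketch below updates only the means of a one-dimensional GMM, pulling each mean toward the new data in proportion to how much data the component accounts for. It assumes a fixed prior weight `tau` and leaves weights and variances untouched; the abstract does not say which parameters the authors adapt or how the prior is set, so these choices and the function names are assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(weights, means, variances, data, tau=10.0):
    """MAP re-estimation of GMM means from adaptation data.

    tau (> 0) is the assumed prior weight: larger values keep the
    adapted means closer to the original (prior) means.
    """
    n_comp = len(means)
    occupancy = [0.0] * n_comp   # soft count of frames per component
    weighted_sum = [0.0] * n_comp
    for x in data:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # frame too far from every component to assign
        for c in range(n_comp):
            gamma = likes[c] / total  # posterior of component c for x
            occupancy[c] += gamma
            weighted_sum[c] += gamma * x
    # Interpolate between the prior mean and the data, weighted by counts.
    return [(tau * means[c] + weighted_sum[c]) / (tau + occupancy[c])
            for c in range(n_comp)]


# Hypothetical two-component model adapted with a few frames near 3.0:
# the nearer component moves toward the data, the other barely changes.
adapted = map_adapt_means([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0], [3.0] * 5)
```

With very little adaptation data per session, the occupancy counts stay small relative to `tau`, so the means move only slightly; this is one way to read the limited benefit of MAP in a low-data setup, although the abstract itself does not give the reason.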