ISBN: (print) 9781424464258; 9780769539942
Motivated by the rapid increase of VoIP services that use G.711 for telephone speech, a new ITU-T recommendation, G.711.0 (a frame-wise, stateless, lossless compression scheme for G.711 log PCM symbols), has been standardized. The standard scheme has several coding parts, each of which is adaptively selected depending on the characteristics of the input. Among them, the mapped domain prediction part is the one most frequently activated for normal speech signals. This part consists of linear prediction in the mapped domain and variable-length coding of the prediction residual. It is useful for log-compressed/expanded signals, such as ITU-T G.711. This paper describes three newly devised enhancement tools for the coding of prediction residual signals: progressive order prediction, quantized prediction order, and adaptive, sub-frame-based coding of separation parameters. The design criterion is the maximization of the averaged FoM (figure of merit) over frame lengths of 40, 80, 160, 240, and 320 samples. The first tool, progressive order prediction combined with adaptive modification of the separation parameter for the first and second samples, improves the compression ratio by 0.5% with a negligible increase in complexity. The second tool, quantized prediction order, improves the compression ratio by 0.2% and even reduces complexity. The third tool, sub-frame-based adaptive coding of separation parameters, gives a 0.2% improvement in the compression ratio with comparable complexity. All three tools are consistently and independently effective in improving the compression ratio, although the improvement from each tool is small. At the same time, none of the tools has a significant impact on computational complexity. Therefore, all the devised tools improve the FoM and have been adopted in the mapped domain prediction part of the ITU-T G.711.0 standard.
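The role of a separation parameter chosen per sub-frame can be illustrated with a small sketch. It assumes that the separation parameter behaves like a Golomb-Rice parameter `s`, which splits each mapped residual into a unary-coded quotient and an `s`-bit remainder. The zigzag mapping, the bit-string output, and all function names are hypothetical; the G.711.0 bitstream format is not reproduced here, and the side information needed to transmit each parameter is ignored.

```python
def zigzag(v):
    """Map a signed residual to a non-negative integer (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    """Inverse of zigzag()."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(residuals, s):
    """Encode residuals as a bit string using parameter s (illustrative only)."""
    bits = []
    for v in residuals:
        u = zigzag(v)
        quotient, remainder = u >> s, u & ((1 << s) - 1)
        bits.append("1" * quotient + "0")          # unary part, terminated by 0
        if s:
            bits.append(format(remainder, "0{}b".format(s)))  # s-bit binary part
    return "".join(bits)


def rice_decode(bits, s, count):
    """Decode `count` residuals from a bit string produced by rice_encode()."""
    out, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1                                   # skip the terminating 0
        remainder = int(bits[pos:pos + s], 2) if s else 0
        pos += s
        out.append(unzigzag((quotient << s) | remainder))
    return out


def best_parameter(residuals, max_s=7):
    """Exhaustively pick the parameter that gives the shortest code."""
    return min(range(max_s + 1), key=lambda s: len(rice_encode(residuals, s)))


if __name__ == "__main__":
    quiet, loud = [0, -1, 2, -3, 1, 0, 5, -2], [40, -37, 52, -45]
    for sub_frame in (quiet, loud):
        s = best_parameter(sub_frame)
        print(s, len(rice_encode(sub_frame, s)))
```

When residual magnitudes change within a frame, choosing the parameter per sub-frame lets each part use a code matched to its own statistics, which is the intuition behind coding the parameters on a sub-frame basis.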
Vector quantization (VQ) has recently been used to build a speaker-independent digit recognizer based solely on the spectral content of the speech signal. Hidden Markov models (HMMs), on the other hand, have proved their ability to model temporal distortions between different utterances of a word pronounced by several speakers. In terms of recognition rate, HMMs are as efficient as conventional DTW matching, but they need less computation and memory. This paper presents a speaker-independent digit recognition system that combines word-based VQ with HMMs, the cost of which is low enough for implementation on a single signal processor available today. It is the first result of a cooperation project between ENST and the MATRA company, financially supported by the French government. The proposed recognizer is structured in two parts. First, a VQ preprocessor, with one vector codebook per vocabulary word, codes the short-time spectrum of the speech signal and performs an initial sorting. Then HMMs are used to make the final recognition decision.
A new technique for determining the LPC coefficients is introduced. In this method, the LPC coefficients are shown to be the weighting factors of a simple neuron. This makes the technique suitable for the many applications that use LPC as a signal processing tool, such as speech processing and production. The error introduced by this method is comparable to the error obtained with conventional methods such as the autocorrelation method.
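The idea of reading the LPC coefficients off the weights of a single linear neuron can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the LMS (delta-rule) update, the step size `mu`, the synthetic AR(2) test signal, and all function names are assumptions made for the example. A Levinson-Durbin routine gives the conventional autocorrelation-method solution for comparison.

```python
import random


def synth_ar2(n, a=(1.3, -0.6), seed=0):
    """Synthetic AR(2) signal x[n] = a1*x[n-1] + a2*x[n-2] + white noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a[0] * x[-1] + a[1] * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


def lpc_neuron(x, order, mu=0.001, epochs=5):
    """Train one linear neuron to predict x[n] from its past `order` samples.

    The weights are the LPC estimates. Plain LMS is scale dependent, so
    `mu` is an assumption tuned to this test signal.
    """
    w = [0.0] * order
    for _ in range(epochs):
        for n in range(order, len(x)):
            past = [x[n - k] for k in range(1, order + 1)]
            err = x[n] - sum(wk * pk for wk, pk in zip(w, past))
            w = [wk + mu * err * pk for wk, pk in zip(w, past)]
    return w


def lpc_autocorr(x, order):
    """Autocorrelation method solved with the Levinson-Durbin recursion."""
    r = [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]
    a = [0.0] * (order + 1)      # a[1..order] hold the predictor coefficients
    energy = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / energy
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        energy *= 1.0 - k * k
    return a[1:]


if __name__ == "__main__":
    signal = synth_ar2(4000)
    print("neuron weights :", lpc_neuron(signal, 2))
    print("autocorrelation:", lpc_autocorr(signal, 2))
```

On this test signal both estimates should land close to the generating coefficients (1.3, -0.6), with the neuron weights carrying a small residual fluctuation that depends on the step size.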
Several major modifications to the phonetically segmented vector excitation coding (PS-VXC) coder previously reported by the authors (1989, 1990) have resulted in enhanced speech quality while reducing the delay, complexity, and bit rate. Speech is segmented into variable-length phonetic classes, and a VXC coding module is tailored to each class. Coding techniques include adaptive linear predictive coding (LPC) analysis and interpolation, two-stage excitation coding of onsets, comb filtering, modified perceptual weighting, and pitch contour smoothing. The improved PS-VXC coder operates at a peak rate of 3.4 kb/s with an average rate of 3.0 kb/s and has a subjective performance closely matching that of the 4.8 kb/s DoD CELP coder.
The theory of vector quantization (VQ) of linear predictive coding (LPC) coefficients has established a wide variety of techniques for quantizing LPC spectral shape to minimize overall spectral distortion. Such vector quantizers have been widely used in the areas of speech coding and speech recognition. The conventional vector quantizer utilizes only spectral shape information and essentially disregards the energy or gain term associated with the optimal LPC fit to the signal being modelled. In this paper we present a method of incorporating LPC spectral shape and energy into the codebook entries of the vector quantizer. To do this we postulate a distortion measure for comparing two LPC vectors which uses a weighted sum of an LPC shape distortion and a log energy distortion. Based on this combined distortion measure we have designed and studied vector quantizers of several sizes for use in isolated word speech recognition experiments. We have found that a fairly significant correlation exists between LPC shape and signal energy; hence a combined LPC shape plus energy vector quantizer with a given distortion requires far fewer codebook entries than one in which LPC shape and energy are quantized separately. Based on isolated word recognition tests on both a 10-digit and a 129 word airlines vocabulary, we have found improvements in recognition accuracy by using the VQ with both LPC shape and energy over that obtained using a VQ with LPC shape alone.
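A combined distortion of the kind described above, a weighted sum of a shape term and a log-energy term, can be sketched as follows. The abstract does not give the exact form, so everything specific here is an assumption: the squared Euclidean shape distance stands in for whatever LPC shape distortion the authors use, the energy term is an absolute difference in dB, and the weight `alpha` and the function names are hypothetical.

```python
import math


def combined_distortion(shape_a, energy_a, shape_b, energy_b, alpha=0.1):
    """Weighted sum of a shape distortion and a log-energy distortion.

    The shape term is a squared Euclidean distance (a stand-in, not the
    paper's LPC distortion); the energy term is |difference| in dB.
    """
    d_shape = sum((p - q) ** 2 for p, q in zip(shape_a, shape_b))
    d_energy = abs(10.0 * math.log10(energy_a) - 10.0 * math.log10(energy_b))
    return d_shape + alpha * d_energy


def quantize(shape, energy, codebook, alpha=0.1):
    """Return the index of the nearest (shape, energy) codebook entry."""
    return min(
        range(len(codebook)),
        key=lambda i: combined_distortion(
            shape, energy, codebook[i][0], codebook[i][1], alpha
        ),
    )


if __name__ == "__main__":
    # Entries 0 and 1 share a spectral shape and differ only in energy.
    codebook = [([0.0, 0.0], 1.0), ([0.0, 0.0], 100.0), ([1.0, 1.0], 1.0)]
    print(quantize([0.1, 0.0], 90.0, codebook))             # uses shape and energy
    print(quantize([0.1, 0.0], 90.0, codebook, alpha=0.0))  # shape only
```

With `alpha` set to zero the search sees only spectral shape and cannot separate entries that differ in energy alone, which is the limitation of the shape-only quantizer that the combined measure addresses.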
This paper presents speech enhancement techniques to improve the performance of robust speaker identification in noisy environments. Start and end point detection, silence removal, frame segmentation, and windowing have been used for pre-processing, and a Wiener filter has been used to remove the silence parts from the speech utterances. To extract features from the speech, various speech parameterization techniques, namely LPC, LPCC, RCC, MFCC, ΔMFCC, and ΔΔMFCC, have been simulated. Finally, to measure the performance of the proposed speech enhancement techniques, a genetic algorithm has been used as the classifier for the noise-robust automated speaker identification system, and various experiments have been performed on the genetic algorithm to select the optimum parameters. On the NOIZEUS speech database, the highest identification rates achieved are 70.31% for the text-dependent and 61.26% for the text-independent speaker identification system.
In most applications of adaptive filtering algorithms to the telephone echo cancellation problem, a linear model is assumed for the hybrid. In this paper, a nonlinear model based on a Volterra series representation is adopted. Using a new and efficient nonlinear frequency-domain adaptive algorithm, a nonlinear model for a hybrid in an actual connection has been identified. The nonlinear model is found to be 8 dB more accurate than the linear model. The problem of automatic gain control in the design of a full-duplex telephone network with an LPC vocoder in the loop, and its interaction with the echo canceller and the hybrid nonlinearity, is also discussed.
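The difference between a linear and a Volterra model of an echo path can be illustrated with a small identification experiment. The sketch below deliberately swaps in a plain time-domain LMS update rather than the frequency-domain algorithm of the paper, and it uses a synthetic, weakly nonlinear "hybrid" with memory of two samples. The system coefficients, step size, noise level, and function names are all assumptions made for the example.

```python
import math
import random


def features(x0, x1, nonlinear):
    """Input vector: linear taps, plus second-order Volterra products if requested."""
    linear = [x0, x1]
    if not nonlinear:
        return linear
    return linear + [x0 * x0, x0 * x1, x1 * x1]


def identify(x, d, nonlinear, mu=0.1, tail=1000):
    """Adapt the model with LMS and return the mean squared error over the last `tail` samples."""
    w = [0.0] * (5 if nonlinear else 2)
    sq_err = []
    for n in range(1, len(x)):
        u = features(x[n], x[n - 1], nonlinear)
        e = d[n] - sum(wi * ui for wi, ui in zip(w, u))
        w = [wi + mu * e * ui for wi, ui in zip(w, u)]
        sq_err.append(e * e)
    return sum(sq_err[-tail:]) / tail


def simulate(n=5000, seed=1):
    """Hypothetical hybrid: linear echo path plus a small quadratic term and noise."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    d = [0.0] + [
        0.8 * x[i] + 0.3 * x[i - 1] + 0.2 * x[i] ** 2 + rng.gauss(0.0, 0.01)
        for i in range(1, n)
    ]
    return x, d


if __name__ == "__main__":
    x, d = simulate()
    mse_lin = identify(x, d, nonlinear=False)
    mse_vol = identify(x, d, nonlinear=True)
    print("accuracy gain in dB:", 10.0 * math.log10(mse_lin / mse_vol))
```

The gain printed here reflects only the synthetic system chosen for the sketch and says nothing about the 8 dB figure reported for the measured hybrid.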
This paper presents the results of our investigation of the various aspects of baseband LPC coders with the goal of maximizing the speech quality at a transmission bit-rate of 9.6 kb/s and for channel bit-error rates of up to 1%. Important among these aspects are: baseband width, coding of baseband, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder.
The use of a linear periodic controller (LPC) has been proposed as a new approach in the field of model reference adaptive control. The resulting controller can handle rapid changes in plant parameters, and it provides smooth transient behavior for the closed-loop system. Moreover, the LPC generates control signals that are modest in size when measured using the infinity norm. Despite these advantages, the LPC suffers from poor noise tolerance: the smaller the sampling time, the less noise-tolerant the controller. In this work, to alleviate this drawback, we apply a larger probing signal whose size is inversely proportional to the sampling time. The proposed method has significantly better noise rejection, at the cost of a larger control signal.
In this work, fine granularity scalability (FGS) is introduced into MP-CELP speech coding with the HPDR technique by adjusting the amount of transmitted fixed excitation information. The FGS feature aims at changing the bit rate of the conventional coder more finely and more smoothly. Through performance analysis and computer simulation, the scalability quality of the MP-CELP coder is shown to improve on that of conventional scalable MP-CELP. The HPDR technique is also applied to MP-CELP for use with tonal languages, and the coder supports core coding rates of 4.2, 5.5, and 7.5 kbps as well as additional scaled bit rates.