This paper presents a high-quality audio codec design based on the transform-domain weighted interleave vector quantization (Twin-VQ) scheme adopted in the MPEG-4 audio standard. The scheme employs three novel techniques to compress the data: (1) flattening of the MDCT coefficients by the spectrum of the linear predictive coding (LPC) coefficients; (2) further flattening of the MDCT coefficients by the Bark envelope; and (3) weighted interleave vector quantization. The paper examines the design issues involved in implementing an efficient Twin-VQ codec. Fast computation algorithms are derived for the computationally intensive modules. The design parameters of each module are determined, and the codebooks for weighted interleave vector quantization are constructed. Experimental results show that the designed codec compresses natural audio efficiently and reproduces high-quality output.
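The third technique can be illustrated with a minimal Python sketch. It assumes that interleaving means assigning flattened coefficient i to subvector i mod N, and that each subvector is matched to a codebook entry under a weighted squared-error measure. The function names, the toy codebook, and the weights are hypothetical; they are not taken from the paper or from the MPEG-4 standard.

```python
def interleave_split(coeffs, num_subvectors):
    """Assign coefficient i to subvector i % num_subvectors (assumed layout)."""
    return [coeffs[k::num_subvectors] for k in range(num_subvectors)]


def interleave_merge(subvectors):
    """Inverse of interleave_split for equal-length subvectors."""
    num = len(subvectors)
    merged = [0.0] * (num * len(subvectors[0]))
    for k, sub in enumerate(subvectors):
        merged[k::num] = sub
    return merged


def weighted_vq_search(vec, weights, codebook):
    """Return the index of the codeword with the smallest weighted squared error."""
    best_index, best_dist = -1, float("inf")
    for index, code in enumerate(codebook):
        dist = sum(w * (x - c) ** 2 for x, c, w in zip(vec, code, weights))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index
```

In this sketch the weights let perceptually important coefficients dominate the codeword choice: lowering the weight of one dimension can change which codeword wins, even though the input vector is unchanged.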
This paper contributes to narrowband speech enhancement by means of frequency bandwidth extension. A new algorithm is proposed for generating synthetic frequency components in the high band (i.e., 4-8 kHz) from the low-band components (i.e., 0-4 kHz) for wideband speech synthesis. The algorithm is based on linear prediction (LPC) analysis-synthesis. It consists of a spectral envelope extension that makes efficient use of line spectral frequencies (LSF) and a bandwidth extension of the LPC analysis residual by spectral folding. The low-band LSF of the synthesis signal are obtained from the input speech signal, and the high-band LSF are estimated from the low-band ones using statistical models. The estimation uses four models, which are selected according to the first two reflection coefficients obtained from the linear prediction analysis of the input signal.
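Spectral folding of the residual can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes folding is done by inserting a zero after every sample, which doubles the sampling rate from 8 kHz to 16 kHz and mirrors the 0-4 kHz spectrum into 4-8 kHz. The function names and the test tone are hypothetical.

```python
import cmath
import math


def spectral_fold(residual):
    """Insert a zero after every sample: doubles the sampling rate and mirrors
    the low-band spectrum into the high band (no anti-imaging filter)."""
    folded = []
    for sample in residual:
        folded.extend((sample, 0.0))
    return folded


def dft_magnitudes(signal):
    """Plain O(N^2) DFT magnitude, adequate for a short demonstration."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)))
        for k in range(n)
    ]


# A 1 kHz tone sampled at 8 kHz (16 samples, DFT bin 2).
narrowband = [math.cos(2 * math.pi * 2 * i / 16) for i in range(16)]
wideband = spectral_fold(narrowband)   # 32 samples at 16 kHz
magnitudes = dft_magnitudes(wideband)  # bin spacing is 500 Hz
```

Here `magnitudes` has equal peaks at bin 2 (1 kHz) and bin 14 (7 kHz), the mirror image of the tone around 4 kHz. In an LPC-based scheme such as the one described, the folded residual would then be shaped by the extended spectral envelope.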
This paper presents a hybrid coder with a new phase model to synchronize harmonic and waveform-coded segments, with a target bit rate of 4 kbps. The coder also employs a new analysis-by-synthesis technique to distinguish between stationary and transitional segments. Harmonic excitation is synchronized with the LPC residual by transmitting the location of the pitch pulse closest to the frame boundary and a phase value that represents the shape of that pitch pulse. The performance of the phase model and the classification technique is evaluated using a hybrid coder with three modes: scaled white-noise excitation colored by LPC for unvoiced segments, ACELP for transitions, and harmonic excitation for stationary segments. Subjective listening tests show that the coder produces good-quality speech and that switching between the modes is transparent.
ISBN: 0780363469 (print)
This work analyzes the harmful wastes produced by coal preparation mills. The main sources of these wastes are drying installations. Environmental protection measures were taken, and ways to increase their efficiency are suggested.
This paper presents the core technology of novel enhancements to traditional CELP coding, coined eXtended CELP (eX-CELP). It is centered on a combined and selective usage of closed-loop/open-loop approach, and variant algorithm structure concept. The above two concepts are complemented by new features and refined existing technologies. The eX-CELP paradigm was used in several speech coding systems. It is the core technology of the recently chosen candidate for the 3G-CDMA speech codec standard. It was the best candidate for ITU-T 4 kbps codec qualification test, and became the basis technology for a consortium candidate to the ITU-T 4 kbps speech coding competition.
... achieve toll quality at 4 kbps, our experiments and test results showed that this technology is also successful and suitable for both high and medium bit rates. Fig. 1 and Fig. 2 illustrate the basic structure of the eX-CELP encoder and decoder. One of the main themes of the eX-CELP technology is the judicious combination of the closed-loop approach and the open-loop approach, together with a careful selective usage of them. This mechanism is coined COLA, and its main objective is to intelligently employ the most appropriate approach for different types of input signals in order to preserve the perceptually important contents. Another important feature in the eX-CELP ...
Communication devices that perform distributed speech recognition (DSR) tasks currently transmit standardized coded parameters of speech signals. Recognition features are extracted on a remote server from signals reconstructed from these parameters. Since reconstruction losses degrade recognition performance, proposals are being considered to standardize DSR codecs that derive recognition features to be transmitted and used directly for recognition. However, such a codec must be embedded on the transmitting device alongside its current standard codec. Performing recognition directly on codec bitstreams avoids these complications: no additional feature-extraction mechanism is required on the device, and there are no reconstruction losses on the server. We propose an LDA-based method for extracting optimal feature sets from codec bitstreams and demonstrate that the features so derived improve recognition performance for the LPC, GSM and CELP codecs. For GSM and CELP, we show that the performance is comparable to that obtained with uncoded speech and with standard DSR-codec features.
ISBN: 9775031680 (print)
The main goal of automatic speech recognition (ASR) is to produce a machine that accurately recognizes normal human speech from any speaker. A recognition system may be classified as speaker-dependent or speaker-independent, and as isolated-word or connected-word. There are three approaches to research in ASR: the acoustic-phonetic approach, the pattern recognition approach, and the database statistical approach. Two such approaches, the hidden Markov model (HMM) and the artificial neural network (ANN), are presented in this paper.
ISBN: 0780367200 (print)
Efficient quantization of linear predictive coding (LPC) filter coefficients plays an essential role in very-low-bit-rate speech coding systems. This paper examines a new suboptimal matrix quantization scheme for LPC parameters, called multi-stage matrix quantization (MSMQ), which operates at bit rates between 400 and 800 bit/s. With the new matrix quantization method, using a 22.5 ms LPC analysis frame, a spectral distortion of about 1 dB is achieved at 800 bit/s. In the proposed coder, the line spectral frequency (LSF) parameters of multiple consecutive frames are grouped into a superframe and jointly quantized. The new residual LSF vector quantization scheme reduces the bit rate of MSMQ without any additional complexity or storage. The new MSMQ leads to several schemes with different computational complexity and storage characteristics.
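The bit budget implied by these figures can be worked out with a short Python sketch. It assumes that the quoted 400 to 800 bit/s is the rate spent on the LPC parameters alone; the superframe length of four frames in the usage note is a hypothetical value, since the abstract does not state how many frames are grouped.

```python
def bits_per_frame(bit_rate_bps, frame_ms):
    """Bits available per analysis frame at a given bit rate."""
    return bit_rate_bps * frame_ms / 1000.0


def bits_per_superframe(bit_rate_bps, frame_ms, frames_per_superframe):
    """Bits available when several frames are quantized jointly.
    frames_per_superframe is an assumed parameter, not given in the abstract."""
    return bits_per_frame(bit_rate_bps, frame_ms) * frames_per_superframe
```

With a 22.5 ms frame, `bits_per_frame(800, 22.5)` gives 18 bits and `bits_per_frame(400, 22.5)` gives 9 bits. Grouping, say, four frames at 800 bit/s would leave 72 bits to describe the whole LSF matrix jointly.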
The authors address the problem of defining a general class of reject-first possibilistic classifiers. The class relies on fuzzy XOR operators based on dual triples (t-norm, t-conorm, complement). Such a classifier operates in two sequential steps. It first tests for exclusive classification by thresholding the fuzzy XOR combination of the membership degrees in the different classes. If the pattern has to be rejected, the classifier then determines the kind of rejection encountered (i.e., ambiguity or distance) by applying another threshold to the fuzzy OR combination.
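The two-step decision can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses the (min, max, 1 - x) triple, it generalizes XOR to several classes as "one membership high and all others low", and it reads a high fuzzy OR as ambiguity rejection and a low one as distance rejection. The function names and both default thresholds are hypothetical and are not taken from the paper.

```python
def fuzzy_xor(memberships):
    """Exclusive-OR degree with the (min, max, 1 - x) triple (assumed form):
    high only when exactly one membership is high."""
    degrees = []
    for i, mu in enumerate(memberships):
        others = [1.0 - m for j, m in enumerate(memberships) if j != i]
        degrees.append(min([mu] + others))
    return max(degrees)


def classify(memberships, xor_threshold=0.5, or_threshold=0.5):
    """Step 1: accept if the fuzzy XOR passes its threshold.
    Step 2: otherwise label the rejection using the fuzzy OR (max)."""
    if fuzzy_xor(memberships) >= xor_threshold:
        best = max(range(len(memberships)), key=memberships.__getitem__)
        return ("class", best)
    if max(memberships) >= or_threshold:
        return ("reject", "ambiguity")  # several classes are plausible
    return ("reject", "distance")       # no class is plausible
```

For example, memberships of `[0.9, 0.1, 0.05]` are accepted as class 0, `[0.8, 0.7, 0.1]` are rejected for ambiguity, and `[0.1, 0.2, 0.05]` are rejected for distance. Other dual triples could be substituted for min and max without changing the control flow.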
ISBN: 0769513212 (print)
Voice over IP (VoIP) can be used in a wide variety of applications, each with different requirements. We present JVOIPLIB and JRTPLIB, a VoIP library and an RTP library respectively. Together they make it easy to add VoIP to various types of applications. Both libraries are written in object-oriented C++, are open source, and are highly extensible. Several measures have been taken to allow good synchronization between the communicating parties.