ISBN: 9781424414833 (print)
A delay-free audio coding scheme based on ADPCM with adaptive pre- and post-filtering is presented. The pre- and post-filters are realized as a cascade of shelving filters designed to match the characteristics of human perception, and they are adapted by dynamic compression of the respective sub-bands. The adaptation is backward-adaptive, i.e., it is driven by the reconstructed signal, which eliminates the need to transmit the filter coefficients and allows delay-free operation. This pre- and post-filtering significantly improves the audio quality compared to a plain ADPCM codec, as supported by objective measurements. Since the base ADPCM codec is also delay-free, the resulting coding system works without any algorithmic delay.
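To illustrate why backward adaptation needs no side information, here is a minimal sketch of a generic backward-adaptive ADPCM loop. It is not the codec from the abstract: the 7-level quantizer, the leaky first-order predictor (0.9), the step-size factors, and all function names are assumptions chosen for illustration, and the pre-/post-filters are omitted. The point is only that the decoder can repeat the encoder's adaptation from the transmitted codes alone.

```python
import math

def adapt(step, code, lo=1e-3, hi=1.0):
    """Hypothetical step-size rule: grow after outer codes, shrink after inner ones."""
    if abs(code) == 3:
        factor = 1.5
    elif abs(code) <= 1:
        factor = 0.8
    else:
        factor = 1.0
    return min(hi, max(lo, step * factor))

def update(pred, step, code):
    """State update shared by encoder and decoder; uses only the transmitted code."""
    recon = pred + code * step          # reconstructed sample
    next_pred = 0.9 * recon             # leaky first-order predictor (assumed)
    return recon, next_pred, adapt(step, code)

def encode(samples):
    pred, step = 0.0, 0.05
    codes, recon_enc = [], []
    for x in samples:
        code = max(-3, min(3, round((x - pred) / step)))   # 7-level quantizer
        recon, pred, step = update(pred, step, code)
        codes.append(code)
        recon_enc.append(recon)
    return codes, recon_enc

def decode(codes):
    pred, step, out = 0.0, 0.05, []
    for code in codes:
        recon, pred, step = update(pred, step, code)
        out.append(recon)
    return out

signal = [0.5 * math.sin(2 * math.pi * n / 50) for n in range(200)]
codes, recon_enc = encode(signal)
recon_dec = decode(codes)
```

Because `update` depends only on the previous state and the current code, the decoder's reconstruction matches the encoder's local reconstruction sample by sample, with no look-ahead and no transmitted step sizes.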
ISBN: 9781424421787 (print)
Voiced/unvoiced (V/U) classification is an important parameter in low bit-rate speech coding algorithms. An algorithm is proposed that recovers the V/U classification in the speech decoder from the linear prediction coding (LPC) coefficients and the gain. Two Gaussian mixture models (GMMs) are employed to model the joint probability of these parameters and to perform the V/U estimation. Experiments show that the proposed algorithm improves on the V/U classifier used in the mixed excitation linear prediction (MELP) vocoder. The proposed algorithm operates only at the receiving end and saves all the bits originally used for V/U quantization.
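One plausible reading of the two-GMM decision is a likelihood comparison between a "voiced" model and an "unvoiced" model, sketched below. The abstract does not say how the two GMMs are combined, so this is an assumption; the diagonal covariances, the two-dimensional toy feature (a spectral coefficient and a log gain), and every numeric parameter are hypothetical values for illustration, not trained models.

```python
import math

def gmm_loglik(x, gmm):
    """Log-likelihood of vector x under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples.
    """
    terms = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        terms.append(ll)
    top = max(terms)                      # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(x, voiced_gmm, unvoiced_gmm):
    """Pick the class whose GMM assigns the higher likelihood (equal priors assumed)."""
    if gmm_loglik(x, voiced_gmm) > gmm_loglik(x, unvoiced_gmm):
        return "voiced"
    return "unvoiced"

# Hypothetical toy models over (spectral coefficient, log gain).
VOICED = [(0.6, [0.8, 2.0], [0.05, 0.5]), (0.4, [0.5, 1.0], [0.1, 0.5])]
UNVOICED = [(1.0, [-0.3, -1.0], [0.1, 0.5])]

label_a = classify([0.8, 2.0], VOICED, UNVOICED)
label_b = classify([-0.3, -1.0], VOICED, UNVOICED)
```

Since the inputs are parameters the decoder already receives, a decision of this kind needs no dedicated V/U bits in the bitstream.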
ISBN: 9781424421787 (print)
This paper examines the efficient quantization of LSP parameters for very low bit rate vocoders below 300 bps. A new quantization scheme called variable dimension matrix quantization (VDMQ) is presented. In the VDMQ scheme, the extracted LSP parameter matrix with variable dimension is quantized directly, without dimension conversion. Based on the definition of a distance measure between low LSP matrices with different dimensions, the optimal codeword is derived. Theoretical analysis and experimental results show that the VDMQ scheme performs better than the segment quantization and matrix quantization schemes at very low bit rates. In addition, codebook storage is reduced by almost 90%. The VDMQ scheme provides a new, effective approach to efficient LSP parameter quantization at very low bit rates.
ISBN: 9781424424085 (print)
In this paper, new feature extraction methods that use reduced-order linear predictive coding (LPC) coefficients for speech recognition are proposed. The coefficients are derived from speech frames decomposed using the Discrete Wavelet Transform (DWT). In the literature it is assumed that a speech frame of 10 ms to 30 ms is stationary; in practice, however, different parts of the speech signal may convey different amounts of information (and hence may not be perfectly stationary). LPC coefficients derived from the subband decomposition of a speech frame provide a better representation than modeling the frame directly. Experiments show that the proposed approaches provide features that are both effective (better recognition rate) and efficient (reduced feature vector dimension). A speech recognition system using the continuous Hidden Markov Model (HMM) has been implemented. The proposed algorithms are evaluated using the NIST TI-46 isolated-word database.
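The idea of taking low-order LPC coefficients per subband can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: it uses a single-level Haar split as a stand-in for the DWT, the autocorrelation method with the Levinson-Durbin recursion, an LPC order of 4 per band, and a synthetic test frame; all function names are hypothetical.

```python
import math

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err) with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order,
    so the prediction is x_hat[n] = -sum_j a[j] * x[n - j].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                   # remaining prediction error energy
    return a, err

def haar_split(frame):
    """One-level Haar analysis: low-pass and high-pass halves (assumed wavelet)."""
    s = 2 ** -0.5
    low = [(frame[i] + frame[i + 1]) * s for i in range(0, len(frame) - 1, 2)]
    high = [(frame[i] - frame[i + 1]) * s for i in range(0, len(frame) - 1, 2)]
    return low, high

def subband_lpc_features(frame, order=4):
    """Concatenate low-order LPC coefficients from each subband."""
    feats = []
    for band in haar_split(frame):
        a, _ = levinson(autocorr(band, order), order)
        feats.extend(a[1:])
    return feats

frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n + 0.4) for n in range(64)]
features = subband_lpc_features(frame)

# Sanity check on a decaying exponential x[n] = 0.9**n, for which a[1] is close to -0.9.
ar1_coeffs, ar1_err = levinson(autocorr([0.9 ** n for n in range(64)], 1), 1)
```

With two bands and order 4, the feature vector has 8 entries, which shows how a per-band low-order model can keep the overall dimension small.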
ISBN: 9781424425709 (print)
We propose a system for improving the intelligibility of speech produced by people with hearing impairment. We found that such speech often has problems in intonation, in the duration of unvoiced consonants, and in the tone of voiced phonemes. The system compensates for the problematic components of the speech using their counterparts in normal speech. It corrects the intonation using TD-PSOLA and lengthens consonants by repeating the original waveform, using an inversion technique to keep the result continuously connected, until the duration reaches a threshold. Experimental results show that the proposed method improves the intelligibility of the speech from 28% to 35% by making phonemes more clearly articulated and by making double consonants, the V of syllabic consonants, and unvoiced consonants more clearly perceived.
ISBN: 9781424414833 (print)
We propose a linear predictive coding technique for multi-channel electromyographic (EMG) recordings. The signals are acquired using a two-dimensional grid of electrodes, which generates strongly correlated signals. Previous work considered only spectral redundancy across the signal matrix. In this paper we exploit the correlation present in the residual signals, i.e., the signals after short-term prediction. The proposed technique achieves a compression ratio of about 1/9, i.e., slightly better than spectral-only decorrelation methods, but with a substantial gain of approximately 3.2 dB in the SNR of the reconstructed waveform.
ISBN: 9781424414833 (print)
A common technique for applying linear prediction to non-stationary signals is time segmentation and local analysis. Variations of the process within such a segment cause inaccuracies. In this paper, we model the temporal changes of linear prediction coefficients (LPCs) as a Fourier series. We obtain a compact description of the vocal tract model, limited by the predictor order and the maximum Doppler frequency. Filter stability is guaranteed by all-pass filtering, exploiting the human ear's insensitivity to absolute phase. The periodicity constraint induced by the Fourier series is counteracted by oversampling in the Doppler domain. With this approach, the number of coefficients required for vocal tract modeling is significantly reduced compared to an LPC system with block-wise adaptation, while its prediction gain is exceeded. As a by-product, it is found that the Doppler frequency of the vocal tract is on the order of 10 Hz. A generalization of the algorithm to an auto-regressive moving average model with time-correlated filter coefficients is straightforward.
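The Fourier-series model of the coefficients can be written out as a hedged sketch. The notation is assumed, not taken from the paper: $p$ is the predictor order, $M$ the number of Doppler harmonics, $N$ the period of the series in samples, $f_s$ the sampling rate, and $c_{k,m}$ the series coefficients.

```latex
\hat{x}(n) = \sum_{k=1}^{p} a_k(n)\, x(n-k),
\qquad
a_k(n) = \sum_{m=-M}^{M} c_{k,m}\, e^{\,j 2\pi m n / N},
\qquad
c_{k,-m} = c_{k,m}^{*}
```

In this notation the highest Doppler frequency represented is $M f_s / N$, and the model uses $p(2M+1)$ series coefficients per segment regardless of how many blocks a block-wise LPC system would need over the same span. Choosing $N$ larger than the segment length is one way to read "oversampling in the Doppler domain": the periodic extension implied by the series then falls outside the analyzed segment.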
ISBN: 9781424414833 (print)
This paper proposes a new structure for a scalable codec. The proposed codec works with 10 ms input frames for wideband speech and audio signals at bit rates ranging from 8 to 32 kbit/s. The core layer is ITU-T G.729 at 8 kbit/s, producing a narrowband output. The first enhancement layer is a bandwidth extension that provides a wideband output with an additional 2 kbit/s. The second enhancement layer is based on algebraic quantization of wavelet packet coefficients and gradually improves the synthesized signal as the bit rate increases. For speech signals, at bit rates of 24 and 32 kbit/s, the codec is shown to be equivalent to the ITU-T G.722 codec at 56 and 64 kbit/s, respectively. Moreover, the codec at 32 kbit/s is assessed to be equivalent to the recently standardized embedded codec ITU-T G.729.1 at the same bit rate, with a lower algorithmic delay.
For mobile communication systems, computational complexity and memory requirements are serious problems in real-time digital signal processing of speech signals. In this article we propose a new structuring algorithm for the split vector quantizer codebook of LSF coefficients. A fast search procedure, based on the structure of the codebook and a description tree, reduces the total number of comparisons needed to search the codebook. Our approach eliminates from the search those codevectors with minimal probability of being the solution, yielding a fast codebook search algorithm with significantly lower complexity.
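The general idea of pruning a codebook search through a precomputed structure can be sketched with a two-level grouping. The abstract does not specify the description tree, so this is a generic stand-in: the centroids, the grouping rule, the toy codebook, and all function names are assumptions. A search of this kind is not guaranteed to find the true nearest codevector in general.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def full_search(vec, codebook):
    """Exhaustive search: one comparison per codevector."""
    best = min(range(len(codebook)), key=lambda i: dist2(vec, codebook[i]))
    return best, len(codebook)

def build_groups(codebook, centroids):
    """Offline step: assign each codevector index to its nearest centroid."""
    groups = [[] for _ in centroids]
    for i, cv in enumerate(codebook):
        g = min(range(len(centroids)), key=lambda j: dist2(cv, centroids[j]))
        groups[g].append(i)
    return groups

def fast_search(vec, codebook, centroids, groups):
    """Search only the group under the nearest centroid (skipping empty groups)."""
    candidates = [j for j in range(len(centroids)) if groups[j]]
    g = min(candidates, key=lambda j: dist2(vec, centroids[j]))
    best = min(groups[g], key=lambda i: dist2(vec, codebook[i]))
    return best, len(candidates) + len(groups[g])

# Hypothetical toy codebook with two well-separated clusters.
CODEBOOK = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
CENTROIDS = [[0.3, 0.3], [10.3, 10.3]]
GROUPS = build_groups(CODEBOOK, CENTROIDS)

query = [10.2, 10.9]
full_index, full_cost = full_search(query, CODEBOOK)
fast_index, fast_cost = fast_search(query, CODEBOOK, CENTROIDS, GROUPS)
```

On this toy example both searches return the same codevector, while the structured search makes 5 distance comparisons instead of 6; the saving grows with codebook size, at the cost of storing the group structure.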
This paper presents a novel method for estimating formant frequencies and bandwidths based on an underlying vocal tract model. A novel statistical model for vocal tract cross-sectional areas is developed that allows computation of full likelihood functions. Modifications to the basic particle filter algorithm have also been developed to help combat both diversity depletion and convergence problems. The performance of the method is evaluated against a hand-labeled formant database (L. Deng et al., 2003).