We propose an adaptive short-term postfilter for speech coders that incorporates the properties of the pseudo-cepstrum. Because the proposed postfilter implicitly performs tilt compensation, it does not require the additional tilt compensation filter that conventional techniques use. We derive a relationship between the parameters of the proposed postfilter based on a minimum phase distortion criterion, and show a simple tuning procedure for the parameters. We also show that the postfilter can be implemented with a lower order. By applying this postfilter to several international speech coding standards, we reduce the complexity of the speech coders while obtaining performance comparable to that of conventional approaches.
The paper presents a new mixed model for characterizing LPC excitation on a 3-band basis by analyzing the harmonic structure of the residual signal. In addition, a sub-frame based analysis is developed for detecting both aperiodic pulses and noisy signals, which plays a major role in reducing the perceptual errors introduced by certain consonants. Preliminary results show that near-natural speech is achieved with 1050 bps allocated to the excitation parameters, suggesting that the proposed coding scheme is superior to the MELP-2400 coding standard in terms of the perceptual quality of the reconstructed speech.
Pitch period (or fundamental frequency) extraction plays an important role in speech processing and has a wide range of applications in speech-related systems. Many pitch extraction methods have been proposed so far, but performance in noisy environments still needs improvement. In this paper, we propose a modified version of the autocorrelation method, which is well known to be robust against noise. Exploiting the fact that the difference function (amplitude difference function) has characteristics similar to those of the autocorrelation function, we weight the autocorrelation function by the reciprocal of the difference function. Simulation experiments on continuous speech show that the proposed pitch extraction method is more robust against additive noise than the conventional methods, and that it is especially effective at low signal-to-noise ratios.
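As a rough illustration of the weighting idea in this abstract, the sketch below scores each candidate lag by the autocorrelation divided by the difference function and picks the lag with the highest score. The function name `weighted_autocorr_pitch`, the 60 to 400 Hz search range, and the stabilising constant `eps` added to the denominator are assumptions made for this example; they are not taken from the paper.

```python
import math


def weighted_autocorr_pitch(frame, fs, f_min=60.0, f_max=400.0, eps=1.0):
    """Estimate the pitch in Hz by maximising ACF(lag) / (D(lag) + eps).

    D(lag) is the mean absolute difference between the frame and its
    lagged copy. eps is an assumed constant that keeps the denominator
    away from zero when the frame is almost perfectly periodic.
    """
    n = len(frame)
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), n - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        m = n - lag  # number of overlapping samples at this lag
        acf = sum(frame[i] * frame[i + lag] for i in range(m)) / m
        diff = sum(abs(frame[i] - frame[i + lag]) for i in range(m)) / m
        score = acf / (diff + eps)
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag
```

For a clean 100 Hz sine sampled at 8 kHz, the autocorrelation peaks and the difference function dips at the same lag, so the estimate should land close to 100 Hz. How much the weighting helps under noise depends on the choice of `eps` and on the frame length, which this sketch does not tune.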
ISBN: (print) 0780354826
In this paper, we describe a real-time implementation of the MPEG-4 audio HVXC (Harmonic Vector eXcitation Coding) encoder algorithm for very low bit rates, whose target applications range from mobile communications to Internet telephony, on a current high-performance floating-point DSP. By applying C-language and assembly-language optimization to the time-critical functions, using the internal program memory of the DSP as a program cache, and further exploiting internal data memory operation and DMA functionality, we achieved real-time operation of the HVXC encoder at both 2 kbit/s and 4 kbit/s.
The paper describes a 12.8 kbit/s LD-CELP speech coding algorithm based on the G.728 speech codec, and presents the concept of codeword use frequency (CUF). The experimental results show that the method effectively decreases the complexity of the vector quantization codebooks and gives better synthesized speech quality than other similar methods. The algorithm is implemented for real-time duplex operation on a high-speed DSP system with two TMS320C31 chips.
A multivariate-state HMM (an HMM with a vector state variable) can be used to find jointly optimal phonetic and formant transcriptions of an utterance. The complexity of searching a multivariate state space using the Baum-Welch algorithm is substantial, but may be significantly reduced if the formant frequencies are assumed to be conditionally independent given knowledge of the phone. Operating with a known phonetic transcription, the multivariate-state model can provide a maximum a posteriori formant trajectory, complete with confidence limits on each of the formant frequency measurements. The model can also be used as a phonetic classifier by adding the probabilities of all possible formant trajectories. A test system is described which requires only nine trainable parameters per formant per phonetic state: five parameters to model formant transitions, and four to model spectral observations. Further simplifications were achieved through parameter tying.
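The classification step in this abstract adds the probabilities of all possible trajectories. For a generic discrete HMM, that sum does not need explicit enumeration: the standard forward recursion gives the same total in time linear in the sequence length. The sketch below uses a scalar state and hypothetical names (`forward_likelihood`, `brute_force_likelihood`); it illustrates the summation only and is not the paper's multivariate-state model.

```python
from itertools import product


def forward_likelihood(init, trans, emit):
    """Sum the probability of every state path with the forward recursion.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[t][s]  : likelihood of the observation at time t in state s
    """
    n = len(init)
    alpha = [init[s] * emit[0][s] for s in range(n)]
    for t in range(1, len(emit)):
        alpha = [
            emit[t][s] * sum(alpha[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ]
    return sum(alpha)


def brute_force_likelihood(init, trans, emit):
    """Same quantity, computed by enumerating every path (exponential cost)."""
    n, length = len(init), len(emit)
    total = 0.0
    for path in product(range(n), repeat=length):
        prob = init[path[0]] * emit[0][path[0]]
        for t in range(1, length):
            prob *= trans[path[t - 1]][path[t]] * emit[t][path[t]]
        total += prob
    return total
```

The brute-force version is included only as a cross-check: it visits every path, whereas the recursion needs a number of operations proportional to the sequence length times the square of the number of states.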
The authors propose a novel method for improving the performance of ITU-T G.729A for Voice-over-IP (VoIP) applications. One of the most important requirements for a speech compression algorithm used for VoIP is resistance to packet loss. Because of the memory in the G.729A algorithm, a packet loss causes degradation not only during the period of the loss but also afterwards, because the states of the encoder and decoder differ. The G.729A algorithm deals quite well with the packet loss period through error concealment, but it does not deal with the state error that follows the packet loss. The paper introduces a new scheme called recovery by reinitialization (RbR) that reduces this state error at minimal cost.
The 4.0 kbit/s speech codec described is based on a frequency domain interpolative (FDI) coding technique, which belongs to the class of prototype waveform interpolation (PWI) coding techniques. The codec also has an integrated voice activity detector (VAD) and a noise reduction capability. The input signal is subjected to LPC analysis and the prediction residual is separated into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW) component. The SEW magnitude component is quantized using a hierarchical predictive vector quantization approach. The REW magnitude is quantized using a gain and a sub-band based shape. The SEW and REW phases are derived at the decoder using a phase model, based on a transmitted measure of voice periodicity. The spectral (LSP) parameters are quantized using a combination of scalar and vector quantizers. The 4.0 kbit/s coder has an algorithmic delay of 60 ms and an estimated floating-point complexity of 21.5 MIPS. The performance of this coder has been evaluated using in-house MOS tests under various conditions such as background noise, channel errors, self-tandem, and DTX mode of operation, and has been shown to be statistically equivalent to the ITU-T G.729 8 kbit/s codec across all conditions tested.
A new efficient algorithm for quantizing the spectral information for a pitch-synchronous CELP (PSCELP) speech coder is proposed. LPC analysis in the PSCELP is carried out once per pitch period. Direct quantization of the pitch-synchronous LSF vectors would lead to a variable-rate codec, which is inconsistent with the objective of achieving a fixed-rate speech coder operating at 4 kb/s. Hence, a linear trajectory of LSF vectors is selected which can be encoded by one LSF vector every 20 ms. This conversion exploits the high correlation between the LSF parameters of successive pitch periods to achieve joint quantization. A coding rate of 1.2 kb/s is achieved for the LSF information with no noticeable degradation. The proposed algorithm employs linear interpolation at the decoder to recover the spectral parameters for the individual pitch periods used in the pitch-synchronous reconstruction of the speech signal. Comparative simulation results show that this algorithm performs comparably to linear-interpolation quantization of LSFs in a time-synchronous CELP coder.
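The decoder-side step in this abstract, recovering one LSF vector per pitch period from vectors sent at a fixed rate, can be sketched as plain linear interpolation between two consecutive transmitted vectors. The function name `interpolate_lsf` and the convention of giving each pitch period a position between 0 (previous vector) and 1 (current vector) are assumptions for illustration; the abstract does not say how the positions are chosen.

```python
def interpolate_lsf(prev_lsf, curr_lsf, positions):
    """Linearly interpolate LSF vectors for pitch periods inside one frame.

    prev_lsf, curr_lsf : transmitted LSF vectors at the start and end of
                         the 20 ms interval (same length)
    positions          : one value in [0, 1] per pitch period, giving its
                         assumed location within the interval
    """
    if len(prev_lsf) != len(curr_lsf):
        raise ValueError("LSF vectors must have the same order")
    return [
        [(1.0 - t) * p + t * c for p, c in zip(prev_lsf, curr_lsf)]
        for t in positions
    ]
```

With one vector every 20 ms, the stated 1.2 kb/s corresponds to 24 bits per transmitted LSF vector, regardless of how many pitch periods fall inside the interval.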
We propose a method for reducing the computation of the real root method, which is mainly used in CELP (code excited linear prediction) vocoders. In the real root method, if the polynomial equations have real roots, those roots are found and transformed into the LSFs. However, this method is time-consuming because the root search proceeds sequentially through the frequency region. An important characteristic of the LSFs is that most of the coefficients occur in a specific frequency region. The proposed method therefore orders the search region for each coefficient according to that coefficient's distribution, and searches for the coefficients in that order. This reduces the transformation time compared with a sequential search through the frequency region. In experiments comparing the proposed method with the conventional real root method, the search time was reduced by about 46% on average.
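The ordered search described in this abstract can be illustrated with a small sketch: the LSF polynomials are evaluated on a uniform frequency grid, and the grid cells are visited either sequentially or in an order given by a histogram of where the coefficient usually falls. All names here (`lsp_polynomials`, `search_sign_change`) and the histogram-based ordering are assumptions for this example; the paper's exact ordering rule is not given in the abstract. The sketch also handles only a single root, whereas a full implementation would have to attribute each sign change to the right coefficient.

```python
import math


def lsp_polynomials(a):
    """Return real-valued functions whose zeros in (0, pi) are the LSFs.

    a = [1, a1, ..., ap] holds the LPC coefficients of A(z). The
    symmetric polynomial P(z) and antisymmetric polynomial Q(z) are
    evaluated on the unit circle with the linear phase removed.
    """
    p = len(a) - 1
    ext = list(a) + [0.0]  # a_{p+1} = 0
    sym = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]   # P(z)
    anti = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]  # Q(z)
    half = (p + 1) / 2.0

    def f_p(w):
        return sum(c * math.cos(w * (half - k)) for k, c in enumerate(sym))

    def f_q(w):
        return sum(c * math.sin(w * (half - k)) for k, c in enumerate(anti))

    return f_p, f_q


def search_sign_change(f, cell_order, step):
    """Visit grid cells in the given order until f changes sign in one.

    Returns (root, cells_tested); root is refined by linear interpolation
    inside the cell, or None if no sign change was found.
    """
    for tested, i in enumerate(cell_order, start=1):
        lo, hi = i * step, (i + 1) * step
        f_lo, f_hi = f(lo), f(hi)
        if f_lo * f_hi < 0.0:
            root = lo - f_lo * (hi - lo) / (f_hi - f_lo)
            return root, tested
    return None, len(cell_order)
```

A sequential search passes `list(range(n))` as `cell_order`, while an ordered search passes the cell indices sorted by decreasing histogram count, for example `sorted(range(n), key=lambda i: -hist[i])` with a hypothetical training histogram `hist`. The saving then depends entirely on how well the histogram predicts the cell that contains the root.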