Recently, the standardization of high-quality speech coding has intensified. In parallel, a number of novel applications are placing new demands on transmission efficiency and quality. In response to such challenges, standardization bodies have begun the definition of requirements for the next generation of very low-rate speech coding. Taking a lead in these activities, ANSI committee T1A1 and the ITU-T initiated the definition of the performance and characteristics of a wireline-quality 4-kb/s speech coding algorithm for network applications. In this letter, this emerging set of requirements is presented.
Decimation of a discrete-time signal below the Nyquist rate without applying an appropriate lowpass filter results in a distortion called aliasing. If wideband speech sampled at 16 kHz is decimated by 2 to yield a signal sampled at 8 kHz with aliasing, the decimated signal is the sum of two speech-like signals: the narrowband speech covering 0-4 kHz and the spectrally flipped aliasing component originating from the 4-8 kHz band. Recently, the performance of speech separation has been remarkably improved with deep learning-based approaches, implying that the narrowband and aliasing components may be separable. In this letter, we propose a novel method for low-rate wideband speech coding utilizing a standard narrowband codec. Instead of coding wideband speech using a wideband codec with a limited bitrate, we propose to decimate the input wideband speech, incurring aliasing, and then encode it with a narrowband codec by allocating all the allowed bitrate to 0-4 kHz. After decoding the encoded bitstream, we apply a speech separation technique to obtain the narrowband and aliasing signals, which are then used to reconstruct the wideband speech by expansion, low/highpass filtering, and summation. Experimental results from a MUSHRA test showed that the proposed method could achieve subjective quality comparable to that of speech coded by wideband codecs at higher bitrates.
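The folding described in this abstract can be illustrated with a minimal sketch. It is not the authors' codec or separation network; it only shows, for pure tones, that dropping every other sample of a 16 kHz signal without a lowpass filter moves content from the 4-8 kHz band into 0-4 kHz in flipped order. The function names and the 6 kHz test tone are assumptions chosen for illustration.

```python
import math


def decimate_without_filter(samples, factor=2):
    """Keep every `factor`-th sample; no anti-aliasing filter is applied."""
    return samples[::factor]


def folded_frequency(f_hz, fs_out_hz):
    """Frequency at which a tone of f_hz appears once sampled at fs_out_hz."""
    f = f_hz % fs_out_hz
    # Content above the new Nyquist frequency is mirrored about it.
    return fs_out_hz - f if f > fs_out_hz / 2 else f


def tone(f_hz, fs_hz, n):
    """n samples of a unit-amplitude cosine at f_hz, sampled at fs_hz."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]


# Hypothetical test tone: 6 kHz sampled at 16 kHz (inside the 4-8 kHz band).
wideband = tone(6000, 16000, 64)
decimated = decimate_without_filter(wideband, 2)

# After decimation to 8 kHz the tone is indistinguishable from a 2 kHz tone.
alias_hz = folded_frequency(6000, 8000)
reference = tone(alias_hz, 8000, 32)
max_diff = max(abs(a - b) for a, b in zip(decimated, reference))

# The mapping is a spectral flip: higher inputs land at lower outputs.
flipped = [folded_frequency(f, 8000) for f in (5000, 6000, 7000)]
```

Here `flipped` lists where 5, 6, and 7 kHz tones land after decimation; the order reverses, which is the sense in which the aliasing component is spectrally flipped.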
We review the variable frame rate (VFR) transmission methodology that we developed, implemented, and tested during the period 1973-1978 for efficiently transmitting LPC vocoder parameters extracted from the input speech at a fixed frame rate. In the VFR method, parameters are transmitted only when their values have changed sufficiently over the interval since their preceding transmission. We explored two distinct approaches to automatic implementation of the VFR method. The first approach bases the transmission decisions on comparisons of the parameter values of the present frame and the last transmitted frame. The second approach, which is based on a functional perceptual model of speech, compares the parameter values of all the frames that lie in the interval between the present frame and the last transmitted frame against a linear model of parameter variation over that interval. The application of VFR transmission to the design of narrow-band LPC speech coders with average bit rates of 2000-2400 bits/s is also considered. The transmission decisions are made separately for the three sets of LPC parameters (pitch, gain, and spectral parameters), using separate VFR schemes. A formal subjective speech quality test of six selected LPC coders is described, and the results are presented and analyzed in detail. It is shown that a 2075 bit/s VFR coder produces speech quality equal to or better than that of a 5700 bit/s fixed frame rate coder.
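The first of the two approaches, comparing the present frame with the last transmitted frame, can be sketched as follows. The abstract does not give the distance measure or the thresholds, so the maximum absolute difference and the threshold value used here are assumptions, as are the function name and the toy parameter values.

```python
def select_frames_to_transmit(frames, threshold):
    """Return indices of frames to send under a simple VFR rule.

    A frame is sent only if its parameters differ from those of the last
    transmitted frame by more than `threshold`. The distance used here
    (maximum absolute difference) is an assumption, not the measure from
    the reviewed work.
    """
    if not frames:
        return []
    transmitted = [0]          # the first frame is always sent
    last_sent = frames[0]
    for index in range(1, len(frames)):
        distance = max(abs(a - b) for a, b in zip(frames[index], last_sent))
        if distance > threshold:
            transmitted.append(index)
            last_sent = frames[index]
    return transmitted


# Hypothetical one-parameter track, extracted at a fixed frame rate.
gain_track = [[0.0], [0.05], [0.1], [0.5], [0.52], [0.9]]
sent = select_frames_to_transmit(gain_track, threshold=0.2)
```

Running the same rule independently on the pitch, gain, and spectral parameter tracks, each with its own threshold, would mirror the separate VFR schemes mentioned in the abstract.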
A novel low bit-rate high-quality speech coding technique is presented based on a perceptually optimized signal reconstruction method. According to this parametric speech model, the signal's spectral envelope is reconstructed from non-linear spectral filtering of an excitation signal, which is a combination of a random broadband noise signal with a number of discrete spectral pulses extracted from the original speech using a perceptual model. This general coding platform allows variable bit-rate implementations, starting from 1.9 kbit/s, at which sufficient intelligibility (more than 92%) was measured, while at higher bit-rates (2.8 kbit/s) intelligibility scores were better than 94% with sufficient naturalness in the coded speech. In all cases, the complexity of the proposed system is very low. (C) 1997 Elsevier Science B.V.
A fine-grain pipelined adaptive differential vector quantizer architecture is proposed for low-power speech coding applications. The pipelined architecture is developed by employing the relaxed look-ahead technique. The hardware overhead due to pipelining is only the pipelining latches. Simulations with speech sampled at 8 kHz show that, for a vector dimension of 8, the degradation in the signal-to-noise ratio (SNR) due to pipelining is negligible. Furthermore, this degradation is independent of the level of pipelining. Thus the proposed architecture is attractive from an integrated circuit implementation point of view.
The fundamentals of QCELP speech coding technology are introduced. Based on the features of the TI TMS320C54X family of DSPs, an approach to implementing QCELP speech coding on a fixed-point DSP (digital signal processor) is presented.
High compression rates of speech signals may be achieved by coding schemes based on relevant linguistic segments. A system is described that relies on a diphone recogniser as the coder and on a speech synthesiser reproducing speech starting from a diphone codebook as the decoder. The spoken message is encoded as a textual (phoneme-label) plus prosody representation. This speech coding technique may be used for voice mail or phone communication over low bit rate channels.
A CELP-based mixed-source model is described. It uses a mixed excitation which combines a lowpass-filtered adaptive source and a highpass-filtered stochastic source. In addition, one more stochastic source is employed for more natural-sounding speech. In informal listening tests, the proposed model at 3 kbit/s shows very good performance in both speech quality and intelligibility.
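The mixed excitation in this abstract, a lowpass-filtered adaptive source added to a highpass-filtered stochastic source, can be sketched in a few lines. The abstract does not specify the filters, so the 3-tap moving average and its complementary highpass below are stand-in assumptions, and the additional stochastic source is omitted. All names and sample values are hypothetical.

```python
def lowpass(signal):
    """3-tap moving average with edge replication (assumed filter)."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + signal[i] + right) / 3.0)
    return out


def highpass(signal):
    """Complement of the moving average: the input minus its lowpass part."""
    return [x - y for x, y in zip(signal, lowpass(signal))]


def mixed_excitation(adaptive, stochastic):
    """Lowpass-filtered adaptive source plus highpass-filtered stochastic source."""
    return [a + s for a, s in zip(lowpass(adaptive), highpass(stochastic))]


# Constant inputs make the band split easy to see: the constant adaptive
# source passes the lowpass unchanged, while the constant stochastic source
# is removed entirely by the highpass.
excitation = mixed_excitation([1.0, 1.0, 1.0, 1.0], [5.0, 5.0, 5.0, 5.0])

# A rapidly alternating stochastic source survives the highpass.
alternating = highpass([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
```

In a real coder the two sources would come from the adaptive and stochastic codebooks, and the filters would be designed so that the low and high bands are split at a chosen cutoff rather than by a fixed moving average.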
A hybrid approach in determining the excitation vector in a low-delay code excited linear predictive coder is proposed. By a judicious division of the composite excitation vector into long-term and short-term components, and the use of switched quantisation, substantial improvement in coding quality is obtained.
A method for encoding the spectral characteristics of speech, at rates below 180 bit/s, using hierarchical temporal decomposition (HTD) is proposed. A set of the log-area-ratio (LAR) parameters, extracted from a given block of speech, are approximated through Gaussian interpolation between the most-steady frames detected by the HTD. This results in a smaller set of parameters which are encoded using vector quantisation. It is shown that the same spectral distortion is obtained with the new coder at a rate of 180 bit/s as that using a scalar quantisation, TD-based coder, at 600 bit/s.