A 1-bit version of the recently proposed generalised hybrid adaptive quantiser (GHAQ) is compared with two other 1-bit adaptive quantisers, in terms of the signal-to-noise ratio obtained when coding speech in delta modulators. The GHAQ is found to have a clear advantage, particularly when the speech is acquired by a telephone microphone. It is also shown that the optimum coefficients of the predictor can be influenced substantially by the design of the adaptive quantiser, and a technique for finding suitable coefficients in this context is described.
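As a rough illustration of what a 1-bit adaptive quantiser does inside a delta modulator, the sketch below uses a simple one-bit-memory step-size rule: the step grows when successive bits agree and shrinks when they alternate. This is not the GHAQ, whose details the abstract does not give. The function names, the multipliers `grow` and `shrink`, the step limits, the unit-coefficient integrator used as the predictor, and the 200 Hz test tone are all assumptions made for the example.

```python
import math


def adaptive_delta_modulate(samples, step_init=0.01, grow=1.5, shrink=0.66,
                            step_min=1e-4, step_max=1.0):
    """Encode samples as 1-bit codes; return (bits, reconstruction)."""
    bits, recon = [], []
    estimate, step, prev_bit = 0.0, step_init, 1
    for x in samples:
        # 1-bit decision: is the input above or below the running estimate?
        bit = 1 if x >= estimate else -1
        # Equal successive bits suggest slope overload, so enlarge the step;
        # alternating bits suggest granular noise, so reduce it.
        step *= grow if bit == prev_bit else shrink
        step = min(max(step, step_min), step_max)
        # Predictor assumed to be an ideal integrator (coefficient 1).
        estimate += bit * step
        bits.append(bit)
        recon.append(estimate)
        prev_bit = bit
    return bits, recon


def snr_db(signal, recon):
    """Signal-to-noise ratio in dB between a signal and its reconstruction."""
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum((s - r) ** 2 for s, r in zip(signal, recon))
    return 10.0 * math.log10(signal_energy / noise_energy)


# Hypothetical test input: 0.1 s of a 200 Hz tone sampled at 16 kHz.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
bits, recon = adaptive_delta_modulate(tone)
print(f"SNR: {snr_db(tone, recon):.1f} dB")
```

Because the decoder can repeat the same step-size updates from the bit stream alone, no side information is needed. Changing the predictor coefficient or the adaptation rule changes the measured SNR, which is the kind of interaction the abstract refers to.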
A linear predictive coding (LPC) analysis scheme which is applicable to speech coding is proposed. The analysis method, called interpolative LPC (ILPC) analysis, estimates the spectral envelope by incorporating the interpolation characteristics into the LPC analysis. The ILPC analysis reduces average spectral distortion and the percentage of outlier frames, compared with the conventional LPC analysis followed by linear interpolation.
The addition of custom vector instructions to the G.729A speech coding algorithm is shown to significantly reduce its computational complexity. The identified vector extensions are implemented in the form of a configurable vector accelerator, tightly coupled to a 32-bit Sparc V8-compliant reduced instruction set (RISC) processor. Architectural simulation demonstrates that a reduction in complexity of up to 60%, for a vector length of sixteen 16-bit elements, is achievable in current VLSI technology.
A fast vector-sum codebook search method for low bit rate speech coding is presented. In this method, the codebook search is simplified by designing a vector-sum codebook that consists of orthonormal regular pulse basis vectors. A further simplification is achieved by adopting backward filtering. The method proposed has significantly reduced computational complexity, compared with the conventional VSELP, without producing any additional degradation in the quality of the synthesised speech.
A vector-adaptive vector quantization (VAVQ) scheme, which may be viewed as a generalization of gain-adaptive vector quantization, is described. The proposed scheme adjusts each component of the encoding signal vector according to a statistical estimate of the signal characteristics. The VAVQ scheme can cope with large input dynamic ranges, can be used in either the time domain or the transform domain, and exhibits approximately 4 dB improvement in segmental SNR over the fixed VQ.
Recently, the standardization of high-quality speech coding has intensified. In parallel, a number of novel applications are placing new demands on transmission efficiency and quality. In response to such challenges, standardization bodies have begun the definition of requirements for the next generation of very low-rate speech coding. Taking a lead in these activities, ANSI committee T1A1 and the ITU-T initiated the definition of the performance and characteristics of a wireline-quality 4-kb/s speech coding algorithm for network applications. In this letter, this emerging set of requirements is presented.
Decimation of a discrete-time signal below the Nyquist rate without applying an appropriate lowpass filter results in a distortion called aliasing. If wideband speech sampled at 16 kHz is decimated by 2 to result in a signal sampled at 8 kHz with aliasing, the decimated signal would be the summation of two speech-like signals, which are the narrowband speech covering 0-4 kHz and the spectrally flipped aliasing component coming from 4-8 kHz. Recently, the performance of speech separation has been remarkably improved with deep learning-based approaches, implying that the narrowband and aliasing components may be separable. In this letter, we propose a novel method for low-rate wideband speech coding utilizing a standard narrowband codec. Instead of coding wideband speech using a wideband codec with a limited bitrate, we propose to decimate the input wideband speech, incurring aliasing, and then encode it with a narrowband codec by allocating all the allowed bitrate to 0-4 kHz. After decoding the encoded bitstream, we apply a speech separation technique to obtain the narrowband and aliasing signals, which are then used to reconstruct the wideband speech by expansion, low/highpass filtering, and summation. Experimental results from a MUSHRA test showed that the proposed method could achieve subjective quality comparable to that of speech coded by wideband codecs at higher bitrates.
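The spectral flipping described above can be checked with a small numerical sketch. It covers only the aliasing step, not the codec or the separation network. A 6 kHz tone sampled at 16 kHz is decimated by 2 with no lowpass filter and reappears at 8 - 6 = 2 kHz in the 8 kHz signal. The helper `dominant_frequency` and the test tone are assumptions made for the illustration.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin in 0..sample_rate/2."""
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; slow, but dependency-free and fine for short inputs.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


FS_WIDE = 16000  # wideband sampling rate from the abstract
# Hypothetical input: a 6 kHz tone, inside the 4-8 kHz upper band.
tone_6k = [math.sin(2 * math.pi * 6000 * i / FS_WIDE) for i in range(320)]

# Decimate by 2 with no anti-aliasing filter: keep every second sample.
decimated = tone_6k[::2]

print(dominant_frequency(tone_6k, FS_WIDE))         # 6000.0
print(dominant_frequency(decimated, FS_WIDE // 2))  # 2000.0
```

In general a component at frequency f in the 4-8 kHz band lands at 8 kHz - f after this decimation, so the upper band appears mirrored on top of the 0-4 kHz band.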
We review the variable frame rate (VFR) transmission methodology that we developed, implemented, and tested during the period 1973-1978 for efficiently transmitting LPC vocoder parameters extracted from the input speech at a fixed frame rate. In the VFR method, parameters are transmitted only when their values have changed sufficiently over the interval since their preceding transmission. We explored two distinct approaches to automatic implementation of the VFR method. The first approach bases the transmission decisions on comparisons of the parameter values of the present frame and the last transmitted frame. The second approach, which is based on a functional perceptual model of speech, compares the parameter values of all the frames that lie in the interval between the present frame and the last transmitted frame against a linear model of parameter variation over that interval. The application of VFR transmission to the design of narrow-band LPC speech coders with average bit rates of 2000-2400 bits/s is also considered. The transmission decisions are made separately for the three sets of LPC parameters (pitch, gain, and spectral parameters), using separate VFR schemes. A formal subjective speech quality test of six selected LPC coders is described, and the results are presented and analyzed in detail. It is shown that a 2075 bit/s VFR coder produces speech quality equal to or better than that of a 5700 bit/s fixed frame rate coder.
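The first of the two approaches, comparing the present frame with the last transmitted frame, can be sketched as follows. The abstract does not specify the distance measure or the threshold, so the maximum absolute parameter difference, the threshold value, the function name, and the toy parameter frames are all assumptions made for the illustration.

```python
def select_frames_to_transmit(frames, threshold):
    """Return indices of the frames a simple VFR rule would transmit.

    A frame is sent when any parameter differs from the last transmitted
    frame by more than `threshold` (an assumed distance measure).
    """
    if not frames:
        return []
    sent = [0]        # the first frame is always transmitted
    last = frames[0]  # parameters of the last transmitted frame
    for idx in range(1, len(frames)):
        distance = max(abs(a - b) for a, b in zip(frames[idx], last))
        if distance > threshold:
            sent.append(idx)
            last = frames[idx]
    return sent


# Hypothetical two-parameter frames extracted at a fixed frame rate.
frames = [
    [0.10, 0.50],
    [0.11, 0.50],
    [0.12, 0.51],
    [0.30, 0.50],
    [0.31, 0.49],
    [0.31, 0.20],
]
print(select_frames_to_transmit(frames, threshold=0.05))  # [0, 3, 5]
```

Here three of six frames are sent, and a receiver would have to fill in the skipped frames itself, for example by interpolating between transmitted ones. Running a separate instance of such a rule for pitch, gain, and spectral parameters would mirror the per-parameter-set decisions the abstract describes.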
A novel low bit-rate high-quality speech coding technique is presented based on a perceptually optimized signal reconstruction method. According to this parametric speech model, the signal's spectral envelope is reconstructed from non-linear spectral filtering of an excitation signal, which is a combination of a random broadband noise signal with a number of discrete spectral pulses extracted from the original speech using a perceptual model. This general coding platform allows variable bit-rate implementations, starting from 1.9 kbit/s, at which sufficient intelligibility (more than 92%) was measured, while at higher bit-rates (2.8 kbit/s) intelligibility scores were better than 94% with sufficient naturalness in the coded speech. In all cases, the complexity of the proposed system is very low. (C) 1997 Elsevier Science B.V.
High compression rates of speech signals may be achieved by coding schemes based on relevant linguistic segments. A system is described that relies on a diphone recogniser as the coder and, as the decoder, on a speech synthesiser that reproduces speech from a diphone codebook. The spoken message is encoded as a textual representation (phoneme labels) plus prosody. This speech coding technique may be used for voice mail or phone communication over low bit rate channels.