ISBN: 0780364163 (print)
This paper uses a method of incorporating simultaneous masking into the calculation of a linear predictive filter (SMLPC) as the front end to a 2 kbps waveform interpolation (WI) speech coder. A modification to the masking threshold calculation used in SMLPC is proposed. This modification improves the performance of SMLPC in noise-like sections by placing greater emphasis on strongly voiced speech. MOS test results show that the modified SMLPC improves the perceptual quality of the WI coder. The improvement is significant for female speakers, whilst the quality for male speech is virtually unchanged. This result conflicts with previously reported results for SMLPC, in which only male speech was improved. The change is attributed to the modification of the masking threshold, and it confirms that adapting the masking threshold to the pitch of the speech allows SMLPC to remove more perceptually important information from all input speech than standard LPC.
ISBN: 0780364163 (print)
Line Spectrum Frequencies (LSFs) have been the prevailing parameter set for representing LPC coefficients in speech coding. Extensive research has been performed to exploit their interframe and intraframe correlations and to quantize them more efficiently. Interframe coding of LSFs can cause error propagation when frame erasures occur. Because most LSF quantizers were designed with bit rate and complexity as the primary concerns, less attention was paid to error propagation. We investigate the erasure performance of interframe LSF coding and compare it with an intraframe coding method. Our results show that, with only 5% extra bit rate, intraframe coding is much more robust to frame erasures, and a typical improvement of 0.5 dB in spectral distortion can be obtained at 20% packet loss. Subjective listening tests also indicate a significant improvement.
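The error-propagation mechanism described above can be illustrated with a minimal scalar sketch. It assumes a single LSF coded by first-order prediction from the previous decoded value (coefficient `rho`), a uniform scalar quantizer standing in for the real vector quantizer, and a decoder that simply drops the residual of an erased frame. All function names and parameter values are hypothetical and are not the coder studied in the paper.

```python
def quantize(value, step):
    """Uniform scalar quantizer (hypothetical stand-in for the real VQ)."""
    return step * round(value / step)


def interframe_decode(xs, erased, mean=0.0, rho=0.8, step=0.05):
    """First-order predictive (interframe) coding of one LSF track.

    Returns (encoder_recon, decoder_recon). On an erased frame the decoder
    never receives the residual, so it outputs its prediction alone; its
    memory then differs from the encoder's and the mismatch is carried
    into later frames, shrinking only by a factor of rho per frame.
    """
    enc_prev = dec_prev = mean
    enc_out, dec_out = [], []
    for n, x in enumerate(xs):
        enc_pred = mean + rho * (enc_prev - mean)
        q = quantize(x - enc_pred, step)      # transmitted residual
        enc_prev = enc_pred + q
        enc_out.append(enc_prev)
        dec_pred = mean + rho * (dec_prev - mean)
        dec_prev = dec_pred if n in erased else dec_pred + q
        dec_out.append(dec_prev)
    return enc_out, dec_out


def intraframe_decode(xs, erased, mean=0.0, step=0.05):
    """Memoryless (intraframe) coding; an erased frame repeats the last output."""
    out, prev = [], mean
    for n, x in enumerate(xs):
        prev = prev if n in erased else mean + quantize(x - mean, step)
        out.append(prev)
    return out
```

With `erased={1}`, the intraframe decoder returns to its erasure-free output from frame 2 onward, whereas the interframe decoder stays offset from the encoder for several frames. The price of intraframe coding is a larger residual to quantize, which is the extra bit rate the abstract refers to.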
ISBN: 0780364163 (print)
Twenty years of work on sinusoidal modeling of speech has led to very competitive principles for low-rate coding. In this study, we discuss several issues in the design of a sinusoidal coding system. We stress that, through careful design of all blocks of the encoder and decoder and at the cost of some additional complexity, it is possible to build a low-rate coder free of many of the artifacts associated with conventional vocoding systems. We focus the discussion on multi-band partial voicing and on parameter smoothing and interpolation.
Author: Cao, B.S. (Lucent Technologies, Bell Labs Innovations, Wireless Network Systems, Whippany, NJ 07981, USA)
ISBN: 0780364163 (print)
This paper presents a new LPC parameter quantization method, SubBand Synthesized LPC Vector Quantization (SBS-LPC-VQ) [9]. In the subband synthesis process, the relationships between the subband spectra and the whole-band LPC spectrum are established, so that the vector-quantized subband LPC parameters can be mapped to whole-band LPC parameters. The SBS-LPC-VQ method overcomes the high complexity of vector quantizing LPC parameters and isolates the distortion within each subband during the VQ process. It also provides flexibility in assigning the bits used for each subband, choosing the LPC filter order, and determining the number of bands used for the subband classification. The Critical-band Weighting Spectral Distortion Measure (CW-SDM), a perceptually motivated objective measure based on a critical-band weighting function, is used to measure the distortion of the quantized LPC spectrum. Using this distortion measure, the experiments show that SBS-LPC-VQ codes the full set of 16th-order LPC parameters with 24 bits/frame at about 1 dB average spectral distortion. For comparison, results obtained with the conventional Spectral Distortion Measure (SDM) are also presented.
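To make the contrast between a conventional and a weighted spectral distortion measure concrete, here is a minimal sketch, assuming the common textbook definition: the RMS difference in dB between the two all-pole envelopes 1/|A(e^{jω})|², gain ignored, sampled uniformly over the full band [0, π) for simplicity (published SD figures are often computed over a narrower band). The `weights` argument is only a hypothetical hook; the paper's critical-band weighting function is not reproduced here, and the function names are illustrative.

```python
import cmath
import math


def lpc_log_spectrum(a, n_points=256):
    """10*log10 of the all-pole envelope 1/|A(e^{jw})|^2 on a uniform grid over [0, pi).

    a: coefficients [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    out = []
    for k in range(n_points):
        w = math.pi * k / n_points
        # Evaluate A(e^{jw}) = sum_i a_i e^{-jwi}
        a_w = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        out.append(-10.0 * math.log10(abs(a_w) ** 2))
    return out


def spectral_distortion(a_ref, a_quant, weights=None, n_points=256):
    """(Weighted) RMS log-spectral difference in dB between two LPC envelopes.

    weights=None gives the conventional, unweighted measure. A non-uniform
    weights list (length n_points) is a placeholder for a perceptual
    weighting such as a critical-band function; none is supplied here.
    """
    ref = lpc_log_spectrum(a_ref, n_points)
    quant = lpc_log_spectrum(a_quant, n_points)
    if weights is None:
        weights = [1.0] * n_points
    num = sum(w * (r - q) ** 2 for w, r, q in zip(weights, ref, quant))
    return math.sqrt(num / sum(weights))
```

Averaging `spectral_distortion` over all frames of a test set gives the "average spectral distortion" figure that LPC quantizers are usually compared on.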
ISBN: 0780364163 (print)
In this paper, an information-theoretic study of the properties of the speech spectrum process is performed. Various techniques for modeling the probability density function are applied to the spectrum source to compute rate-distortion functions. We estimate the difference in the rate required to achieve a given distortion for three scenarios: interframe gain exploitation, low-pass filtering of LPC vectors, and increased speech signal bandwidth. The different methods of calculating rate-distortion functions give fairly consistent results. The results show that, for close-to-transparent LPC quantization, we gain 4-6 bits per frame by exploiting first-order interframe correlation. The new idea of using low-pass filtered LPC vectors is shown to decrease the coding cost by 1-3 bits per frame, depending on the cutoff frequency.
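As an idealized illustration of why first-order interframe correlation saves bits (this is not the paper's pdf-modeling method), assume a single spectral component that is Gaussian with variance σ², is correlated with the previous frame's value with coefficient ρ, is coded under mean-squared error, and that the previous frame is known exactly at both encoder and decoder. The conditional variance is then σ²(1 − ρ²), and for any distortion D ≤ σ²(1 − ρ²) the rate saving is

$$
\Delta R \;=\; \frac{1}{2}\log_2\frac{\sigma^2}{D} \;-\; \frac{1}{2}\log_2\frac{\sigma^2(1-\rho^2)}{D} \;=\; -\frac{1}{2}\log_2\!\left(1-\rho^2\right)\ \text{bits per component.}
$$

For example, ρ = 0.9 gives about 1.2 bits per component. Real LPC vectors are neither Gaussian nor independent across components, which is why the paper estimates the rate-distortion functions from modeled densities instead.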
ISBN: 0780364163 (print)
Speech coding at very low bit rates has many applications, such as answering machines, IP telephony, mobile communications and military communications. Most low bit rate coders operate at around 2.4 kb/s, as speech quality degrades too much below this rate. In this paper we describe a frequency domain speech coder that can operate at both 2.4 and 1.2 kb/s and produces good quality synthesised speech. Both rates use the same analysis and synthesis building blocks over 20 ms frames, but the 1.2 kb/s coder jointly quantises three sets of parameters every 60 ms to reduce the bit rate while maintaining speech quality. We also describe the quantisation methods used to lower the bit rate from 2.4 kb/s to 1.2 kb/s while retaining most of the quality of the higher bit rate version.
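The rates and frame lengths quoted above fix the bit budgets that the joint quantisation has to meet:

$$
2.4\ \text{kb/s} \times 20\ \text{ms} = 48\ \text{bits per frame}, \qquad
1.2\ \text{kb/s} \times 60\ \text{ms} = 72\ \text{bits per three frames} = 24\ \text{bits per frame on average}.
$$

The three jointly quantised parameter sets must therefore fit in 72 bits, half of the 3 × 48 = 144 bits the same three frames would use at 2.4 kb/s.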
ISBN: 0780364163 (print)
The 4.0 kbit/s speech codec described in this paper is based on a Frequency Domain Interpolative (FDI) coding technique, which belongs to the class of prototype waveform interpolation (PWI) coding techniques. The codec also has an integrated voice activity detector (VAD) and a noise reduction capability. The input signal is subjected to LPC analysis, and the prediction residual is separated into slowly evolving waveform (SEW) and rapidly evolving waveform (REW) components. The SEW magnitude is quantized using a hierarchical predictive vector quantization approach. The REW magnitude is quantized using a gain and a sub-band based shape. SEW and REW phases are derived at the decoder using a phase model based on a transmitted measure of voice periodicity. The spectral (LSP) parameters are quantized using a combination of scalar and vector quantizers. The 4.0 kbit/s coder has an algorithmic delay of 60 ms and an estimated floating-point complexity of 21.5 MIPS. Its performance has been evaluated with in-house MOS tests under various conditions, including background noise, channel errors, self-tandem and DTX mode of operation, and it has been shown to be statistically equivalent to the ITU-T G.729 codec across all conditions tested.
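The SEW/REW separation mentioned above is, in waveform interpolation coders generally, a low-pass/high-pass split of the sequence of time-aligned prototype spectra along the time axis. A minimal sketch, assuming the prototypes are already aligned and stored as equal-length lists (one value per harmonic, real or complex), and using a centred moving average as a hypothetical stand-in for the codec's actual SEW filter:

```python
def decompose_sew_rew(prototypes, window=5):
    """Split time-aligned prototype spectra into SEW and REW parts.

    prototypes: list of equal-length lists; prototypes[t][k] is the value of
    harmonic k at update instant t (real or complex).
    window: odd length of the moving-average low-pass filter applied along
    time (hypothetical; the window shrinks at the sequence edges).
    """
    n = len(prototypes)
    half = window // 2
    sew, rew = [], []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        span = prototypes[lo:hi]
        # SEW: slowly varying part, the per-harmonic average over nearby instants
        smooth = [sum(p[k] for p in span) / len(span) for k in range(len(prototypes[t]))]
        sew.append(smooth)
        # REW: whatever the low-pass filter removed
        rew.append([x - s for x, s in zip(prototypes[t], smooth)])
    return sew, rew
```

By construction SEW + REW reproduces the input, and a perfectly periodic (constant) prototype sequence yields an all-zero REW. This matches the intuition that strongly voiced speech is carried mainly by the SEW, while noise-like speech is carried by the REW.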
ISBN: 0780364163 (print)
In this paper we propose a general solution for combined speech and audio coding. In particular, we describe a speech/music discrimination procedure for multi-mode wideband coding. The speech/music decision is updated only when a low-energy frame is detected and is kept unchanged otherwise. The signal is classified using second-order statistics of discriminant parameters. An experimental CELP/transform coder operating at 16 kbit/s is demonstrated. Results show improved performance compared with single-mode encoding.
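The switching rule above (decide only at low-energy frames, otherwise hold the current mode) can be sketched as a small state machine. This is a minimal sketch under stated assumptions: a discriminant parameter is collected every frame, and at a low-energy frame its mean and variance since the last decision are passed to a decision rule. The `feature` and `decide` callables and the energy threshold are hypothetical placeholders, not the paper's discriminant parameters or classifier.

```python
from dataclasses import dataclass, field
from statistics import mean, pvariance
from typing import Callable, List, Sequence


@dataclass
class SpeechMusicSwitch:
    feature: Callable[[Sequence[float]], float]  # hypothetical per-frame discriminant parameter
    decide: Callable[[float, float], str]        # hypothetical rule on (mean, variance)
    energy_threshold: float                      # hypothetical low-energy threshold
    mode: str = "speech"                         # initial mode (assumption)
    history: List[float] = field(default_factory=list)

    def update(self, frame: Sequence[float]) -> str:
        self.history.append(self.feature(frame))
        energy = sum(x * x for x in frame) / len(frame)
        # Only re-decide at a low-energy frame, so the coder never switches
        # modes in the middle of an active segment; need >= 2 values for a
        # meaningful variance.
        if energy < self.energy_threshold and len(self.history) >= 2:
            self.mode = self.decide(mean(self.history), pvariance(self.history))
            self.history.clear()
        return self.mode
```

Holding the decision through high-energy frames trades a slower reaction to a speech/music change for the absence of audible mode switches inside speech or music passages.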
ISBN: 0780364163 (print)
A new, efficient algorithm for quantizing the spectral information of a Pitch-Synchronous CELP (PSCELP) speech coder is proposed. In PSCELP, LPC analysis is carried out once per pitch period. Direct quantization of the pitch-synchronous LSF vectors would lead to a variable-rate codec, which is inconsistent with the objective of a fixed-rate speech coder operating at 4 kb/s. Hence, a linear trajectory of LSF vectors is selected that can be encoded by one LSF vector every 20 ms. This conversion exploits the high correlation of the LSF parameters across successive pitch periods to achieve joint quantization. A coding rate of 1.2 kb/s is achieved for the LSF information with no noticeable degradation. At the decoder, the algorithm uses linear interpolation to recover the spectral parameters for the individual pitch periods used in the pitch-synchronous reconstruction of the speech signal. Comparative simulation results show that this algorithm performs comparably to linear-interpolation quantization of LSFs in a time-synchronous CELP coder.
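At 1.2 kb/s with one LSF vector every 20 ms, each transmitted vector gets 1200 × 0.02 = 24 bits. The decoder-side step described above can be sketched as follows, assuming the two decoded vectors are anchored at consecutive 20 ms frame boundaries (times in ms) and that the pitch-synchronous synthesis supplies the instants at which it needs an envelope. The function name and arguments are hypothetical, and the encoder's choice of the trajectory endpoint is not reproduced.

```python
def interpolate_lsf(lsf_prev, lsf_curr, t_prev, t_curr, pitch_instants):
    """Linearly interpolate LSF vectors at pitch-synchronous instants.

    lsf_prev, lsf_curr: decoded LSF vectors anchored at times t_prev and t_curr
    (e.g. consecutive 20 ms frame boundaries, in ms).
    pitch_instants: times in [t_prev, t_curr] at which the pitch-synchronous
    synthesis needs a spectral envelope.
    """
    out = []
    for t in pitch_instants:
        alpha = (t - t_prev) / (t_curr - t_prev)   # 0 at t_prev, 1 at t_curr
        out.append([(1 - alpha) * p + alpha * c for p, c in zip(lsf_prev, lsf_curr)])
    return out
```

A convenient property of interpolating in the LSF domain is that a convex combination of two vectors with increasing LSFs is itself increasing, so every interpolated vector still corresponds to a stable synthesis filter.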
This paper presents a 1.2 kbps speech coder based on the mixed excitation linear prediction (MELP) analysis algorithm. In the proposed coder, the MELP parameters of three consecutive frames are grouped into a superframe and jointly quantized to obtain a high coding efficiency. The interframe redundancy is exploited with distinct quantization schemes for different unvoiced/voiced (U/V) frame combinations in the superframe. Novel techniques for improving performance make use of the superframe structure. These include pitch vector quantization using pitch differentials, joint quantization of pitch and U/V decisions and LSF quantization with a forward-backward interpolation method. Subjective test results indicate that the 1.2 kbps speech coder achieves approximately the same quality as the proposed federal standard 2.4 kbps MELP coder.
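The superframe bookkeeping described above can be sketched in a few lines: group three consecutive frames, turn their U/V decisions into an index that selects one of the 2³ = 8 quantization schemes, and express the voiced pitches as a starting value plus differentials. This is a minimal sketch, assuming per-frame records are dicts with hypothetical `"voiced"` and `"pitch"` keys. The actual quantization tables and the pitch VQ of the 1.2 kbps coder are not reproduced.

```python
def superframes(frames, size=3):
    """Group consecutive per-frame records into superframes (trailing partial group dropped)."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, size)]


def uv_pattern(superframe):
    """Map the U/V decisions of a superframe to a scheme index in 0 .. 2**len - 1."""
    index = 0
    for frame in superframe:
        index = (index << 1) | (1 if frame["voiced"] else 0)
    return index


def pitch_differentials(superframe):
    """Pitch of the first voiced frame plus differences between successive voiced frames."""
    pitches = [f["pitch"] for f in superframe if f["voiced"]]
    if not pitches:
        return None, []
    return pitches[0], [b - a for a, b in zip(pitches, pitches[1:])]
```

For a voiced/unvoiced/voiced superframe the pattern index is binary 101 = 5, so the encoder would use the scheme reserved for that combination, and only one pitch differential needs to be coded.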