This paper studies algorithms to encode speech with good intelligibility and naturalness at very low rates. Naturalness is retained by accurately encoding the speech excitation information from an LPC model. We first briefly describe a glottal ARX technique to model the speech signal for high quality. A large reduction in coding rate is achieved through short-term temporal compression of the speech and vector quantization. After a brief study of short-term temporal decomposition, application of traditional vector quantization to the temporal decomposition output is discussed with a study of distortion measures and codebook generation. Based on properties of short-term temporal decomposition, we introduce finite-state vector quantization to further decrease the coding rate. We then deal with a problem associated with this technique: estimation of a state transition matrix with incomplete data. The general result is that we can design practical coders operating in a range of 450-600 b/s with a delay of about 200 ms and natural-sounding output speech.
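The vector quantization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a squared Euclidean distortion measure and a tiny hand-written codebook; the abstract does not say which distortion measures or codebook sizes the authors used, so `nearest_codeword`, `bits_per_index`, and the codebook values are hypothetical.

```python
import math

def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector`.

    Squared Euclidean distance is an assumption made for this sketch.
    """
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def bits_per_index(codebook_size):
    """Bits needed to transmit one codeword index."""
    return math.ceil(math.log2(codebook_size))

# Hypothetical 2-dimensional codebook with four entries.
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (-1.0, 0.5)]
index = nearest_codeword((0.9, 1.2), codebook)
bits = bits_per_index(len(codebook))
```

Only the index is transmitted, so the rate per vector depends on the codebook size rather than on the vector dimension. A finite-state quantizer would additionally pick the codebook from the previous state, which is not shown here.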
This paper presents a new speech coding model targeted at bit rates above 4 kbit/s, referred to as multiband code-excited linear prediction (MBCELP). The analysis and synthesis of speech are accomplished in the time domain by comparing the original to the synthetic speech while a perceptual criterion is used. A usual short-term linear predictive filter is employed as the synthesis filter; the excitation signal is modelled as a linear combination of a long-term predictive excitation, periodic multiband excitations and a noise-like excitation; no voiced/unvoiced decision is required. The periodic multiband excitation is produced by convolving a periodic impulse sequence with a sinc function corresponding to a frequency band; the noise-like excitation is represented by a codebook. We estimate a pitch which is appropriate not only to the long-term predictive filter but also to the periodic multiband excitations and to the 'pitch' prefilter in the decoder. Several CELP vocoders are developed as a reference to test the properties of the MBCELP vocoder. Listening tests clearly indicate that this vocoder reconstructs very high quality speech without 'buzziness' or 'hoarseness' for both clean and noisy speech. A 4.8 kbit/s MBCELP vocoder is shown as an example. Its perceptual quality is virtually identical to that of the original 8 kbit/s CELP vocoder and the improved 7.2 kbit/s CELP vocoder. Since fewer subframes are used for the MBCELP vocoders, their complexity is not greater than that of usual CELP vocoders with the same type of codebook. Many techniques used to simplify CELP coding can also be adopted for MBCELP coding.
ISBN: 9783037858417 (print)
A new multi-frame joint quantization algorithm with dynamic weighted inter-frame linear prediction, based on mixed excitation linear prediction (MELP), is proposed in this paper. In the encoding stage, a super-frame consists of three adjacent single frames. Fourier magnitudes and the aperiodic jitter flag are eliminated. The other parameters are jointly quantized. The LSFs of the first and third frames are quantized as a 20-dimensional *** to the BPVs of the super-frame, pitch is quantized with codebooks of dynamic size. In the decoding stage, parameters are indexed from the corresponding codebook. The LSFs of the middle frame are predicted from the first and third frames. The weighting factors keep changing in accordance with the BPVs of the adjacent five *** show that the reconstruction accuracy of the LSFs is significantly improved using dynamic weighted inter-frame linear prediction. Meanwhile, the coding bit rate is reduced to 0.6 kbps.
Subjective quality measurements on three digital speech coders, simulated with mobile radio channel transmission, were performed using the "mean opinion score (MOS)" method. The three speech coding methods tested were: continuously variable slope delta modulation (CVSD) coding, adaptive predictive coding (APC), and residually excited linear predictive (RELP) coding. Several versions of each coder, with transmission rates in the range of 7.3 to 16.1 kbit/s, were simulated. Five different channel conditions, including three derived from land mobile radio field experiments, were applied to the speech coders' encoded output to study the effects. The results show that, of the three coders, the CVSD coder is the most robust to channel errors, but produces reconstructed output speech of unacceptable quality. The 14.4 kbit/s RELP coder produces relatively good output speech quality, exhibits a mild degree of robustness to mobile radio channel errors, and is slightly less complex than the APC coder. Of the three digital speech coders tested, the RELP coder appears the most suitable for use with land mobile radio. However, none of the three coders was able to produce speech of telephone toll quality in a mobile radio environment.
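For readers unfamiliar with CVSD, the following is a simplified sketch of the idea: one bit per sample, with a step size that grows when recent bits agree and shrinks otherwise. It is not the coder evaluated in the study. The multiplicative step adaptation, the absence of leaky integrators, and every parameter value (`min_step`, `max_step`, `grow`, `decay`, `run`) are assumptions chosen for illustration.

```python
import math

def _adapt(estimate, step, history, bit, min_step, max_step, grow, decay, run):
    """Update the shared encoder/decoder state for one transmitted bit."""
    history = (history + [bit])[-run:]
    if len(history) == run and len(set(history)) == 1:
        step = min(step * grow, max_step)   # slope overload: enlarge the step
    else:
        step = max(step * decay, min_step)  # otherwise let the step shrink
    estimate += step if bit else -step
    return estimate, step, history

def cvsd_encode(samples, min_step=0.01, max_step=1.0, grow=1.5, decay=0.9, run=3):
    """Emit one bit per sample: 1 if the sample is at or above the running estimate."""
    estimate, step, history, bits = 0.0, min_step, [], []
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate, step, history = _adapt(
            estimate, step, history, bit, min_step, max_step, grow, decay, run)
        bits.append(bit)
    return bits

def cvsd_decode(bits, min_step=0.01, max_step=1.0, grow=1.5, decay=0.9, run=3):
    """Rebuild the staircase estimate from the bit stream alone."""
    estimate, step, history, out = 0.0, min_step, [], []
    for bit in bits:
        estimate, step, history = _adapt(
            estimate, step, history, bit, min_step, max_step, grow, decay, run)
        out.append(estimate)
    return out

# Slowly varying test tone (200 samples per period).
samples = [0.5 * math.sin(2 * math.pi * n / 200) for n in range(400)]
bits = cvsd_encode(samples)
decoded = cvsd_decode(bits)
mean_abs_error = sum(abs(a - b) for a, b in zip(samples, decoded)) / len(samples)
```

Because the decoder derives its step size from the received bits only, a single bit error perturbs the estimate without desynchronizing any side information, which is one intuition for the robustness to channel errors reported above.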
Traditional pitch-excited linear predictive coding (LPC) vocoders use a fully parametric model to efficiently encode the important information in human speech. These vocoders can produce intelligible speech at low data rates (800-2400 b/s), but they often sound synthetic and generate annoying artifacts such as buzzes, thumps, and tonal noises. These problems increase dramatically if acoustic background noise is present at the speech input. This paper presents a new mixed excitation LPC vocoder model that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech. The new model also eliminates the traditional requirement for a binary voicing decision so that the vocoder performs well even in the presence of acoustic background noise. A 2400-b/s LPC vocoder based on this model has been developed and implemented in simulations and in a real-time system. Formal subjective testing of this coder confirms that it produces natural-sounding speech even in a difficult noise environment. In fact, diagnostic acceptability measure (DAM) test scores show that the performance of the 2400-b/s mixed excitation LPC vocoder is close to that of the government standard 4800-b/s CELP coder.
In November 1995 the International Telecommunication Union Telecommunications Sector (ITU-T) approved an 8-kb/s speech coding algorithm with wireline quality. This culminated the effort that the CCITT had set in motion in 1990. This article presents the methods for managing the project through its major milestones, from setting the terms of reference to the selection, testing, optimization, and dissemination of the algorithm. While G.729 was being finalized, a new requirement for a low-complexity 8-kb/s speech coder arose. This article explains how the change in scope was accommodated without the unnecessary proliferation of incompatible algorithms.
Speech coding plays a significant role in voice communication and in improving network bandwidth efficiency for applications that require long-distance communication or efficient use of storage space. Non-uniform sampling (NUS) is one technique for this purpose; it reduces data by sampling at irregular intervals. In the literature, researchers use the structural properties of the speech waveform to study various NUS methods, such as LCSS, MMD, IPD, and zero-crossing point. In this paper, however, we consider the speech signal's statistical properties to propose an optimal NUS approach. The proposed technique statistically analyzes the speech signal to sample the abrupt changes over a time frame and approximates the signal with minimal reconstruction error, using cost and linear penalty functions to avoid over-fitting. The proposed technique further performs the optimization using branch-and-bound. To evaluate the proposed NUS, we design a speech waveform encoder called Block Adaptive Amplitude Sampling (BAAS). A BAAS encoder can directly perform statistical analysis on the speech waveform to select data samples corresponding to the most significant changes in the signal. The decoder approximates the eliminated values using linear interpolation. We experimentally study the proposed technique using various metrics and measures such as POLQA and the MUSHRA test. The evaluation shows that the proposed NUS technique retains only 25% of data samples while regenerating a signal of acceptable quality. In addition, comparative studies with MMD and IPD show that the proposed algorithm performs 1.6% better with 30% lower MSE scores.
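The encode/decode split described in this abstract (keep a subset of samples, restore the rest by linear interpolation) can be sketched as follows. The selection rule here is a stand-in and not the BAAS method: it keeps the block endpoints plus the interior samples with the largest second difference, whereas the paper uses a statistical analysis with cost and penalty functions optimized by branch-and-bound. `select_samples`, `reconstruct`, and `keep_ratio` are hypothetical names.

```python
def select_samples(block, keep_ratio=0.25):
    """Keep the endpoints plus the interior samples with the largest |second difference|.

    Stand-in heuristic for 'most significant changes'; assumes len(block) >= 2.
    """
    n = len(block)
    keep = max(2, round(n * keep_ratio))
    scores = [(abs(block[i - 1] - 2 * block[i] + block[i + 1]), i)
              for i in range(1, n - 1)]
    scores.sort(reverse=True)
    chosen = {0, n - 1} | {i for _, i in scores[:keep - 2]}
    return sorted((i, block[i]) for i in chosen)

def reconstruct(points, length):
    """Decoder: linearly interpolate between the retained (index, value) pairs."""
    out = [0.0] * length
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i] = v0 + t * (v1 - v0)
    return out

# A triangular block is recovered exactly from three retained samples.
block = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
points = select_samples(block, keep_ratio=0.34)
restored = reconstruct(points, len(block))
```

The example block is piecewise linear, so interpolation is exact; on real speech the retained fraction trades off against reconstruction error, which is the quantity the paper's optimization targets.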
Although linear filters are useful in various speech processing applications, there is considerable evidence for the existence of nonlinearity in speech signals. Our main aim is to launch a comprehensive investigation into the exploitation of nonlinear Volterra filters in the context of the ADPCM-based speech coding technique, using two methods: forward prediction, based on the LS criterion, and backward prediction, based on both LMS and RLS adaptation algorithms. In each case, after solving some innate problems such as ill-conditioning and instability, schemes for optimum exploitation of nonlinear prediction are developed, and simulation results, tested with several performance criteria, are provided. With forward prediction, a scheme is developed to detect and flag those frames for which, after stabilizing, including the quadratic predictor is beneficial. Scalar and vector quantisation methods are used for quantising the residual signal and the filter parameters, respectively. The results show that with this scheme only a negligible improvement (up to 0.62 dB in the SNR) can be achieved, in spite of the increase in bit rate and complexity. With backward prediction, two frame-based schemes are developed in which, for each frame, a set of quadratic filters is examined and the filter giving the best quality of reconstructed speech is selected. The ultimate schemes result in an improvement of up to 1.5 dB in the overall SNR of the reconstructed speech at the cost of a slight increase in the bit rate, a short delay and a demanding increase in complexity. Copyright (C) 2010 John Wiley & Sons, Ltd.
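A second-order Volterra predictor augments the usual linear terms with products of past samples. The sketch below adapts such a predictor sample by sample with a normalized LMS update. It is only an illustration of the general technique: the normalization, the step size `mu`, the predictor `order`, and the use of the clean signal instead of the quantized reconstruction are all assumptions, and the function names are hypothetical.

```python
def volterra_features(past):
    """Linear terms followed by all quadratic products past[i] * past[j], i <= j."""
    feats = list(past)
    for i in range(len(past)):
        for j in range(i, len(past)):
            feats.append(past[i] * past[j])
    return feats

def nlms_volterra(signal, order=2, mu=0.5):
    """Predict each sample from the previous `order` samples; return errors and weights."""
    n_feats = order + order * (order + 1) // 2
    weights = [0.0] * n_feats
    errors = []
    for n in range(order, len(signal)):
        past = signal[n - order:n][::-1]            # most recent sample first
        feats = volterra_features(past)
        prediction = sum(w * f for w, f in zip(weights, feats))
        err = signal[n] - prediction
        norm = sum(f * f for f in feats) + 1e-8     # avoid division by zero
        weights = [w + mu * err * f / norm for w, f in zip(weights, feats)]
        errors.append(err)
    return errors, weights

# A constant signal: the prediction error shrinks at every update.
errors, weights = nlms_volterra([1.0] * 12)
```

The number of quadratic terms grows as order * (order + 1) / 2, which gives a feel for the complexity increase the abstract mentions.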
Three listening-only experiments were conducted to characterize the subjective performance (i.e., speech quality) of 8 kb/s G.729. These experiments evaluated the quality of coded speech under a variety of conditions: interworking with other international and regional speech coding standards; input speech that had been corrupted by environmental noise; and operation over degraded transmission channels (including random bit errors and a simulated radio channel). The results of these experiments indicate that 8 kb/s G.729 meets the performance requirements that were established at the beginning of the standardization process.
A high quality speech coding algorithm at 2 kb/s based on Waveform interpolation (WI) is presented in this paper. In this new 2 kb/s WI speech coder, five novel techniques are incorporated: Predictive multi-stage vector quantization (PMSVQ) for Line spectral frequency (LSF); a pitch estimation algorithm based on the Dyadic wavelet transform and the Normalized cross-correlation function (DWT-NCCF); predictive Analysis-by-synthesis (A-b-S) vector quantization for Slowly evolving waveform (SEW) magnitude based on the Discrete cosine transform (DCT); matrix quantization for Rapidly evolving waveform (REW) magnitude based on Variable dimension vector quantization (VDVQ) and DCT; and a predictive A-b-S VQ scheme for CW power based on temporal weighting. Subjective quality test results indicate that the reconstructed speech quality of the 2 kb/s WI speech coder greatly exceeds that of Mixed-excitation linear predictive (MELP) coding at 2.4 kb/s, and is very close to that of Code-excited linear predictive (CELP) coding at 4.8 kb/s.