Code-excited linear prediction (CELP) speech coding is widely used in voice communication, multimedia, videoconferencing, and other systems. Many papers address improving CELP coding to obtain better speech quality after decoding, and most of them aim to lower the transmission rate of CELP-coded speech. One problem in low-rate CELP transmission with good decoded speech quality is noise in the transmission channel. The goal of this paper is to combine the ability of CELP speech coding to reduce the transmission rate with channel coding methods that protect the most important CELP parameters in each speech frame, such as the linear prediction coefficients and the excitation indexes.
A trained sparse conjugate codebook is proposed for improving the speech quality of CELP-based coding in a noisy environment. Although CELP coding provides high quality at a low bit rate in a quiet environment (yielding clean speech), it cannot provide satisfactory quality in a noisy environment because the conventional fixed codebook is designed for clean speech. The proposed codebook consists of two sub-codebooks; each sub-codebook consists of a random component and a trained component. Each component has excitation vectors consisting of a few pulses. In the random component, pulse position and amplitude are determined randomly. Since the random component does not depend on the speech characteristics, it handles noise better than the trained one. The trained component maintains high quality for clean speech. Since the excitation vector is the sum of the two sub-excitation vectors, this codebook handles various speech conditions by selecting a sub-vector from each component. This codebook also reduces the computational complexity of the fixed codebook search and the memory requirements compared with the conventional codebook. Subjective testing (absolute category rating (ACR) and degradation category rating (DCR)) indicated that this codebook improves speech quality compared with the conventional trained codebook for noisy speech. The ACR test showed that the quality of the 8 kbit/s CELP coder with this codebook is equivalent to that of 32 kbit/s ADPCM for clean speech.
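The conjugate structure described above, in which the excitation is the sum of one sparse vector from each sub-codebook, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the exhaustive pair search, the truncated impulse response `h`, and the standard CELP criterion (t·y)²/(y·y) are all assumptions made for the example. Because filtering is linear, each sub-vector is filtered once and the filtered results are summed per pair.

```python
import random


def sparse_vector(length, n_pulses, rng):
    """Hypothetical random-component vector: a few pulses whose
    positions and amplitudes are both drawn at random."""
    v = [0.0] * length
    for pos in rng.sample(range(length), n_pulses):
        v[pos] = rng.uniform(-1.0, 1.0)
    return v


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def filt(h, c):
    """Zero-state synthesis filtering: convolution of c with the
    impulse response h, truncated to len(c) samples."""
    return [
        sum(h[k] * c[i - k] for k in range(min(i + 1, len(h))))
        for i in range(len(c))
    ]


def conjugate_search(target, book_a, book_b, h):
    """Exhaustive search over all (i, j) pairs. Returns the pair that
    maximizes (t.y)^2 / (y.y) together with its optimal gain."""
    filtered_a = [filt(h, c) for c in book_a]  # filter each sub-vector once
    filtered_b = [filt(h, c) for c in book_b]
    best_score, best_i, best_j, best_gain = -1.0, None, None, 0.0
    for i, ya in enumerate(filtered_a):
        for j, yb in enumerate(filtered_b):
            y = [p + q for p, q in zip(ya, yb)]  # sum of the two sub-excitations
            energy = dot(y, y)
            if energy == 0.0:
                continue
            corr = dot(target, y)
            score = corr * corr / energy
            if score > best_score:
                best_score, best_i, best_j = score, i, j
                best_gain = corr / energy
    return best_i, best_j, best_gain
```

A sub-codebook for the sketch could be built with, for example, `[sparse_vector(40, 3, random.Random(0)) for _ in range(16)]`; a trained component would replace the random draws with vectors learned from speech data.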
Variable bit rate coding is increasingly needed for applications such as videoconferencing, audioconferencing, packet circuit multiplication systems, and mobile radio. This article extends the concept of embedded coding to include multi-stage CELP and VSELP coding at variable bit rates. First, it presents a unified approach to multi-stage CELP and VSELP coding. Second, it provides a general expression for the CELP/VSELP error criterion in the case of a sequential search for successive indexes and gains, including gain reoptimization at each stage. Third, it outlines the development of a recursive algorithm for solving this least-squares problem. The algorithm is based on both the matrix partitioning approach of the QR decomposition and the Gram-Schmidt orthogonalization algorithm. The resulting orthogonalized gains are then used to derive a new algorithm, implemented off-line, that optimizes the successive codebooks in embedded multi-stage CELP/VSELP coding. Finally, subjective test results are presented, which show that 24 and 32 kbit/s embedded CELP/VSELP wideband coders provide speech quality close to that of the embedded SB/ADPCM G.722 coders at 56 and 64 kbit/s.
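The sequential search with gain reoptimization can be illustrated with a small Gram-Schmidt sketch. It is not the article's recursive algorithm: the function name, the data layout (one list of already filtered codevectors per stage), and the tolerance are assumptions. The point it shows is that projecting the residual onto an orthonormalized candidate is equivalent to jointly reoptimizing, in the least-squares sense, the gains of all codevectors selected so far.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def multistage_search(target, codebooks, tol=1e-12):
    """Sequential multi-stage search with Gram-Schmidt orthogonalization.

    codebooks: one list of (already filtered) codevectors per stage.
    Returns (chosen indexes, orthogonalized gains, final residual).
    """
    basis = []    # orthonormal vectors spanning the selected codevectors
    chosen = []   # selected index per stage (None if no usable candidate)
    gains = []    # gain of each stage on its orthonormalized codevector
    residual = list(target)
    for book in codebooks:
        best_score, best_idx, best_q = -1.0, None, None
        for idx, c in enumerate(book):
            q = list(c)
            for b in basis:  # remove components along earlier selections
                proj = dot(q, b)
                q = [x - proj * y for x, y in zip(q, b)]
            norm2 = dot(q, q)
            if norm2 < tol:  # candidate lies in the span already covered
                continue
            score = dot(residual, q) ** 2 / norm2
            if score > best_score:
                norm = norm2 ** 0.5
                best_score, best_idx = score, idx
                best_q = [x / norm for x in q]
        chosen.append(best_idx)
        if best_q is None:
            gains.append(0.0)
            continue
        g = dot(residual, best_q)
        residual = [r - g * x for r, x in zip(residual, best_q)]
        basis.append(best_q)
        gains.append(g)
    return chosen, gains, residual
```

After each stage the residual is orthogonal to every selected codevector, so truncating the stages at any point (the embedded property) still leaves a least-squares-optimal reconstruction for the stages that were kept.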
This paper reports a multispectral code excited linear prediction (MCELP) method for the compression of multispectral images. Different linear prediction models and adaptation schemes have been compared. The method that uses a forward adaptive autoregressive (AR) model has proven to achieve a good compromise between performance, complexity, and robustness. This approach is referred to as the MFCELP method. Given a set of multispectral images, the linear predictive coefficients are updated over nonoverlapping three-dimensional (3-D) macroblocks. Each macroblock is further divided into several 3-D microblocks, and the best excitation signal for each microblock is determined through an analysis-by-synthesis procedure. The MFCELP method has been applied to multispectral magnetic resonance (MR) images. To satisfy the high quality requirement for medical images, the error between the original image set and the synthesized one is further specified using a vector quantizer. This method has been applied to images from 26 clinical MR neuro studies (20 slices/study, three spectral bands/slice, 256 x 256 pixels/band, 12 b/pixel). The MFCELP method provides a significant visual improvement over the discrete cosine transform (DCT) based Joint Photographic Experts Group (JPEG) method, the wavelet transform based embedded zero-tree wavelet (EZW) coding method, and the vector tree (VT) coding method, as well as the multispectral segmented autoregressive moving average (MSARMA) method we developed previously.
This paper describes modifications to a previously proposed 8-kb/s 4-ms-delay CELP speech coding algorithm with a view to improving the speech quality while maintaining low delay and only moderately increasing complexity. The modifications are intended to improve the effectiveness of interframe pitch lag prediction and to reduce the sub-optimality of the excitation coding relative to the backward-adapted synthesis filter by using delayed decision and joint optimization techniques. Results of subjective listening tests using Japanese speech indicate that the coded speech quality is significantly superior to that of the 8-kb/s VSELP coder, which has a 20-ms delay. A method that reduces the computational complexity of closed-loop 3-tap pitch prediction with no perceptible degradation in speech quality is proposed, based on representing the pitch-tap vector as the product of a scalar pitch gain and a normalized shape codevector.
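The gain-shape representation of the 3-tap pitch predictor can be sketched as below. The abstract does not give the search procedure, so everything here is an assumption for illustration: the function name, a small codebook of normalized 3-tap shapes, and the three filtered past-excitation vectors at lags L-1, L, and L+1. With the tap vector written as gain times shape, the closed-loop search reduces to one shape index plus one scalar gain, which has a closed-form optimum for each shape.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def best_gain_shape(target, taps, shapes):
    """Closed-loop search for a 3-tap pitch predictor b = gain * shape.

    taps:   three filtered past-excitation vectors (lags L-1, L, L+1).
    shapes: hypothetical codebook of normalized 3-tap shape codevectors.
    Returns (shape index, optimal scalar gain).
    """
    best_score, best_idx, best_gain = -1.0, None, 0.0
    for idx, s in enumerate(shapes):
        # Contribution of this shape before the scalar gain is applied.
        y = [sum(s[k] * taps[k][n] for k in range(3))
             for n in range(len(target))]
        energy = dot(y, y)
        if energy == 0.0:
            continue
        corr = dot(target, y)
        score = corr * corr / energy   # error reduction at the optimal gain
        if score > best_score:
            best_score, best_idx = score, idx
            best_gain = corr / energy  # least-squares optimal gain
    return best_idx, best_gain
```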
SNR scalable speech coding is desirable for a number of network multimedia applications, but relatively few SNR scalable speech coders exist for operation at rates below 16 kb/s. We investigate several SNR scalable source coding structures and define the new concepts of dependent and independent SNR scalability, where independent SNR scalable coders depend on the core layer coder only through the core layer output. Independent SNR scalable structures offer the possibility of providing bit rate scalable functionality to existing nonscalable coders and standards. We show that the MPEG-4 scalable coders are examples of dependent SNR scalable coders, and we introduce a new independent SNR scalable coder called CELPTree, which has the additional advantage of being low delay. We compare the performance of the MPEG-4 coders and CELPTree for both clean and noisy speech, and we examine the effects of frequency-weighted distortion measures in the enhancement layers of SNR scalable speech coders.
A comprehensive performance analysis of sinusoidal and code excited linear prediction (CELP) speech coding is given around 4 kbit/s, using both subjective and objective measurements. Based on the observations made, justification for the multi-modal hybrid coding approach employing both sinusoidal and CELP coding is given, and an implementation of such a coder is described. This 4 kbit/s sinusoidal/CELP speech coder utilizes four modes to classify the input speech segment: voiced, jittery-voiced, plosive, and unvoiced. For voiced segments sinusoidal coding is used, whereas different CELP versions are employed for the other modes. The quality of the implemented 4 kbit/s sinusoidal/CELP speech coder in clean speech conditions is finally verified by a listening test. In the test, the 4 kbit/s coder performed almost as well as the high-quality references used, but it still needs improvements to be classified as a high-quality 4 kbit/s speech coder. (C) 2003 Elsevier B.V. All rights reserved.
A multilayer perceptron (MLP) is applied as a time-domain nonlinear filter to two classes of degraded speech, namely Gaussian white noise and nonlinear system degradation introduced by a low bit-rate CELP coder. The goal of the study is to examine the influence of the inherent nonlinearity within the MLP, and this is achieved by varying the levels of nonlinearity within the structure. Direct comparisons of MLPs and linear filters show that with CELP degradation the SNR improvement achieved by the MLP is measurably better than with an equivalent linear structure (3 dB compared with 1.5 dB), but when the degradation is additive noise the two structures perform equally well. The study highlights the importance of scaling to achieve optimum performance, and of matching the enhancer to the degradation.
A multi-layer perceptron (MLP) acting directly in the time domain is applied as a speech signal enhancer, and its performance is examined in the context of three common classes of degradation, namely low bit-rate CELP degradation (i.e., nonlinear system degradation), additive noise, and convolution by a linear system. The investigation focuses on two topics: (i) the influence of non-linearities within the network and (ii) network topology, comparing single and multiple output structures. The objective is to examine how these characteristics influence network performance and whether this depends on the class of degradation. Experimental results show the importance of matching the enhancer to the class of degradation. In the case of the CELP coder, the standard MLP with its inherently non-linear characteristics is shown to be consistently better than any equivalent linear structure (up to 3.2 dB compared with 1.6 dB SNR improvement). In contrast, when the degradation is from additive noise, a linear enhancer is always superior.
This paper proposes a 6.4-kbit/s extension to G.729 (conjugate structure algebraic code excited linear prediction: CS-ACELP). Each G.729 module was investigated to determine which bits could be removed without hurting the speech quality, then two coders that have different bit allocations were designed. They have two different algebraic codebooks (a 10-bit algebraic codebook that has two pulses and an 11-bit algebraic codebook that has two or three pulses). This paper also proposes a conditional orthogonalized search for a fixed codebook to improve the speech quality. The conditional orthogonalized search chooses one of two search methods (orthogonalized or non-orthogonalized) based on the optimum pitch gain. The quality of the two coders was evaluated using objective measurements (SNR and segmental SNR) and subjective ones (mean opinion score: MOS and a pair-comparison test). The selected coder was evaluated under practical conditions. Subjective test results have indicated that the quality of the proposed coder (10-ms frame length) is equivalent to that of the 6.3-kbit/s G.723.1 coder, which has a 30-ms frame length.
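The idea of switching between an orthogonalized and a non-orthogonalized fixed codebook search according to the pitch gain can be sketched as follows. The abstract states neither the switching rule nor its threshold, so both are assumptions here: the sketch orthogonalizes when the pitch gain is at or above a hypothetical threshold of 0.5. The function name and the data layout (already filtered codevectors) are also illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fixed_codebook_search(target, pitch_vec, pitch_gain, filtered_codes,
                          threshold=0.5):
    """Return (best index, whether the orthogonalized search was used).

    pitch_vec:      filtered adaptive-codebook (pitch) contribution.
    filtered_codes: filtered fixed-codebook candidates.
    threshold:      hypothetical switching point, not from the paper.
    """
    pitch_energy = dot(pitch_vec, pitch_vec)
    # Assumed rule: orthogonalize only when the pitch gain is high.
    orthogonal = pitch_gain >= threshold and pitch_energy > 0.0
    if orthogonal:
        ref = list(target)
    else:
        # Conventional search: remove the pitch contribution first.
        ref = [t - pitch_gain * p for t, p in zip(target, pitch_vec)]
    best_score, best_idx = -1.0, None
    for idx, y in enumerate(filtered_codes):
        if orthogonal:
            # Project out the component of the candidate along pitch_vec.
            proj = dot(y, pitch_vec) / pitch_energy
            y = [a - proj * p for a, p in zip(y, pitch_vec)]
        energy = dot(y, y)
        if energy < 1e-12:
            continue
        score = dot(ref, y) ** 2 / energy
        if score > best_score:
            best_score, best_idx = score, idx
    return best_idx, orthogonal
```

In the orthogonalized branch the candidate is orthogonal to the pitch vector, so correlating it with the full target gives the same value as correlating it with the pitch-removed target; this is why `ref` can be the unmodified target there.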