An algorithm for designing linear prediction-based two-channel multiple-description predictive vector quantizers (MD-PVQs) for packet-loss channels is presented. This algorithm iteratively improves the encoder partition, the set of multiple-description codebooks, and the linear predictor for a given channel loss probability, based on a training set of source data. The effectiveness of the designs obtained with the given algorithm is demonstrated using a waveform coding example involving a Markov source as well as vector quantization of speech line spectral pairs.
The general structure of this class of coders is reviewed, and the particulars of its members are discussed. The different analysis procedures are described, and the contributions of the various coder parameters to the performance of the coder are examined. Quantization procedures for each transmitted parameter are given along with examples of bit allocations. The speech quality produced by these coders is high at 16 kb/s and good at 8 kb/s, but only fair at 4.8 kb/s. The use of postprocessing techniques changes the performance at lower rates, but more research is needed to further improve the coders.
An educational software tool on speech coding is presented. Portions of this program are used in our senior-level DSP (digital signal processing) class at Arizona State University to expose undergraduate students to speech coding and to present speech analysis/synthesis as an application paradigm for many fundamental DSP concepts. The simulation software provides an interactive environment that allows users to investigate and understand speech coding algorithms for a variety of input speech records. Time- and frequency-domain representations of input and reconstructed speech can be graphically displayed and played back on a PC equipped with a standard 16-bit sound card. The program has been developed for use in the MATLAB environment and includes implementations of the FS-1015 LPC-10e, the FS-1016 CELP, the ETSI GSM, the IS-54 VSELP, the G.721 ADPCM, and the G.728 LD-CELP speech coding algorithms, integrated under a common graphical interface.
ISBN: 3540390901 (print)
In this paper we present first experimental results with a novel audio coding technique based on approximating Hilbert envelopes of relatively long segments of the audio signal in critical-band-sized sub-bands by an autoregressive model. We exploit the generalized autocorrelation linear predictive technique, which allows for better control of fitting the peaks and troughs of the envelope in the sub-band. Despite introducing a longer algorithmic delay, improved coding efficiency is achieved. Since the described technique does not directly model short-term spectral envelopes of the signal, it is suitable not only for coding speech but also for coding other audio signals.
ISBN: 0780371089 (print)
Coding of speech signals using Bessel functions as orthogonal signals in the Fourier-Bessel (FB) expansion has been explored. It has been found that speech of reasonable quality can be reconstructed using a set of 15 to 30 coefficients in the FB expansion of each frame of speech. At 80 frames per second and eight bits per coefficient, this corresponds to a bit rate as low as 9600 bits/second when a predetermined sequence of coefficients is used. The speech quality and the bit rate increase when a higher number or a selected set of coefficients is used. Comparable results in perceptual speech quality and frame-to-frame signal-to-noise ratio were observed for both male and female speakers.
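The 9600 bits/second figure follows from simple arithmetic on the numbers quoted in the abstract. The sketch below reproduces it; the function name `fb_bit_rate` is hypothetical, and the calculation assumes a predetermined coefficient set with no side information, so the 30-coefficient value is only an extrapolation under that same assumption.

```python
def fb_bit_rate(coeffs_per_frame, frames_per_second=80, bits_per_coeff=8):
    """Raw bit rate in bits/second for a fixed (predetermined) coefficient set.

    Assumption: no side information (e.g. coefficient indices) is transmitted.
    """
    return coeffs_per_frame * frames_per_second * bits_per_coeff


if __name__ == "__main__":
    for n in (15, 30):
        print(n, "coefficients/frame ->", fb_bit_rate(n), "bits/second")
```

With 15 coefficients per frame this gives 15 x 80 x 8 = 9600 bits/second, matching the lower bound stated above.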
Simulation results are presented which compare the performance of all-pole, all-zero, and pole-zero predictors in ADPCM at data rates of 16 and 32 kbits/s over both ideal and noisy channels. Separate backward adaptive gradient algorithms are used to adapt the poles and the zeros independently. The performance indicators used are signal-to-quantization noise ratio (SNR), signal-to-prediction error ratio (SPER), segmental SNR (SNRSEG), and subjective listening tests. For speech sources, the all-zero and pole-zero predictors produce SNR and SNRSEG values that are approximately 1-3 dB higher than those generated by the all-pole predictor. Subjective listening tests reveal that an eighth-order all-zero predictor performs as well as or better than an all-pole predictor for all conditions studied.
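For readers unfamiliar with the two objective measures, the following minimal sketch shows the textbook definitions of SNR and segmental SNR. The function names and the segment length are assumptions for illustration; the abstract does not state the segment length or any other details of how the measures were computed in the study.

```python
import math


def snr_db(x, y):
    """Overall SNR in dB: total signal energy over total error energy."""
    sig = sum(v * v for v in x)
    noise = sum((a - b) ** 2 for a, b in zip(x, y))
    return 10.0 * math.log10(sig / noise)


def snrseg_db(x, y, seg_len=128):
    """Segmental SNR in dB: mean of per-segment SNRs (seg_len is an assumption)."""
    values = []
    for start in range(0, len(x) - seg_len + 1, seg_len):
        xs = x[start:start + seg_len]
        ys = y[start:start + seg_len]
        sig = sum(v * v for v in xs)
        noise = sum((a - b) ** 2 for a, b in zip(xs, ys))
        if sig > 0.0 and noise > 0.0:  # skip silent or error-free segments
            values.append(10.0 * math.log10(sig / noise))
    return sum(values) / len(values)


if __name__ == "__main__":
    # A loud segment followed by a quiet one, with the same absolute error.
    x = [1.0] * 4 + [0.1] * 4
    y = [v + 0.01 for v in x]
    print("SNR    (dB):", round(snr_db(x, y), 2))
    print("SNRSEG (dB):", round(snrseg_db(x, y, seg_len=4), 2))
```

Because SNRSEG averages in the dB domain, quiet segments weigh as much as loud ones, which is why it is commonly reported alongside the overall SNR for speech.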
ISBN: 0780374002 (print)
As a novel technique for IP and 3G mobile integrated networks, we propose a speech coding translation scheme that can improve the speech quality of intercommunication among different speech coding standards, such as G.729 of VoIP and AMR of IMT-2000. Through the direct conversion of transmission parameters, this scheme improves speech intelligibility with reduced processing delay compared with conventional schemes, which consist of auxiliary decoding and asynchronous re-encoding. A subjective test with 40 non-experts shows that the proposed scheme significantly improves the subjective speech quality, with a probability of 99%. The processing delay in one-way transmission is reduced by 25 ms.
The Alias-and-Separate (AaS) speech coding framework has shown that it is possible to encode wideband (WB) speech with a narrowband (NB) speech codec and reconstruct it using speech separation. WB speech is first decimated, incurring aliasing, and then coded, transmitted, and decoded with an NB codec. The decoded signal is then separated into a lower band and a spectrally flipped high band using a speech separation module; these are expanded, lowpass/highpass filtered, and added together to reconstruct the WB speech. The original AaS system, however, has algorithmic delay originating from the overlap-add operation for consecutive segments. This algorithmic delay can be reduced by omitting the overlap-add procedure, but the quality of the reconstructed speech is then degraded by artifacts at the segment boundaries. In this work, we propose an improved AaS framework with minimum algorithmic delay. The decoded signal is first expanded by inserting zeros between samples before being processed by the source separation module. As the expanded signal can be viewed as a summation of frequency-shifted versions of the original signal, the decoded-and-expanded signal is then separated into the frequency-shifted signals, which are multiplied by complex exponentials and summed to reconstruct the original signal. With a carefully designed transposed convolution operation in the separation module, the proposed system requires minimal algorithmic delay while preventing discontinuity at the segment boundaries. Additionally, we propose to employ a generative vocoder to further improve the perceived quality, together with a modified multi-resolution short-time Fourier transform (MR-STFT) loss. Experimental results on WB speech coding with an NB codec demonstrate that the proposed system outperforms the original AaS system and the existing WB speech codec in the subjective listening test. We also show experimentally that the proposed method can be applied when the decimation factor is not 2.
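The statement that the zero-inserted signal is a summation of frequency-shifted versions of the original can be checked numerically. The sketch below is only an illustration of that identity under idealized assumptions (no codec distortion, no anti-aliasing filter, arbitrary test samples); the helper names are hypothetical and it does not implement the separation module. For decimation factor M, decimating s[n] and inserting M-1 zeros between samples yields (1/M) * sum over k of s[n] * exp(j*2*pi*k*n/M).

```python
import cmath


def expand(x, m):
    """Insert m-1 zeros after every sample of x."""
    y = []
    for v in x:
        y.append(v)
        y.extend([0.0] * (m - 1))
    return y


def sum_of_shifts(s, m):
    """(1/m) * sum_{k=0}^{m-1} s[n] * exp(j*2*pi*k*n/m), evaluated per sample."""
    out = []
    for n, v in enumerate(s):
        acc = sum(cmath.exp(2j * cmath.pi * k * n / m) for k in range(m))
        out.append(v * acc / m)
    return out


def max_error(s, m):
    """Largest gap between decimate-then-expand and the sum of shifted copies."""
    y = expand(s[::m], m)[:len(s)]  # decimation without filtering, then zero insertion
    z = sum_of_shifts(s, m)
    return max(abs(a - b) for a, b in zip(y, z))


if __name__ == "__main__":
    s = [0.3, -1.2, 0.8, 0.5, -0.7, 1.1, 0.2, -0.4, 0.9]  # arbitrary test samples
    for m in (2, 3):
        print("M =", m, "max error:", max_error(s, m))
```

Each term in the sum is the original signal shifted in frequency by k/M of the sampling rate, which is why multiplying separated components by the conjugate exponentials brings them back to baseband.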
Trellis coded vector quantization (TCVQ) and code-excited linear prediction (CELP) coding are combined to form an efficient low-bit-rate speech coding system. The resulting system uses a trellis search to select the synthesis filter excitation sequence, and is referred to as trellis excitation coding (TEC). Simulations are performed for encoding rates of 6.4 and 8 kbps. Informal listening tests indicate that the 8 kbps TEC system has quality roughly between that of 6-bit and 7-bit mu-law PCM with mu = 255. The 6.4 kbps TEC system provides speech quality between that of 5-bit and 6-bit mu-law PCM. A subjective comparison with vector sum excited linear prediction (VSELP) indicates that the 8 kbps TEC and the VSELP reconstructed speech are about equally preferable.
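As a reference point for the mu-law PCM comparison, here is a minimal sketch of continuous mu-law companding with mu = 255 followed by uniform quantization of the companded value. This is the textbook formula, not the segmented G.711 implementation, and the function names and mid-rise quantizer layout are assumptions; the abstract does not describe how the reference PCM signals were generated.

```python
import math

MU = 255.0


def mulaw_compress(x, mu=MU):
    """Map x in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def mulaw_quantize(x, bits, mu=MU):
    """Quantize x with 2**bits uniform cells in the companded domain."""
    levels = 2 ** bits
    y = mulaw_compress(x, mu)
    # Cell index in 0..levels-1; the top edge y == 1 is clamped into the last cell.
    idx = min(levels - 1, int((y + 1.0) / 2.0 * levels))
    y_hat = (idx + 0.5) / levels * 2.0 - 1.0  # cell midpoint
    return mulaw_expand(y_hat, mu)


if __name__ == "__main__":
    for bits in (5, 6, 7):
        print(bits, "bits:", mulaw_quantize(0.3, bits))
```

Each added bit halves the cell width in the companded domain, which is the sense in which 5-, 6-, and 7-bit mu-law PCM form a quality ladder for the comparisons above.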
In this paper, we consider vector quantization of excitation gains in a code-excited linear predictive (CELP) speech coder, using the average error in reconstruction of the excitation signal as the distortion measure, and use the same measure to design the codebooks. We have derived a generalized Lloyd algorithm (GLA) to design a codebook for quantization so that the average of the above criterion over the training vectors is minimized. We have also derived an algorithm, referred to as the Genetic GLA (GGLA), that can be shown to converge to the global optimum of the associated functional with probability one. The performance of ACELP using the codebooks obtained by the proposed algorithms is compared with that of the conjugate-structured ACELP-based ITU-T G.729 coder. Qualitative and quantitative comparisons show that their qualities are comparable. (C) 2001 Elsevier Science B.V. All rights reserved.
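For orientation, the sketch below shows the standard generalized Lloyd iteration with a plain squared-error distortion. It is a stand-in, not the algorithm derived in the paper: the paper's distortion is the excitation reconstruction error, which would change both the nearest-neighbour rule and the centroid update, and the genetic variant is not shown. All names and the toy training data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (stand-in for the paper's distortion measure)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def avg_distortion(training, codebook):
    """Average distortion of the training set under nearest-neighbour encoding."""
    return sum(sq_dist(codebook[nearest(codebook, v)], v) for v in training) / len(training)


def gla(training, codebook, iters=20):
    """Alternate the nearest-neighbour partition and the centroid update."""
    codebook = [list(c) for c in codebook]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook


if __name__ == "__main__":
    training = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
    initial = [(0.0, 0.0), (0.5, 0.5)]
    final = gla(training, initial)
    print("codebook:", final)
    print("distortion:", avg_distortion(training, initial), "->", avg_distortion(training, final))
```

Each iteration cannot increase the average distortion, but the result depends on the initial codebook and may be only a local optimum, which is the limitation the genetic variant in the abstract is meant to address.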