In this work, we propose an optimization scheme based on a multi-objective Genetic Algorithm (GA) for the design of orthogonal filter banks for speech compression. A parameterization is adopted to ensure that the resulting filter banks satisfy perfect reconstruction and have at least two vanishing moments. We search for a parameter set that optimizes the coding gain and the frequency selectivity. As the objectives conflict, we look for the solution that realizes the best compromise between them using the Non-dominated Sorting Genetic Algorithm (NSGAIII). Experimental results show that, for test speech signals, the optimized filter banks provide a significant gain in coding performance compared with the Daubechies orthogonal filter banks.
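The two constraints named in this abstract (orthogonality, which gives perfect reconstruction in a two-channel orthogonal bank, and at least two vanishing moments) can be checked numerically. The sketch below is not the paper's parameterization, which the abstract does not give. It only tests the constraints on the well-known 4-tap Daubechies low-pass filter, a member of the baseline family the abstract compares against. All function names are hypothetical.

```python
import math

def daubechies4_lowpass():
    """4-tap Daubechies low-pass filter (closed-form coefficients)."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]

def highpass_from_lowpass(h):
    """Alternating-flip construction: g[k] = (-1)^k * h[N-1-k]."""
    n = len(h)
    return [(-1) ** k * h[n - 1 - k] for k in range(n)]

def double_shift_correlation(h, m):
    """sum_k h[k] * h[k + 2m]; orthogonality needs 1 for m = 0 and 0 otherwise."""
    return sum(h[k] * h[k + 2 * m] for k in range(len(h) - 2 * m))

def discrete_moment(g, p):
    """sum_k k^p * g[k]; vanishing moments need this to be 0 for p = 0, 1."""
    return sum((k ** p) * g[k] for k in range(len(g)))

if __name__ == "__main__":
    h = daubechies4_lowpass()
    g = highpass_from_lowpass(h)
    print("unit energy      :", double_shift_correlation(h, 0))
    print("double-shift orth:", double_shift_correlation(h, 1))
    print("moment p=0       :", discrete_moment(g, 0))
    print("moment p=1       :", discrete_moment(g, 1))
```

Up to floating-point rounding, the first value should be 1 and the other three 0. A GA-based design such as the one described would presumably search only over filters for which these checks hold by construction.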
High quality speech at low bit rates makes code excited linear prediction (CELP) the dominant choice for a narrowband coding technique despite its susceptibility to packet loss. One of the few techniques to receive attention after the introduction of CELP coding is the internet low bitrate codec (iLBC), because of its inherently high robustness to packet loss. The addition of rate flexibility and scalability makes the iLBC an attractive choice for voice communication over IP networks. In this paper, performance improvement schemes for multi-rate iLBC and its scalable structure are proposed, and the proposed codec, enhanced from the previous work, is re-designed based on subjective listening quality instead of objective quality. In particular, perceptual weighting and the modified discrete cosine transform (MDCT) with short overlap in the weighted signal domain are employed along with an improved packet loss concealment (PLC) algorithm. The subjective evaluation results show that the speech quality of the proposed codec is equivalent to that of the state-of-the-art codec G.718 under both clean and lossy channel conditions. This result is significant considering that development of the proposed codec is still at an early stage.
This paper describes a new artificial speech signal (ASVQ: Artificial speech by Vector Quantization technique) which reflects the average characteristics of the human voice. The ASVQ is intended for use as a test signal in the objective evaluation of speech coding system quality. To obtain the average characteristics, a very large speech database is analyzed. The ASVQ generation method, which reflects the extracted average characteristics of the human voice, is formulated. This method applies vector quantization analysis to the speech database. An LPC speech synthesis circuit is used to reproduce the average characteristics. Finally, the new artificial speech signal is compared with a human voice, and its accuracy in estimating the subjective quality of speech coding systems and nonlinear distortions is evaluated.
In this article, we have reviewed some of the existing subjective and objective measures used in the area of speech coding. The mean opinion score and the diagnostic acceptability measure are two of the widely used subjective measures. The most popular class of time-domain measures is the signal-to-noise ratio (SNR) with its variants, such as the segmental SNR and the granular segmental SNR. Among the spectral distortion measures, the log likelihood ratio measure, the log area ratio measure, the log spectral distortion measure, the cepstral distance and the Itakura-Saito distortion measure are quite well-known. Some of the more recently proposed objective measures place emphasis on perceptually significant aspects. Three such classes of psychoacoustically motivated measures are the information index, the Bark spectral distortion measure and the neural distance measure (e.g., the cochlear discrimination information, the cochlear hidden Markovian measures). The merit of considering important perceptual events is evident in the success of these measures.
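As an illustration of the time-domain class mentioned above, the following is a minimal sketch of a segmental SNR: the SNR is computed per frame in dB and then averaged over frames. The frame length and the clamping range for per-frame values are assumptions chosen for illustration; the abstract does not specify them. The function name is hypothetical.

```python
import math

def segmental_snr(reference, decoded, frame_len=160,
                  floor_db=-10.0, ceil_db=35.0):
    """Average of per-frame SNR values in dB.

    frame_len, floor_db and ceil_db are illustrative assumptions.
    Frames with zero reference energy are skipped.
    """
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal == 0.0:
            continue  # silent frame: ratio undefined
        if noise == 0.0:
            snr = ceil_db  # perfect frame: clamp to the upper limit
        else:
            snr = 10.0 * math.log10(signal / noise)
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)

if __name__ == "__main__":
    ref = [math.sin(0.05 * n) for n in range(1600)]
    dec = [0.95 * x for x in ref]  # toy "codec": 5% amplitude error
    print(segmental_snr(ref, dec))
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, which is the usual motivation for preferring the segmental variant over a single whole-signal SNR.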
The results of an extensive investigation of the properties of 64-point Hadamard transformed speech are presented. Detailed information is given about the probability density functions of the Hadamard coefficients, the average power-density spectrum in the Hadamard domain and the logical-autocorrelation function. The results indicate that good-quality speech can be reconstructed from 6 to 8 dominant Hadamard coefficients, but that the use of fewer coefficients is unlikely to lead to the reconstruction of speech of acceptable quality. The results of a preliminary series of listening tests are presented, and these confirm the conclusions drawn from the statistical properties of the transformed speech. It is shown that the number of bits needed for coefficient labelling constitutes a significant proportion of the total number of bits needed to represent Hadamard transformed speech. A technique is presented for reducing the number of labelling bits needed by more than 50%, and it is explained how, by using this technique, it should be possible to obtain good quality speech at a transmission bit rate of 8 kbit/s.
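The dominant-coefficient idea can be sketched as follows: transform a 64-sample block with a fast Walsh-Hadamard transform, keep the K largest-magnitude coefficients, and count the bits needed to label which coefficients were kept. The naive labelling used here (one fixed-length index per kept coefficient) is an assumption for illustration. It is not the paper's reduced-labelling technique, which the abstract does not describe. All names are hypothetical.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y

def inverse_fwht(coeffs):
    """The transform is its own inverse up to a factor of 1/n."""
    n = len(coeffs)
    return [v / n for v in fwht(coeffs)]

def keep_dominant(coeffs, k):
    """Zero all but the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                   reverse=True)
    kept = set(order[:k])
    pruned = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
    return pruned, sorted(kept)

def naive_label_bits(block_len, k):
    """Bits to send k indices at a fixed ceil(log2(block_len)) bits each."""
    return k * (block_len - 1).bit_length()

if __name__ == "__main__":
    block = [float((n * 7) % 13 - 6) for n in range(64)]  # toy data
    pruned, labels = keep_dominant(fwht(block), 8)
    approx = inverse_fwht(pruned)
    print("kept indices:", labels)
    print("label bits per block:", naive_label_bits(64, 8))
```

With a 64-point block each index costs 6 bits under this naive scheme, so 8 kept coefficients need 48 labelling bits per block before any amplitude bits are spent. This is consistent with the abstract's remark that labelling takes a significant share of the bit budget.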
In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, our results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, eliminating additional decoding and feature extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word accuracy rates than the CELP and MELPe codecs.
In speech coding, denoising of the speech signal is essential. The filters that minimize errors through denoising employ the autoregressive moving average (ARMA) approach, which introduces higher computational complexity in speech coder design. This research work presents the design and implementation of an effective perceptual weighting filter (PWF) for speech coding. The high-level synthesis of the fixed-point PWF is optimized by multiple optimization techniques along with detailed design space exploration using the weighted sum (WS) method. To enhance performance, an FPGA-based hardware accelerator is proposed using hardware/software (HW/SW) co-design in an embedded environment. Simulation analysis in Vivado HLS and the final accelerator design in the Vitis IDE tool validate the proposed architecture using real-time speech samples, demonstrating a 50% reduction in area and a 99% execution improvement. This makes it well-suited for use in modern speech codecs, enhancing their efficiency.
A new adaptive quantizer which uses a combination of instantaneous and syllabic adaptation is presented for use in speech codecs. It can be designed to adapt to changes in the mean, variance, and pdf shape of its input signal, and to quantize the signal using one or more bits/sample. It is therefore called the generalized hybrid adaptive quantizer (GHAQ). An efficient procedure for optimizing the GHAQ using a training sequence of signal samples is described, and the effects on the performance of the GHAQ of varying the memory length and the syllabic compandor time constant are investigated. It is found that an optimized version of the two-bit GHAQ offers improved signal-to-noise ratio over Jayant's adaptive quantizer with a one-word memory when it is used in a predictive speech codec with a zero-, first-, or second-order fixed predictor.
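For context, the baseline named in this abstract, an adaptive quantizer with a one-word memory, can be sketched as below. This illustrates only the instantaneous-adaptation idea (the step size is scaled by a multiplier selected by the previous codeword) and is not the GHAQ itself. The multiplier values, initial step size and step limits are illustrative assumptions, and the function names are hypothetical.

```python
def _next_step(step, magnitude_index, multipliers, step_min, step_max):
    """One-word memory: the new step depends only on the last codeword."""
    return min(max(step * multipliers[magnitude_index], step_min), step_max)

def adaptive_encode(samples, step=0.1, multipliers=(0.8, 1.6),
                    step_min=1e-3, step_max=10.0):
    """Two-bit quantizer: one sign bit plus one magnitude bit per sample.

    multipliers[0] shrinks the step after an inner level,
    multipliers[1] grows it after an outer level (assumed values).
    """
    codes = []
    for x in samples:
        magnitude_index = min(int(abs(x) / step), len(multipliers) - 1)
        codes.append((x < 0, magnitude_index))
        step = _next_step(step, magnitude_index, multipliers,
                          step_min, step_max)
    return codes

def adaptive_decode(codes, step=0.1, multipliers=(0.8, 1.6),
                    step_min=1e-3, step_max=10.0):
    """Mirror of the encoder: the step is re-derived from the codewords."""
    out = []
    for negative, magnitude_index in codes:
        value = (magnitude_index + 0.5) * step  # mid-rise reconstruction
        out.append(-value if negative else value)
        step = _next_step(step, magnitude_index, multipliers,
                          step_min, step_max)
    return out

if __name__ == "__main__":
    signal = [0.05, 0.5, -0.5, 0.4, -0.02, 0.01]
    codes = adaptive_encode(signal)
    print(codes)
    print(adaptive_decode(codes))
```

Because the decoder updates its step size from the received codewords alone, no side information is transmitted. A hybrid scheme such as the one described would add a slower, syllabic adaptation on top of this per-sample rule.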
A low-bit-rate linear predictive coder (LPC) that is based on variable-length segment quantization is presented. In this vocoder, the speech spectral-parameter sequence is represented as the concatenation of variable-length spectral segments generated by linearly time-warping fixed-length code segments. Both the sequence of code segments and the segment lengths are efficiently determined using a dynamic programming procedure. This procedure minimizes the spectral distance measured between the original and the coded spectral sequence in a given interval. An iterative algorithm is developed for designing fixed-length code segments for the training spectral sequence. It updates the segment boundaries of the training spectral sequence using an a priori codebook and updates the codebook using these segment sequences. The convergence of this algorithm is discussed theoretically and experimentally. In experiments, the performance of variable-length segment quantization for voice coding is compared to that of fixed-length segment quantization and vector quantization.
Tree coding is combined with time domain harmonic scaling (TDHS) for speech coding at 6.4 and 4.8 kbps. In order to improve the robustness to channel errors, a new pitch predictor, short-term predictor adaptation and gain adaptation methods are proposed for the tree coder. New code trees with appropriate gain adaptation rules, a new backward adaptive pitch predictor and robust short-term predictor adaptation algorithms are evaluated for both ideal and noisy channels. Paired comparison listening tests show that the 6.4 kbps coder (2-to-1 TDHS with 2 bits/sample tree coding) has speech quality equivalent to 6 bit log-PCM at a sampling rate of 6400 samples/s. (C) 1999 Elsevier Science B.V. All rights reserved.