Authors:
SILVERMAN, HF; DIXON, NR
IBM CORP, THOMAS J WATSON RES CTR, DEPT COMP SCI, SPEECH PROCESSING GRP, YORKTOWN HEIGHTS, NY 10598 USA
The parametrically controlled analyzer (PCA) is a large PL/I program which has been designed to perform spectral analysis of speech signals. PCA features parametric selection of several analysis methods, including discrete Fourier transformation and linear predictive coding. Also, selection may be made among various smoothing, normalization, and interpolation methods. PCA develops high-quality spectrographic representations of speech for standard line printers and CRT displays. The PCA is described and numerous examples of various parameter settings are presented and discussed.
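The LPC analysis option mentioned above can be illustrated with a minimal sketch of the autocorrelation method and the Levinson-Durbin recursion. This is not PCA's PL/I code; the function names `autocorrelation` and `levinson_durbin` and the sign convention A(z) = 1 + a1 z^-1 + ... are assumptions made for illustration.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, err): a[0] == 1.0 and a[1..order] are the coefficients of
    A(z) = 1 + a[1] z^-1 + ... ; err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor's error with lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection (PARCOR) coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Autocorrelation of a first-order process with correlation 0.5.
    coeffs, energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(coeffs, energy)
```

The LPC spectrum would then be obtained by evaluating the error energy divided by |A(e^{jω})|² on a frequency grid; smoothing, normalization, and interpolation choices such as those PCA offers are outside this sketch.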
In this paper, we address the use of a vector quantizer (VQ) defined as a linear mapping of a block code (LMBC-VQ) in robust LPC spectrum quantization. Split VQ and multistage VQ of LSFs are considered. We demonstrate several properties of the LMBC-VQ, including design methods and channel robustness theory. Guided by the robustness theory, the LMBC-VQ codebooks are designed directly with inherent robustness against channel errors. We present results in which the resulting LMBC-VQs outperform conventional codebooks designed with the generalized Lloyd algorithm in the presence of channel errors, even when postprocessing index assignment is applied to the latter. Listening tests confirmed that the increased robustness is significant for perceptual quality. We show that short block codes can be used to build the codebooks with little or no loss in quantization performance. This is advantageous for channel error robustness, lower memory requirements, and lower complexity. We also describe the structure of split VQ and multistage VQ using the LMBC framework. This allows us to assess the structural constraints and to generalize these VQ schemes, resulting in improved performance.
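For readers unfamiliar with the split VQ structure referred to above, here is a minimal sketch of a conventional split VQ encoder and decoder. It does not implement the LMBC construction or any channel-robust design; the names `split_vq_encode` and `split_vq_decode` and the toy codebooks are hypothetical.

```python
def nearest(codebook, x):
    """Index of the codevector closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)))


def split_vq_encode(x, codebooks, sizes):
    """Quantize each subvector of x independently with its own codebook."""
    indices, start = [], 0
    for codebook, n in zip(codebooks, sizes):
        indices.append(nearest(codebook, x[start:start + n]))
        start += n
    return indices


def split_vq_decode(indices, codebooks):
    """Concatenate the selected codevectors to rebuild the full vector."""
    out = []
    for i, codebook in zip(indices, codebooks):
        out.extend(codebook[i])
    return out


if __name__ == "__main__":
    # Toy example: a 3-dimensional vector split into sizes 2 and 1.
    books = [[[0.0, 0.0], [1.0, 1.0]], [[0.0], [2.0], [4.0]]]
    idx = split_vq_encode([0.9, 1.2, 2.6], books, sizes=[2, 1])
    print(idx, split_vq_decode(idx, books))
```

Because each subvector is searched separately, the search cost and memory grow with the sum of the codebook sizes rather than their product, which is the usual motivation for the split structure.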
Sukkar et al. defined the function zinc(t) = A sinc(t) + B cosc(t). It is pointed out that the terms sinc(t) and cosc(t) form a Hilbert transform pair.
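The Hilbert pair relation can be checked with a short derivation. The abstract does not give the definitions, so the normalization below, sinc(t) = sin(πt)/(πt) and cosc(t) = (1 − cos(πt))/(πt), is an assumption; the argument is the same for any bandwidth scaling.

```latex
% Assumed definitions (not stated in the abstract):
%   sinc(t) = sin(pi t)/(pi t),   cosc(t) = (1 - cos(pi t))/(pi t).
% sinc(t) has spectrum 1 for |f| < 1/2 and 0 elsewhere.
% The Hilbert transform multiplies the spectrum by -j sgn(f), so
\begin{align*}
\mathcal{H}\{\operatorname{sinc}\}(t)
  &= \int_{-1/2}^{1/2} \bigl(-j\,\operatorname{sgn}(f)\bigr)\, e^{j 2\pi f t}\, df \\
  &= -j\left[\frac{e^{j\pi t} - 1}{j 2\pi t} - \frac{1 - e^{-j\pi t}}{j 2\pi t}\right] \\
  &= -\frac{2\cos(\pi t) - 2}{2\pi t}
   = \frac{1 - \cos(\pi t)}{\pi t}
   = \operatorname{cosc}(t).
\end{align*}
```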
An efficient method of implementing the perceptual postfilter for the suppression of coding noise in CELP-coded speech is proposed. The method uses cepstrum processing to approximate the response of the pole-zero postfilter with that of an all-pole filter. This all-pole postfilter can then be implemented more efficiently than the pole-zero postfilter, with less computation and filter memory.
The authors show that for small errors in an LPC model, spectral distortion and weighted squared distortion measures in the coefficients of arbitrary one-to-one transformations of the LPC parameters are equivalent. For line spectrum pairs and immittance spectrum pairs, the weighting matrix is diagonal, which is advantageous for efficient quantisation.
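The two measures being compared can be written in one common form as follows. The symbols (P for the LPC power spectrum, θ for the transformed parameter vector, W for the weighting matrix) are notation assumed here, not taken from the paper.

```latex
% Spectral distortion between the original and quantized LPC power spectra (in dB):
\[
\mathrm{SD}^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}
  \Bigl[10\log_{10} P(\omega) - 10\log_{10} \hat{P}(\omega)\Bigr]^2 d\omega .
\]
% Weighted squared distortion in the transformed parameters:
\[
d_W(\theta,\hat{\theta}) = (\theta - \hat{\theta})^{\mathsf T}\, W\, (\theta - \hat{\theta}) .
\]
% The abstract's claim, for small errors and a suitable W:
\[
\mathrm{SD}^2 \approx d_W(\theta,\hat{\theta}),
\qquad\text{and for diagonal } W:\quad
d_W = \sum_i w_i\,(\theta_i - \hat{\theta}_i)^2 .
\]
```

With a diagonal W, each parameter contributes independently to the distortion, which is why the abstract singles out line spectrum pairs and immittance spectrum pairs as convenient for quantisation.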
A novel split vector quantization (SVQ) scheme for low bit rate coding of speech signals is proposed. In this scheme, the LPC parameter vector, which is represented by PARCOR coefficients, is split into small-dimension subvectors, and each subvector is sequentially quantized according to a multistage structure that resembles a segmented lattice filter. The forward and backward prediction residuals in the segmented filter are coupled across VQ stages. The quantizer in each stage operates on the principle of minimizing the forward and backward prediction error energies, as in linear predictive analysis. Simulation results show that the new split VQ scheme can achieve transparent quantization of LPC parameters at 25 b/frame.
A novel scheme for generating the codebook for vector quantisation is presented. Starting from an initial codebook produced by a K-d tree splitting procedure based on the greatest coordinate variance, a proposed partial GLA is used to improve the codevectors. The performance of the resulting VQ is superior to that of VQs designed by the standard GLA with the same initialisation and by the splitting-initialised LBG algorithm. However, the improvement in performance is accompanied by an increase in the computational complexity of the design stage.
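For reference, the standard GLA baseline that the abstract compares against can be sketched as below. This is the conventional nearest-neighbour and centroid iteration only; the proposed partial GLA and the K-d tree initialisation are not specified in the abstract and are not reproduced here. The function name `gla` and the fixed iteration count are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gla(training, codebook, iterations=20):
    """Standard generalized Lloyd algorithm with a fixed iteration count.

    training: list of training vectors; codebook: initial codevectors.
    Returns the refined codebook as a list of lists of floats.
    """
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        # Nearest-neighbour step: assign each training vector to a cell.
        cells = [[] for _ in codebook]
        for v in training:
            idx = min(range(len(codebook)),
                      key=lambda i: sq_dist(codebook[i], v))
            cells[idx].append(v)
        # Centroid step: replace each codevector by the mean of its cell.
        for i, cell in enumerate(cells):
            if cell:  # empty cells keep their previous codevector
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell)
                               for d in range(dim)]
    return codebook


if __name__ == "__main__":
    data = [[0, 0], [0, 2], [10, 10], [10, 12]]
    print(gla(data, [[0, 0], [10, 10]]))
```

A practical design would stop when the average distortion stops decreasing rather than after a fixed number of passes.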
An algorithm for LPC parameter optimization in MP-LPC-based speech coders is presented. It is shown that by taking the nature of the MP-excitation signal into account in the LPC parameter computation, it is possible to improve the effectiveness of the LPC model. This results in a better quality of the reconstructed signal in terms of both objective and subjective criteria. The implementation details of the algorithm are discussed and experimental results are presented. In particular, a comparison with standard MP-LPC techniques is given.
This paper presents a novel approach to investigating and simulating high band (HB) feature extraction based on linear predictive coding (LPC) and mel frequency cepstral coefficient (MFCC) techniques. The HB features are embedded into the encoded bitstream of the global system for mobile (GSM) full rate (FR) 06.10 coder using joint source coding and data hiding before being transmitted to the receiving terminal. At the receiver, the HB features are extracted to reproduce the HB portion of the speech; different excitation extension techniques are applied for this purpose, and their results are evaluated in terms of quality (intelligibility and naturalness) and bandwidth. A MATLAB-based e-test bench is created to implement the proposed artificial bandwidth extension (ABE) coder, and a series of simulations is carried out to gain insight into its performance using subjective [mean opinion score (MOS)] and objective [perceptual evaluation of speech quality (PESQ)] analysis. The results of both analyses indicate that the proposed ABE coder outperforms the GSM FR NB (legacy GSM FR) coder. Compared with LPC-based parameterization in the ABE coder, MFCC parameterization yields higher speech intelligibility, as is evident from its slightly better PESQ and MOS scores.
One of the most difficult problems in speech analysis is reliable discrimination among silence, unvoiced speech, and voiced speech which has been transmitted over a telephone line. Although several methods have been proposed for making this three-level decision, these schemes have met with only modest success. In this paper, a novel approach to the voiced-unvoiced-silence detection problem is proposed in which a spectral characterization of each of the three classes of signal is obtained during a training session, and an LPC distance measure and an energy distance are nonlinearly combined to make the final discrimination. This algorithm has been tested over conventional switched telephone lines, across a variety of speakers, and has been found to have an error rate of about 5 percent, with the majority of the errors (about two-thirds) occurring at the boundaries between signal classes. The algorithm is currently being used in a speaker-independent word recognition system.