In an effort to provide a more efficient representation of the acoustical speech signal in the pre-classification stage of a speech recognition system, we consider the application of the Best-Basis Algorithm of R.R. Coifman and M.L. Wickerhauser (1992). This combines the advantages of using a smooth, compactly supported wavelet basis with an adaptive time-scale analysis, dependent on the problem at hand. We start by briefly reviewing areas within speech recognition where the wavelet transform has been applied with some success. Examples include pitch detection, formant tracking, and phoneme classification. Finally, our wavelet-based feature extraction system is described and its performance on a simple phonetic classification problem is given.
Traditional pitch-excited linear predictive coding (LPC) vocoders use a fully parametric model to efficiently encode the important information in human speech. These vocoders can produce intelligible speech at low data rates (800-2400 b/s), but they often sound synthetic and generate annoying artifacts such as buzzes, thumps, and tonal noises. These problems increase dramatically if acoustic background noise is present at the speech input. This paper presents a new mixed excitation LPC vocoder model that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech. The new model also eliminates the traditional requirement for a binary voicing decision so that the vocoder performs well even in the presence of acoustic background noise. A 2400-b/s LPC vocoder based on this model has been developed and implemented in simulations and in a real-time system. Formal subjective testing of this coder confirms that it produces natural-sounding speech even in a difficult noise environment. In fact, diagnostic acceptability measure (DAM) test scores show that the performance of the 2400-b/s mixed excitation LPC vocoder is close to that of the government standard 4800-b/s CELP coder.
This correspondence presents a new two-stage adaptive vector quantizer of LSF parameters in LPC speech coding. The first codebook is adapted by a partition-delete operation, whereas the code-vectors of the second codebook remain unchanged. The objective and subjective evaluations show that the proposed scheme offers transparent quantization with 22 b/frame.
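The two-stage structure mentioned above can be illustrated with a minimal sketch of plain two-stage (residual) vector quantization: the first codebook quantizes the input vector, and the second quantizes the remaining error. This does not implement the partition-delete adaptation of the first codebook, which the abstract does not specify; the function names and the toy codebooks are hypothetical and chosen only for illustration.

```python
def nearest(codebook, vec):
    """Return the index of the code-vector closest to vec (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def encode_two_stage(vec, cb1, cb2):
    """Quantize vec with cb1, then quantize the stage-1 residual with cb2."""
    i1 = nearest(cb1, vec)
    residual = [v - c for v, c in zip(vec, cb1[i1])]
    i2 = nearest(cb2, residual)
    return i1, i2


def decode_two_stage(i1, i2, cb1, cb2):
    """Reconstruct the vector as the sum of the two selected code-vectors."""
    return [a + b for a, b in zip(cb1[i1], cb2[i2])]


# Toy 2-D codebooks (illustrative values, not trained LSF codebooks).
CB1 = [[0.0, 0.0], [1.0, 1.0]]
CB2 = [[-0.1, 0.0], [0.1, 0.0], [0.0, 0.1]]

indices = encode_two_stage([1.1, 1.0], CB1, CB2)
reconstruction = decode_two_stage(indices[0], indices[1], CB1, CB2)
```

In a real coder the transmitted bits per frame would be the sum of the index lengths of the two stages; here the toy codebooks would need 1 + 2 bits.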
Parallel, self-organizing, hierarchical neural networks (PSHNN's) are multistage networks in which stages operate in parallel rather than in series during testing. Each stage can be any particular type of network. Previous PSHNN's assume quantized, say, binary outputs. A new type of PSHNN is discussed such that the outputs are allowed to be continuous-valued. The performance of the resulting networks is tested in the problem of predicting speech signal samples from past samples. Three types of networks in which the stages are learned by the delta rule, sequential least-squares, and the backpropagation (BP) algorithm, respectively, are described. In all cases studied, the new networks achieve better performance than linear prediction. A revised BP algorithm is discussed for learning input nonlinearities. When the BP algorithm is to be used, better performance is achieved when a single BP network is replaced by a PSHNN of equal complexity in which each stage is a BP network of smaller complexity than the single BP network.
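For reference, the linear-prediction baseline mentioned above can be sketched as follows. This is a generic autocorrelation-method predictor solved with the Levinson-Durbin recursion, not the PSHNN itself and not necessarily the baseline configuration used in the paper; the function names and the toy decaying signal are assumptions made for illustration.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of the sequence x (no windowing)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    The predictor is x_hat[n] = sum_k a[k] * x[n - k]. Returns the
    coefficient list (a[1], ..., a[order]) and the final prediction error.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err  # reflection coefficient of stage m
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


def predict_next(history, coeffs):
    """Predict the next sample from the most recent len(coeffs) samples."""
    return sum(c * history[-1 - k] for k, c in enumerate(coeffs))


# Toy signal x[n] = 0.9**n, which a first-order predictor fits almost exactly.
signal = [0.9 ** n for n in range(200)]
coeffs, final_err = levinson_durbin(autocorrelation(signal, 2), 2)
next_sample = predict_next(signal, coeffs)
```

A nonlinear predictor such as the networks described above would replace `predict_next` with a learned mapping from the same window of past samples.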
This paper describes the design of a speech coder called pitch synchronous innovation CELP (PSI-CELP) for low bit-rate mobile communications. PSI-CELP is based on CELP, but has more adaptive excitation structures. In voiced frames, instead of using conventional random excitation vectors, PSI-CELP gives even the random excitation vectors pitch periodicity by repeating stored random vectors, in addition to using an adaptive codebook. In silent, unvoiced, and transient frames, the coder stops using the adaptive codebook and switches to fixed random codebooks. The PSI-CELP coder also implements novel structures and techniques: an FIR-type perceptual weighting filter using unquantized LPC parameters, a random codebook with a conjugate structure trained to be robust against channel errors, codebook search with delayed decision, a gain quantization with sloped amplitude, and a moving average prediction coding of LSP parameters. Our speech coder is implemented on DSP chips. Its coded speech quality at 3.6 kb/s with 2.0 kb/s redundancy is comparable to that of the Japanese full-rate VSELP coder at 6.7 kb/s with 4.5 kb/s redundancy. The basic structure of this PSI-CELP coder has been chosen as the Japanese half-rate speech codec for digital cellular telecommunications.
This correspondence deals with spectral modeling in filter banks. It is shown, both theoretically and experimentally, that subspectral modeling is superior to full spectrum modeling if performed before the rate change. The price paid for this performance improvement is an increase in computation. A few different signal sources were considered in this study. It is shown that the performance of AR and ARMA techniques is comparable in subspectral modeling. The former is preferred because of its simplicity. As an application of this study, we implemented a CELP-based speech codec embedded in a filter bank structure. We found that the subband CELP technique gave no performance improvement over the fullband case. The theoretical reasoning behind the experimental results is also given in this correspondence.
We consider an algorithm for reduction of broadband noise in speech based on signal subspaces. The algorithm is formulated by means of the quotient singular value decomposition (QSVD). With this formulation, a prewhitening operation becomes an integral part of the algorithm. We demonstrate that this is essential in connection with updating issues in real-time recursive applications. We also illustrate by examples that we are able to achieve a satisfactory quality of the reconstructed signal.
The authors present a linear prediction (LP) based vocoder in which speech waveforms are considered as having a time envelope, the shape of which contains important perceptual information. By ensuring that the time envelope of the synthetic speech closely matches that of the original, natural-sounding synthetic speech can be achieved at 1.6 kbit/s.
A novel codebook generation scheme for vector quantisation is presented. The proposed scheme is of comparable computational complexity to the Linde-Buzo-Gray (LBG) algorithm, but its performance is shown to be superior.
This paper presents a theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures, and describes the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems. First, it is shown that in many high-rate VQ systems the quantization distortion approaches a simple quadratically weighted error measure, where the weighting matrix is a "sensitivity matrix" that is an extension of the concept of the scalar sensitivity. The approximate performance of VQ systems that train and quantize using mismatched distortion measures is derived, and is used to construct better distortion measures. Second, these results are used to determine the performance of LPC vector quantizers, as measured by the log spectral distortion (LSD) measure, which have been trained using other error measures, such as mean-squared error (MSE) or weighted mean-squared error (WMSE) measures of LPC parameters, reflection coefficients and transforms thereof, and line spectral pair (LSP) frequencies. Computationally efficient algorithms for computing the sensitivity matrices of these parameters are described. In particular, it is shown that the sensitivity matrix for the LSP frequencies is diagonal, implying that a WMSE measure of LSP frequencies converges to the LSD measure in high-rate VQ systems. Experimental results to support the theoretical performance estimates are provided.
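The quadratic approximation described above can be written out as a short worked equation. This is a sketch based on a second-order Taylor expansion, assuming the distortion $d(x,\hat{x})$ is twice differentiable, equals zero at $\hat{x}=x$, and is minimized there; the symbol $S(x)$, the weights $s_j(x)$, and the factor $\tfrac{1}{2}$ are notation chosen here and may differ from the paper's own conventions.

```latex
% Second-order expansion of the distortion around \hat{x} = x
% (the zeroth- and first-order terms vanish under the stated assumptions):
d(x,\hat{x}) \;\approx\; \tfrac{1}{2}\,(x-\hat{x})^{\mathsf T}\, S(x)\,(x-\hat{x}),
\qquad
S_{jk}(x) \;=\;
\left.\frac{\partial^{2} d(x,\hat{x})}{\partial \hat{x}_{j}\,\partial \hat{x}_{k}}\right|_{\hat{x}=x}.

% If S(x) is diagonal with entries s_j(x), the quadratic form reduces to a
% weighted mean-squared error over the individual parameters:
d(x,\hat{x}) \;\approx\; \tfrac{1}{2}\sum_{j} s_{j}(x)\,\bigl(x_{j}-\hat{x}_{j}\bigr)^{2}.
```

The second line is why a diagonal sensitivity matrix for the LSP frequencies implies that a suitably weighted MSE on those frequencies approaches the LSD measure at high rate.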