A maximum likelihood estimation procedure is constructed for estimating the parameters of discrete fractionally differenced Gaussian noise from an observation set of finite size N. The procedure does not involve the computation of any matrix inverse or determinant. It requires N^2/2 + O(N) operations. The expected value of the log-likelihood function for estimating the parameter d of fractionally differenced Gaussian noise (which corresponds to a parameter of the equivalent continuous-time fractional Brownian motion related to its fractal dimension) is shown to have a unique maximum within the range of allowable values of d. The maximum occurs at the true value of d. A Cramér-Rao bound on the variance of any unbiased estimate of d obtained from a finite-size observation set is derived. It is shown experimentally that the maximum likelihood estimate of d is unbiased and efficient when finite-size data sets are used in the estimation procedure. The proposed procedure is also extended to deal with noisy observations of discrete fractionally differenced Gaussian noise.
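As an illustration of how a Gaussian likelihood can be evaluated without forming a matrix inverse or determinant, the sketch below uses the standard Durbin-Levinson recursion on the known autocovariance of fractionally differenced noise. It is not necessarily the procedure of the paper, and this plain version does not attain the N^2/2 + O(N) operation count quoted above. The function names `fd_autocovariance` and `fd_loglik` and the unit innovation variance default are assumptions made for the example.

```python
import math

def fd_autocovariance(d, n, sigma2=1.0):
    """Autocovariances gamma(0..n-1) of fractionally differenced noise, |d| < 0.5."""
    gamma = [0.0] * n
    gamma[0] = sigma2 * math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2
    for k in range(1, n):
        # Ratio of successive autocovariances: (k - 1 + d) / (k - d)
        gamma[k] = gamma[k - 1] * (k - 1 + d) / (k - d)
    return gamma

def fd_loglik(x, d, sigma2=1.0):
    """Exact Gaussian log-likelihood of x via one-step prediction errors."""
    n = len(x)
    g = fd_autocovariance(d, n, sigma2)
    phi = []      # predictor coefficients for the current order
    v = g[0]      # one-step prediction error variance
    ll = -0.5 * (math.log(2 * math.pi * v) + x[0] ** 2 / v)
    for t in range(1, n):
        # Reflection (partial autocorrelation) coefficient of order t
        k = (g[t] - sum(phi[j] * g[t - 1 - j] for j in range(t - 1))) / v
        # Order update of the predictor, then of the error variance
        phi = [phi[j] - k * phi[t - 2 - j] for j in range(t - 1)] + [k]
        v *= 1 - k * k
        # Innovation: x[t] minus its best linear prediction from x[0..t-1]
        e = x[t] - sum(phi[j] * x[t - 1 - j] for j in range(t))
        ll += -0.5 * (math.log(2 * math.pi * v) + e * e / v)
    return ll
```

An estimate of d could then be obtained by evaluating `fd_loglik` over a grid of candidate values in (-0.5, 0.5) and keeping the maximizer; a numerical optimizer would serve equally well.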
A feature set that captures the dynamics of formant transitions prior to closure in a VCV environment is used to characterize and classify the unvoiced stop consonants. The feature set is derived from a time-varying, data-selective model for the speech signal. Its performance is compared with that of comparable formant data from a standard delta-LPC-based model. The different feature sets are evaluated on a database composed of eight talkers. A 40% reduction in classification error rate is obtained by means of the time-varying model. The performance of three different classifiers is discussed. A novel adaptive algorithm, termed learning vector classifier (LVC), is compared with standard K-means and LVQ2 classifiers. LVC is a supervised learning classifier that improves performance by increasing the resolution of the decision boundaries. Error rates obtained for the three-way (p, t, and k) classification task using LVC and the time-varying analysis are comparable to those of techniques that make use of additional discriminating information contained in the burst. Further improvements are expected when an expanded time-varying feature set is utilized, coupled with information from the burst.
A common technique to extend linear prediction to nonstationary signals is time segmentation: the signal is split into small portions and the modelization is carried out locally. The accuracy of the analysis is, however, dependent on the window size and on the signal characteristics, so that the problem of finding a good segmentation is crucial to the entire modeling scheme. In this paper, we will present an algorithm which determines the optimal segmentation with respect to a cost function relating prediction error to modeling cost. The proposed approach casts the problem in a rate/distortion (R/D) framework, whereby the segmentation is implicitly computed while minimizing the modelization distortion for a given modelization cost. The algorithm is implemented by means of dynamic programming and takes the form of a trellis-based Lagrangian minimization. The optimal linear predictor, when applied to speech coding, dramatically reduces the number of bits per second devoted to the modeling parameters in comparison to fixed-window schemes.
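The general idea of choosing a segmentation by dynamic programming over a Lagrangian cost D + lambda * R can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's algorithm: the per-segment distortion is the squared error of a constant (mean) fit standing in for the linear prediction error, every segment is charged the same hypothetical rate `rate_per_segment`, and the multiplier `lam` is fixed rather than tuned to meet a rate budget.

```python
import math

def segment(signal, lam, rate_per_segment=1.0):
    """Return (segments, cost) minimizing distortion + lam * rate.

    Segments are half-open index pairs (start, end).
    """
    n = len(signal)
    # Prefix sums give each segment's distortion in O(1)
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def distortion(i, j):
        # Squared error of fitting signal[i:j] by its mean (stand-in model)
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n   # best[j]: minimal cost of signal[:j]
    back = [0] * (n + 1)            # back[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + distortion(i, j) + lam * rate_per_segment
            if c < best[j]:
                best[j], back[j] = c, i

    # Backtrack through the trellis to recover the segment boundaries
    segments, j = [], n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1], best[n]
```

A small `lam` favors many short segments with low distortion, while a large `lam` favors few long segments; sweeping `lam` traces out different rate/distortion trade-offs.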
In this paper, an on-line signature verification scheme based on linear prediction coding (LPC) cepstrum and neural networks is proposed. Cepstral coefficients derived from linear predictor coefficients of the writing trajectories are calculated as the features of the signatures. These coefficients are used as inputs to the neural networks. A number of single-output multilayer perceptrons (MLPs), as many as the number of words in the signature, are equipped for each registered person to verify the input signature. If the summation of output values of all MLPs is larger than the verification threshold, the input signature is regarded as a genuine signature; otherwise, the input signature is a forgery. Simulations show that this scheme can detect the genuineness of the input signatures from our test database with an error rate as low as 4%.
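Two steps of this description lend themselves to a short sketch: deriving cepstral coefficients from linear predictor coefficients, and the sum-and-threshold decision. The code below uses the standard LPC-to-cepstrum recursion for the all-pole model 1/A(z) with A(z) = 1 - sum_k a_k z^-k, omitting the gain term c_0; the sign convention, the function names, and the example values are assumptions, and the paper's exact feature computation may differ.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps from predictor coefficients a_1..a_p.

    Assumes the predictor x[n] ~ sum_k a[k-1] * x[n-k].
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k / n) * c_k * a_{n-k}
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

def is_genuine(mlp_outputs, threshold):
    """Accept the signature if the summed MLP outputs exceed the threshold."""
    return sum(mlp_outputs) > threshold
```

For a single-pole predictor with coefficient a, the recursion reproduces the known series a^n / n, which gives a quick sanity check of the sign convention.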
This paper describes a network-based approach to speaker-independent digit recognition. The digits are modeled by a pronunciation network whose arcs represent classes of acoustic-phonetic segments. Each arc is associated with a matcher for rating an input speech interval as an example of the corresponding segment class. The matchers are based on vector quantization of LPC spectra. Recognition involves finding a minimum quantization distortion path through the network by dynamic programming. The system has been evaluated in an extensive series of speaker-independent isolated digit (one-nine, oh and zero) recognition experiments using a 225-talker, multidialect database developed by Texas Instruments (TI). The best recognizer configurations achieved accuracies of 97-99 percent on the TI database.
Although the multipulse model is conceptually simple, the problem of locating the pulses is computationally complex. The authors discuss the basic multipulse model and describe a procedure to compute the excitation with optimally adjusted amplitudes. The algorithm provides a framework for computing multipulse excitation with varying degrees of optimization and computational complexity. The authors find that speech quality depends on the pulse rate. They also find that for the same quality, female speech requires a higher pulse rate than male speech. The pitch dependence can be reduced and speech quality improved for high-pitched speakers by incorporating long delay prediction in the multipulse model.
We present a predictive neural network called neural predictive coding (NPC). This model is used for nonlinear discriminant feature extraction applied to phoneme recognition. We validate the nonlinear prediction improvement of the NPC model. We also present a new extension of the NPC model: NPC-3. In order to evaluate the performance of the NPC-3 model, we carried out a study of DARPA-TIMIT phoneme recognition (in particular the /b/, /d/, /g/ and /p/, /t/, /q/ phonemes). Comparisons with traditional coding methods are presented. We also show how an adaptive constraint allows improvements on the recognition task.
Vector predictive quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra, using a predictor codebook. The residual spectrum is coded by a residual codebook. The system operates in the log-spectral domain using a sampled version of the spectral envelope. Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 kbits/s CELP coder by a VPQ system allows a reduction of the rate assigned to the LPC data from 1.8 kbits/s to 1.0 kbits/s without any obvious difference in the perceptual quality.
ISBN: (print) 0780331923
This paper presents a novel wideband speech coding algorithm called transform predictive coding (TPC). The main emphasis is on low complexity. TPC uses short-term and long-term prediction to remove the redundancy in speech. The prediction residual is quantized in the frequency domain based on a calculated noise masking threshold. In its simplest form, the TPC coder uses only open-loop quantization and therefore has a low complexity. A 16 kb/s full-duplex, open-loop TPC coder takes only 22% of the CPU load on a 150 MHz SGI Indy workstation and about 34% on a 90 MHz Pentium PC. The speech quality of TPC is almost transparent at 32 kb/s, very good at 24 kb/s, and acceptable at 16 kb/s. In the second half of the paper, we report our recent progress in using closed-loop quantization techniques to improve TPC output speech quality.