This paper describes a network-based approach to speaker-independent digit recognition. The digits are modeled by a pronunciation network whose arcs represent classes of acoustic-phonetic segments. Each arc is associated with a matcher that rates an input speech interval as an example of the corresponding segment class. The matchers are based on vector quantization of LPC spectra. Recognition involves finding a minimum quantization-distortion path through the network by dynamic programming. The system has been evaluated in an extensive series of speaker-independent isolated digit (one through nine, "oh", and "zero") recognition experiments using a 225-talker, multidialect database developed by Texas Instruments (TI). The best recognizer configurations achieved accuracies of 97-99 percent on the TI database.
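The recognition step (a minimum-distortion path found by dynamic programming, with each arc scored by vector quantization) can be sketched for the simplest case of a single left-to-right path of arcs per digit. This is a minimal sketch under stated assumptions, not the authors' system: squared Euclidean distance stands in for the LPC spectral distortion measure, each arc owns a small hypothetical codebook, every arc must absorb at least one frame, and the branching of a real pronunciation network is omitted. The names `vq_distortion`, `align_linear_path`, and `recognize` are illustrative.

```python
import math


def vq_distortion(frame, codebook):
    """Distortion of one frame under an arc's codebook: distance to the nearest codeword.

    Squared Euclidean distance is an assumption; it stands in for an LPC spectral
    distortion measure.
    """
    return min(sum((f - c) ** 2 for f, c in zip(frame, cw)) for cw in codebook)


def align_linear_path(frames, arc_codebooks):
    """Minimum total VQ distortion of splitting `frames` into consecutive, non-empty
    segments, one per arc, in arc order. Returns (total_distortion, segment_end_indices)."""
    T, A = len(frames), len(arc_codebooks)
    # local[a][t]: distortion of frame t under arc a's codebook
    local = [[vq_distortion(f, cb) for f in frames] for cb in arc_codebooks]
    D = [[math.inf] * T for _ in range(A)]  # best cost with frame t assigned to arc a
    back = [[a] * T for a in range(A)]      # arc that frame t-1 was assigned to
    D[0][0] = local[0][0]
    for t in range(1, T):
        for a in range(A):
            stay = D[a][t - 1]                              # frame t continues arc a
            enter = D[a - 1][t - 1] if a > 0 else math.inf  # frame t starts arc a
            if enter < stay:
                back[a][t] = a - 1
            D[a][t] = min(stay, enter) + local[a][t]
    # Trace back from the last frame of the last arc to recover arc boundaries
    ends, a = [T - 1], A - 1
    for t in range(T - 1, 0, -1):
        if back[a][t] != a:
            ends.append(t - 1)  # the previous arc ended at frame t-1
            a -= 1
    return D[A - 1][T - 1], sorted(ends)


def recognize(frames, digit_models):
    """Return the digit whose arc sequence yields the lowest path distortion."""
    return min(digit_models, key=lambda d: align_linear_path(frames, digit_models[d])[0])
```

A digit decision then reduces to running the same alignment for every digit model and keeping the lowest total distortion, which is all `recognize` does; a full pronunciation network would replace the single arc chain with a graph search.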
In this paper, an on-line signature verification scheme based on the linear predictive coding (LPC) cepstrum and neural networks is proposed. Cepstral coefficients derived from the linear predictor coefficients of the writing trajectories are calculated as the features of the signatures. These coefficients are used as inputs to the neural networks. For each registered person, a number of single-output multilayer perceptrons (MLPs), one per word in the signature, are set up to verify the input signature. If the sum of the output values of all MLPs is larger than the verification threshold, the input signature is regarded as genuine; otherwise, it is regarded as a forgery. Simulations show that this scheme can determine the genuineness of input signatures from our test database with an error rate as low as 4%.
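The feature step (LPC cepstrum of a writing trajectory) and the decision rule can be sketched as follows. This assumes the predictor sign convention x[n] ≈ Σ a_k x[n-k] and applies the textbook LPC-to-cepstrum recursion; the LPC coefficients of each trajectory (for example, the pen's x or y coordinate sequence, an assumption here) are taken as given, and the MLPs themselves are not shown. `lpc_to_cepstrum` and `verify` are illustrative names.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model G / (1 - sum_k a_k z^-k), gain term excluded.

    a[0..p-1] hold a_1..a_p. Uses the standard recursion
    c_n = a_n + sum_{k=max(1, n-p)}^{n-1} (k/n) c_k a_{n-k}, with a_n = 0 for n > p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] would hold log G; left at 0 here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def verify(mlp_outputs, threshold):
    """Decision rule from the abstract: genuine if the summed MLP outputs exceed the threshold."""
    return sum(mlp_outputs) > threshold
```

For a one-pole model with a_1 = 0.5 the recursion returns 0.5, 0.125, 0.0417, matching the closed form c_n = a_1^n / n.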
This paper describes the design of a speech coder called pitch synchronous innovation CELP (PSI-CELP) for low bit-rate mobile communications. PSI-CELP is based on CELP but has more adaptive excitation structures. In voiced frames, instead of using conventional random excitation vectors, PSI-CELP gives even the random excitation vectors pitch periodicity by repeating stored random vectors, in addition to using an adaptive codebook. In silent, unvoiced, and transient frames, the coder stops using the adaptive codebook and switches to fixed random codebooks. The PSI-CELP coder also implements several novel structures and techniques: an FIR-type perceptual weighting filter using unquantized LPC parameters, a random codebook with a conjugate structure trained to be robust against channel errors, codebook search with delayed decision, gain quantization with sloped amplitude, and moving-average prediction coding of LSP parameters. Our speech coder is implemented on DSP chips. Its coded speech quality at 3.6 kb/s (with 2.0 kb/s of redundancy) is comparable to that of the Japanese full-rate VSELP coder at 6.7 kb/s (with 4.5 kb/s of redundancy). The basic structure of this PSI-CELP coder has been chosen as the Japanese half-rate speech codec for digital cellular telecommunications.
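One plausible reading of how a stored random vector is given pitch periodicity is to repeat its first pitch-lag samples across the subframe, much as an adaptive codebook extends a past excitation that is shorter than the subframe. The sketch below assumes that rule; the exact PSI-CELP repetition rule and the frame classifier are not given in the abstract, and `pitch_synchronize` and `select_codebooks` (with its string labels) are hypothetical.

```python
def pitch_synchronize(random_vec, pitch_lag):
    """Give a stored random vector pitch periodicity by repeating its first
    `pitch_lag` samples across the whole vector (assumed repetition rule).
    Lags at least as long as the vector leave it unchanged."""
    if pitch_lag <= 0:
        raise ValueError("pitch_lag must be positive")
    if pitch_lag >= len(random_vec):
        return list(random_vec)
    return [random_vec[n % pitch_lag] for n in range(len(random_vec))]


def select_codebooks(frame_class):
    """Codebooks searched per frame class, following the abstract: voiced frames use the
    adaptive codebook plus pitch-synchronized random vectors; silent, unvoiced, and
    transient frames use fixed random codebooks only. Labels are hypothetical."""
    if frame_class == "voiced":
        return ("adaptive", "pitch_synchronous_random")
    return ("fixed_random",)
```

With a lag of 3, the vector `[1, 2, 3, 4, 5, 6, 7]` becomes `[1, 2, 3, 1, 2, 3, 1]`.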
Although the multipulse model is conceptually simple, locating the pulses is computationally complex. The authors discuss the basic multipulse model and describe a procedure for computing the excitation with optimally adjusted amplitudes. The algorithm provides a framework for computing multipulse excitation with varying degrees of optimization and computational complexity. The authors find that speech quality depends on the pulse rate, and that for the same quality, female speech requires a higher pulse rate than male speech. This pitch dependence can be reduced, and speech quality improved for high-pitched speakers, by incorporating long-delay prediction into the multipulse model.
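The abstract does not spell out its pulse-location procedure, so the following is a minimal sketch of one common approach, assumed here rather than taken from the paper: pulses are added one at a time at the position that best matches the remaining target, and after each addition all amplitudes are re-optimized jointly by least squares. `x` is the target (for example, a perceptually weighted speech frame), `h` the impulse response of the synthesis filter, and `n_pulses` sets the pulse rate per frame; all names are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def multipulse(x, h, n_pulses):
    """Sequential multipulse search with joint amplitude re-optimization.

    x: target frame; h: synthesis-filter impulse response (h[0] != 0 assumed).
    Returns (pulse positions in selection order, amplitudes)."""
    N = len(x)
    # cols[m]: filter response to a unit pulse at position m, truncated to the frame
    cols = [[h[n - m] if 0 <= n - m < len(h) else 0.0 for n in range(N)] for m in range(N)]
    positions, amps = [], []
    for _ in range(n_pulses):
        # Target left over after subtracting the current pulses' synthesized output
        r = [x[n] - sum(g * cols[m][n] for m, g in zip(positions, amps)) for n in range(N)]
        candidates = [m for m in range(N) if m not in positions]
        # Position whose single-pulse fit removes the most residual energy
        best = max(candidates, key=lambda m: dot(r, cols[m]) ** 2 / dot(cols[m], cols[m]))
        positions.append(best)
        # Re-optimize all amplitudes jointly via the normal equations
        A = [[dot(cols[i], cols[j]) for j in positions] for i in positions]
        b = [dot(x, cols[i]) for i in positions]
        amps = solve(A, b)
    return positions, amps
```

Raising `n_pulses` raises the pulse rate; the joint re-optimization is the costly step, and skipping it gives a cheaper, less optimal variant, which is the kind of complexity trade-off the abstract describes.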
We present a predictive neural network called neural predictive coding (NPC). This model is used for nonlinear discriminant feature extraction applied to phoneme recognition. We validate the improvement brought by the nonlinear prediction of the NPC model. We also present a new extension of the NPC model: NPC-3. To evaluate the performance of the NPC-3 model, we carried out a recognition study of DARPA-TIMIT phonemes (in particular /b/, /d/, /g/ and /p/, /t/, /q/). Comparisons with traditional coding methods are presented. We also show how an adaptive constraint improves performance on the recognition task.
Vector predictive quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra, using a predictor codebook. The residual spectrum is coded by a residual codebook. The system operates in the log-spectral domain on a sampled version of the spectral envelope. Experimental results indicate a prediction gain of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 kbit/s CELP coder with a VPQ system reduces the rate assigned to the LPC data from 1.8 kbit/s to 1.0 kbit/s without any obvious difference in perceptual quality.
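A VPQ encoder can be sketched as a joint search over the two codebooks. The sketch assumes each predictor codeword is a tuple of scalar weights applied to the most recent past log-spectral vectors (the paper's predictor form may be richer), uses squared error in the log-spectral domain as the search criterion, and measures log-spectral distance as the RMS difference of two envelopes already expressed in dB. All codebook contents and function names are hypothetical.

```python
import math


def predict(past, weights):
    """Predicted log spectrum: weighted sum of past log-spectral vectors (most recent first)."""
    return [sum(w * spec[i] for w, spec in zip(weights, past)) for i in range(len(past[0]))]


def vpq_encode(current, past, predictor_cb, residual_cb):
    """Jointly search predictor and residual codebooks for the pair that minimizes
    squared log-spectral error. Returns (predictor_index, residual_index)."""
    best = None
    for pi, weights in enumerate(predictor_cb):
        pred = predict(past, weights)
        resid = [c - p for c, p in zip(current, pred)]
        for ri, code in enumerate(residual_cb):
            err = sum((r - q) ** 2 for r, q in zip(resid, code))
            if best is None or err < best[0]:
                best = (err, pi, ri)
    return best[1], best[2]


def vpq_decode(pi, ri, past, predictor_cb, residual_cb):
    """Decoder: the same prediction from past decoded spectra plus the residual codeword."""
    pred = predict(past, predictor_cb[pi])
    return [p + q for p, q in zip(pred, residual_cb[ri])]


def log_spectral_distance_db(s1, s2):
    """RMS difference between two sampled log-spectral envelopes given in dB."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)) / len(s1))
```

Because the decoder predicts from its own past decoded spectra, encoder and decoder stay in step only if the encoder also predicts from decoded (not original) past spectra.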
ISBN: 0780331923 (print)
This paper presents a novel wideband speech coding algorithm called transform predictive coding (TPC), with the main emphasis on low complexity. TPC uses short-term and long-term prediction to remove redundancy in speech. The prediction residual is quantized in the frequency domain based on a calculated noise masking threshold. In its simplest form, the TPC coder uses only open-loop quantization and therefore has low complexity. A 16 kb/s full-duplex, open-loop TPC coder takes only 22% of the CPU load on a 150 MHz SGI Indy workstation and about 34% on a 90 MHz Pentium PC. The speech quality of TPC is almost transparent at 32 kb/s, very good at 24 kb/s, and acceptable at 16 kb/s. In the second half of the paper, we report our recent progress in using closed-loop quantization techniques to improve the quality of TPC output speech.
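The abstract does not say how the masking threshold sets the quantizer, so here is a minimal sketch of one open-loop option, assumed rather than taken from the paper: each frequency-domain residual coefficient gets a uniform quantizer whose step Δ makes the quantization noise power Δ²/12 equal to the masking threshold for that bin. Computing the threshold itself is omitted; `mask_power` is taken as given, and the function names are hypothetical.

```python
import math


def quantize_with_mask(coeffs, mask_power):
    """Open-loop uniform quantization of frequency-domain residual coefficients.

    The step for each bin is chosen so that the uniform-quantizer noise power
    step**2 / 12 equals that bin's masking threshold (assumed design rule).
    mask_power entries must be positive. Returns (integer indices, step sizes)."""
    steps = [math.sqrt(12.0 * m) for m in mask_power]
    indices = [round(c / s) for c, s in zip(coeffs, steps)]
    return indices, steps


def dequantize(indices, steps):
    """Reconstruct coefficients from indices and the same per-bin step sizes."""
    return [i * s for i, s in zip(indices, steps)]
```

No search or synthesis loop is involved, which is why an open-loop scheme of this kind is cheap. The decoder must be able to derive the same step sizes, for example from side information; how TPC handles that is not stated in the abstract.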
Current narrow-band speech coding algorithms (for transmission rates of 2400-4800 bits per second) typically excite linear filters with impulse trains to model voiced speech. The excitation function that would reproduce the speech exactly is the prediction residual; however, the usual selection of filter coefficients does not produce the most pulse-like prediction residuals. Other choices of filters therefore offer an opportunity to improve the quality of narrow-band coding. The strategy of this paper is to minimize a dynamically weighted prediction error, allowing the largest values of the prediction residual to be unconstrained and thus making the residuals more pulse-like. The idea was tested on voiced speech with over 300 predictive models, which contained quadratic and cubic terms as well as linear ones. Models with fewer than eight terms were not improved. The idea worked well with the other models, particularly those with 8 to 11 terms.
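One way to realize a dynamically weighted prediction error is iteratively reweighted least squares: fit the predictor, give zero weight to samples whose residual is much larger than the RMS residual, and refit. The hard-threshold weighting rule, the `clip` factor, and the restriction to linear terms (the paper's models also include quadratic and cubic terms) are assumptions of this sketch; `weighted_lp` and `solve` are hypothetical names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def weighted_lp(s, order, n_iter=3, clip=2.0):
    """Fit s[n] ~ sum_k a_k s[n-k] by iteratively reweighted least squares (n_iter >= 1).

    After each fit, samples whose residual exceeds `clip` times the RMS residual get
    weight 0, so the largest residuals are left unconstrained. Returns (a, residuals)."""
    rows = [[s[n - k] for k in range(1, order + 1)] for n in range(order, len(s))]
    targets = [s[n] for n in range(order, len(s))]
    weights = [1.0] * len(rows)
    for _ in range(n_iter):
        # Weighted normal equations: (R^T W R) a = R^T W t
        A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(order)]
             for i in range(order)]
        b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets)) for i in range(order)]
        a = solve(A, b)
        resid = [t - sum(ak * rk for ak, rk in zip(a, r)) for r, t in zip(rows, targets)]
        rms = (sum(e * e for e in resid) / len(resid)) ** 0.5
        weights = [1.0 if abs(e) <= clip * rms else 0.0 for e in resid]
    return a, resid
```

On a decaying first-order signal with a single added pulse, a plain least-squares fit (`n_iter=1`) is biased away from the true coefficient 0.9, while the reweighted fit recovers it and leaves the pulse entirely in the residual. For the 8- to 11-term models the abstract favors, `order` would be set accordingly.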
This paper presents an efficient linear predictive coding scheme based on artificial neural networks. Speech recognition is fundamentally a pattern classification task: the objective is to take an input pattern, the speech, and classify it as a sequence of patterns. The linear predictive coefficients of the slowly varying speech signal are stored, and the feedforward network is determined by these coefficients. A three-layer feedforward network was used, with backpropagation as the training algorithm. The network and learning techniques are proved correct and applied to the problem of speech recognition. The suggested novel scheme yields results with finite accuracy and recognition performance.
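The LPC front end can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This assumes the predictor convention s[n] ≈ Σ a_k s[n-k] and leaves out windowing and the three-layer network itself; each frame's coefficient vector is what would be fed to the network. `autocorr` and `levinson_durbin` are illustrative names.

```python
def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) analysis frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p (s[n] ~ sum_k a_k s[n-k]) and final prediction error.

    Requires r[0] > 0."""
    a = [0.0] * (order + 1)  # a[0] unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

For the autocorrelation of a first-order process, `[1.0, 0.5, 0.25]`, a second-order fit returns coefficients `[0.5, 0.0]` and error 0.75, as expected: the second coefficient adds nothing.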