Traditionally, linear prediction is used to predict future values of a signal from its past values, with the goal of minimizing prediction errors. In this paper, we propose a novel method of utilizing prediction errors to extract edges of images. In this method, prediction errors in smooth regions remain small, while steep changes produce large errors that are amplified; therefore, when applied to image edge detection, edge information can be accurately extracted. The proposed method is compared with predominant methods such as the Sobel and Canny methods. While there is no mathematical proof that the proposed method outperforms these predominant methods, the examples presented in this paper suggest that it may perform better for certain applications.
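The core idea (small prediction errors in smooth regions, large errors at steep transitions) can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the fixed averaging predictor, the function name, and the predictor order are all assumptions.

```python
import numpy as np

def prediction_error_edges(image, order=2):
    """Illustrative sketch: per-row linear prediction error as an edge map.

    Each pixel is predicted from the `order` pixels to its left using a
    simple fixed-coefficient (averaging) predictor; large errors mark
    steep intensity changes, while smooth regions yield small errors.
    """
    image = np.asarray(image, dtype=float)
    errors = np.zeros_like(image)
    for r in range(image.shape[0]):
        row = image[r]
        for c in range(order, len(row)):
            pred = row[c - order:c].mean()   # fixed averaging predictor
            errors[r, c] = abs(row[c] - pred)
    return errors

# A step edge yields its largest error exactly at the discontinuity.
img = np.array([[10, 10, 10, 200, 200, 200]])
print(prediction_error_edges(img, order=2)[0])  # → [0. 0. 0. 190. 95. 0.]
```

A real implementation would presumably adapt the predictor coefficients (e.g. via least squares) and combine horizontal and vertical passes, but the error-amplification behavior at edges is already visible with this fixed predictor.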
Here we consider the problem of providing near optimal performance for a large set of possible models. We adopt the LQR framework in the single-input single-output (SISO) setting, and prove that given a compact set of controllable and observable plant models of a fixed order, we can construct a single linear periodic controller (LPC) which provides near optimal LQR performance.
This paper describes our enhanced mixed excitation linear prediction (MELP) speech coder which is a candidate for the new U.S. Federal Standard at 2.4 kbits/s. The new coder is based on the MELP model, and it uses a number of enhancements as well as efficient quantization algorithms to improve performance while maintaining a low bit rate. In addition, the coder has been optimized for performance in acoustic background noise and in channel errors, as well as for efficient real-time implementation. Listening tests confirm that the enhanced 2.4 kbit/s MELP coder performs as well as the higher bit rate 4.8 kbit/s FS1016 CELP standard.
This paper presents audio noise classification using Bark-scale features and the K-NN technique. It uses audio noise signals from NOISEX-92 (12 types). We determine the transfer functions from the linear predictive coding (LPC) coefficients of the noise signals on the Bark scale and use the K-NN technique to classify them. The results will be used to optimize a speech recognition model in the presence of noise. The highest average accuracy for audio noise classification is obtained when K=3 with a median taken over 5 consecutive frames.
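The K-NN classification step can be sketched as below. The distance metric, feature values, and labels here are illustrative placeholders; in the paper's setting the training rows would hold Bark-scale, LPC-derived features per noise frame.

```python
import numpy as np

def knn_classify(train_feats, train_labels, query, k=3):
    """Minimal K-NN sketch: Euclidean distance, majority vote among k nearest."""
    d = np.linalg.norm(train_feats - query, axis=1)   # distance to each training vector
    nearest = np.argsort(d)[:k]                       # indices of the k closest
    vals, counts = np.unique(train_labels[nearest], return_counts=True)
    return vals[np.argmax(counts)]                    # most frequent label wins

# Two noise "classes" in a toy 2-D feature space (placeholder data).
feats = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 1.0]])
labels = np.array([0, 0, 1, 1])
print(knn_classify(feats, labels, np.array([0.05, 0.05]), k=3))  # → 0
```

The paper's median over 5 consecutive frames would then smooth these per-frame decisions before reporting a class for the segment.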
We present a new speech coding algorithm, based on an all-pole model of the vocal tract. Whereas current autoregressive (AR) based modeling techniques (e.g. CELP, LPC-10) minimize a prediction error, which is considered to be the input to the all-pole model, our approach determines the closest (in L2 norm) signal which exactly satisfies an all-pole model. Each frame is then encoded by storing the parameters of the complex damped exponentials deduced from the all-pole model and its initial conditions. Decoding is performed by summing the complex damped exponentials based on the transmitted parameters. The new algorithm is demonstrated on a speech signal. The quality is compared with that of a standard coding algorithm at comparable compression ratios, by using the segmental signal-to-noise ratio (SNR).
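The decoding step, summing damped exponentials from transmitted parameters, can be sketched as follows. The real-valued cosine form, the parameter names, and the sample rate are illustrative assumptions; the paper works with complex exponential pairs derived from the all-pole model.

```python
import numpy as np

def synth_damped_exponentials(amps, dampings, freqs, phases, n, fs=8000):
    """Decode sketch: reconstruct one frame as a sum of damped sinusoids.

    Each component is a * exp(-d * t) * cos(2*pi*f*t + phi), i.e. the
    real-valued equivalent of a conjugate pair of complex damped
    exponentials. All parameter names here are assumptions.
    """
    t = np.arange(n) / fs
    frame = np.zeros(n)
    for a, d, f, phi in zip(amps, dampings, freqs, phases):
        frame += a * np.exp(-d * t) * np.cos(2 * np.pi * f * t + phi)
    return frame

# A single decaying 200 Hz component over a 10 ms frame.
frame = synth_damped_exponentials([1.0], [50.0], [200.0], [0.0], n=80)
```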
A set of experiments is described in which the LVQ2 (learning vector quantization) algorithm of T. Kohonen et al. (Proc. 1988 IEEE Int. Conf. on Neural Networks, p. I-61-68, 1988) is used to generate vector codebooks for a discrete-observation hidden Markov model (HMM) classifier. Input feature vectors consist of single-frame linear predictive coding (LPC)-based cepstra and/or differenced cepstra. Classification accuracies using conventional k-means, class-specific k-means, and LVQ2 codebooks are compared for a 16-way speaker-independent vowel classification task. In contrast to previously published speaker-dependent phonetic classification results, no significant performance advantages are observed with LVQ2. These conflicting results are discussed relative to differences in the recognition tasks and the feature sets used. It is also argued that the single-observation Bayesian decision boundaries approximated by LVQ2 are nonoptimal for HMM-based classification involving multiple observations.
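An LVQ2-style codeword update can be sketched as follows. This is a simplified illustration of Kohonen's rule, with assumed learning rate and window width, not the configuration used in these experiments.

```python
import numpy as np

def lvq2_update(x, label, codebook, codebook_labels, alpha=0.1, window=0.3):
    """One LVQ2-style update step (illustrative sketch).

    If the nearest codeword has the wrong class, the second-nearest has
    the correct class, and x falls inside a window around the midplane
    between them, move the correct codeword toward x and push the
    incorrect one away. Otherwise the codebook is left unchanged.
    """
    d = np.linalg.norm(codebook - x, axis=1)
    i, j = np.argsort(d)[:2]          # nearest and second-nearest codewords
    di, dj = d[i], d[j]
    in_window = min(di / dj, dj / di) > (1 - window) / (1 + window)
    if in_window and codebook_labels[i] != label and codebook_labels[j] == label:
        codebook[i] -= alpha * (x - codebook[i])   # push wrong-class codeword away
        codebook[j] += alpha * (x - codebook[j])   # pull right-class codeword closer
    return codebook

cb = np.array([[0.0, 0.0], [1.0, 0.0]])            # two codewords, classes 0 and 1
cb = lvq2_update(np.array([0.45, 0.0]), 1, cb, np.array([0, 1]))
```

Because only boundary cases inside the window trigger an update, LVQ2 refines decision boundaries rather than re-estimating class centroids, which is the behavior the abstract's closing argument about single-observation Bayesian boundaries refers to.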
Speaker diarization systems attempt to assign temporal speech segments in a conversation to the appropriate speaker, and non-speech segments to non-speech. Speaker diarization systems basically provide an answer to the question "Who spoke when?". One inherent deficiency of most current systems is their inability to handle co-channel or overlapped speech. During the past few years, several studies have attempted to deal with the problem of overlapped or co-channel speech detection and separation; however, most of the suggested algorithms perform only under specific conditions, have high computational complexity, and require both time- and frequency-domain analysis of the audio data. In this study, frame-based entropy analysis of the audio data in the time domain serves as the single feature for an overlapped speech detection algorithm. Identification of overlapped speech segments is performed using Gaussian Mixture Modeling (GMM) along with well-known classification algorithms applied to two-speaker conversations. By employing this methodology, the proposed method eliminates the need for setting a hard threshold for each conversation or database. The LDC CALLHOME American English corpus is used for evaluation of the suggested algorithm. The proposed method successfully detects 60.0% of the frames labeled as overlapped speech by the baseline (ground-truth) segmentation, while keeping a 5% false-alarm rate.
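Frame-based entropy of time-domain samples can be computed as below. The frame length, histogram bin count, and function name are assumptions for illustration; the paper's exact feature definition may differ.

```python
import numpy as np

def frame_entropy(signal, frame_len=256, n_bins=32):
    """Per-frame entropy of a time-domain signal (illustrative sketch).

    Each frame's samples are histogrammed into n_bins amplitude bins and
    the Shannon entropy of that histogram (in bits) is returned. Flatter
    amplitude distributions (e.g. overlapped, noise-like frames) yield
    higher entropy than highly structured or silent frames.
    """
    entropies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        hist, _ = np.histogram(frame, bins=n_bins)
        p = hist / hist.sum()
        p = p[p > 0]                          # drop empty bins: 0*log(0) := 0
        entropies.append(-np.sum(p * np.log2(p)))
    return np.array(entropies)

rng = np.random.default_rng(0)
silence = np.zeros(512)                        # minimal-entropy frames
babble = rng.normal(size=512)                  # noise-like stand-in for overlap
print(frame_entropy(silence)[0], frame_entropy(babble)[0])
```

Per-frame entropies like these would then be modeled with a GMM, so that the overlapped/non-overlapped decision comes from the model rather than from a per-conversation hard threshold.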
The authors describe a low-complexity coding technique that combines multipulse and stochastic excitation. The system, known as hybrid multipulse coding (HMC), provides good quality at 4.8 and 7.2 kb/s. HMC uses efficient pulse excitation for voiced speech and stochastic excitation for unvoiced speech. For the best speech quality, HMC uses an optimization algorithm for simultaneous solution of the pitch predictor and excitation parameters, producing a higher signal-to-noise ratio than CELP (code-excited linear prediction) in the 4.8-kb/s configuration. At 7.2 kb/s, CELP and HMC give diagnostic acceptability measure (DAM) scores close enough that their standard error limits overlap. In all configurations, HMC requires from 5 to 14 times fewer multiply/accumulate operations than CELP for similarly sized codebooks.
In an effort to provide a more efficient representation of the acoustical speech signal in the pre-classification stage of a speech recognition system, we consider the application of the Best-Basis Algorithm of R.R. Coifman and M.V. Wickerhauser (1992). This combines the advantages of a smooth, compactly supported wavelet basis with an adaptive time-scale analysis that depends on the problem at hand. We start by briefly reviewing areas within speech recognition where the wavelet transform has been applied with some success; examples include pitch detection, formant tracking, and phoneme classification. Finally, our wavelet-based feature extraction system is described and its performance on a simple phonetic classification problem is given.