A set of experiments is described in which the LVQ2 (learning vector quantization) algorithm of T. Kohonen et al. (Proc. 1988 IEEE Int. Conf. on Neural Networks, pp. I-61-68, 1988) is used to generate vector codebooks for a discrete-observation hidden Markov model (HMM) classifier. Input feature vectors consist of single-frame linear predictive coding (LPC)-based cepstra and/or differenced cepstra. Classification accuracies using conventional k-means, class-specific k-means, and LVQ2 codebooks are compared for a 16-way speaker-independent vowel classification task. In contrast to previously published speaker-dependent phonetic classification results, no significant performance advantages are observed with LVQ2. These conflicting results are discussed relative to differences in the recognition tasks and the feature sets used. It is also argued that the single-observation Bayesian decision boundaries approximated by LVQ2 are nonoptimal for HMM-based classification involving multiple observations.
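For orientation, a single LVQ2-style codebook update can be sketched as follows. This is a minimal illustration of the commonly described rule (adjust only when the nearest codebook vector has the wrong class, the runner-up has the right class, and the input lies in a window around their midplane), not the configuration used in the experiments above. The function name `lvq2_step`, the learning rate `alpha`, and the window width `window` are assumptions chosen for the example.

```python
import math

def lvq2_step(codebook, labels, x, x_label, alpha=0.05, window=0.3):
    """Apply one LVQ2-style update in place; return True if vectors moved.

    codebook: list of vectors (lists of floats); labels: class of each vector.
    alpha and window are illustrative values, not taken from the abstract.
    """
    # Euclidean distance from x to every codebook vector.
    dists = [math.dist(m, x) for m in codebook]
    order = sorted(range(len(codebook)), key=dists.__getitem__)
    i, j = order[0], order[1]  # nearest and second-nearest vectors

    # Update only when the nearest is the wrong class and the runner-up is right.
    if labels[i] == x_label or labels[j] != x_label:
        return False

    # Window test: x must lie close to the midplane between the two vectors.
    s = (1.0 - window) / (1.0 + window)
    if dists[j] == 0.0 or dists[i] / dists[j] <= s:
        return False

    # Push the wrong-class vector away from x, pull the right-class one closer.
    codebook[i] = [m - alpha * (xk - m) for m, xk in zip(codebook[i], x)]
    codebook[j] = [m + alpha * (xk - m) for m, xk in zip(codebook[j], x)]
    return True
```

Because each update depends on a single observation, the rule shapes frame-level decision boundaries, which is the property the abstract argues is nonoptimal for multi-observation HMM classification.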
We present a new binomial sine pulse (BSP) excitation signal for linear prediction-based speech codecs. The BSP excitation signal is a sine wave whose amplitude is modulated by a binomial signal. The binomial signal describes the various trends of excitation signals within a pitch period, and the pulsatance of the BSP excitation signal coincides with the vibration frequency of the vocal folds. In the experiments, speech is processed frame by frame, and the same excitation signal is placed at every pitch excitation moment in a frame. Speech codecs based on the BSP excitation have the advantages of low complexity and low delay. Experimental results show that such a codec can provide highly intelligible synthesized speech below 3 kbps.
In an effort to provide a more efficient representation of the acoustic speech signal in the pre-classification stage of a speech recognition system, we consider the application of the Best-Basis Algorithm of R.R. Coifman and M.L. Wickerhauser (1992). This combines the advantages of a smooth, compactly supported wavelet basis with an adaptive time-scale analysis that depends on the problem at hand. We start by briefly reviewing areas within speech recognition where the wavelet transform has been applied with some success. Examples include pitch detection, formant tracking, and phoneme classification. Finally, our wavelet-based feature extraction system is described, and its performance on a simple phonetic classification problem is given.
A classifier for utterance rejection in a hidden Markov model (HMM) based speech recognizer is presented. This classifier, termed the two-pass classifier, is a postprocessor to the HMM recognizer, and consists of a two-stage discriminant analysis. The first stage employs the generalized probabilistic descent (GPD) discriminative training framework, while the second stage performs linear discrimination combining the output of the first stage with HMM likelihood scores. In this fashion, the classification power of the HMM is combined with that of the GPD stage, which is specifically designed for keyword/nonkeyword classification. Experimental results show that, on two separate databases, the two-pass classifier significantly outperforms a single-pass classifier based solely on the HMM likelihood scores.
ISBN: (print) 0769513212
Voice over IP (VoIP) can be used in a wide variety of applications, all having different requirements. We present JVOIPLIB and JRTPLIB, a VoIP library and an RTP library respectively. Together they make it possible to easily add VoIP to various types of applications. Both libraries are open-source, written in an object-oriented style in C++, and highly extensible. Several measures have been taken to allow good synchronization between the communicating parties.
ISBN: (print) 0769525210
In a high-dimensional feature space with finite samples, severe bias can be introduced in the nearest neighbor algorithm. In this paper, we propose a new classification method that classifies based on the local probability center of each class. This prototype-based method classifies the query sample using two measures: the distance between the query and the local probability centers, and the posterior probability of the query. Although both measures are effective, the experiments show that the second one is better. The results indicate that this method substantially improves the classification performance of the nearest neighbor algorithm.
ISBN: (print) 1424405343
This paper proposes a novel robust fundamental frequency (F0) estimation algorithm based on complex-valued speech analysis of an analytic speech signal. Since the analytic signal provides spectra only over positive frequencies, spectra can be accurately estimated at low frequencies. Consequently, F0 estimation using the residual signal extracted by complex-valued speech analysis is expected to perform better than F0 estimation using the residual signal extracted by conventional real-valued LPC analysis. In this paper, the autocorrelation function weighted by the AMDF is adopted as the F0 estimation criterion, and four signals (speech signal, analytic speech signal, LPC residual, and complex LPC residual) are evaluated for F0 estimation. The speech signals used in the experiments were corrupted by adding white Gaussian noise at noise levels of 10, 5, 0, and -5 dB. The experimental results demonstrate that the proposed algorithm based on complex speech analysis can perform better than the other methods in an extremely noisy environment.
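As a rough illustration of an AMDF-weighted autocorrelation criterion, the sketch below scores each candidate lag by the autocorrelation divided by the AMDF plus a small constant, and picks the lag with the highest score. The abstract does not give the exact weighting or normalization, so the form `acf / (amdf + k)`, the constant `k`, the search range, and the function name `estimate_f0` are all assumptions. It operates on a real-valued frame only and does not implement the complex-valued analysis the paper proposes.

```python
import math

def estimate_f0(frame, fs, f0_min=50.0, f0_max=400.0, k=1.0):
    """Estimate F0 (Hz) of one frame via AMDF-weighted autocorrelation.

    frame: list of real samples; fs: sampling rate in Hz.
    f0_min, f0_max and k are illustrative assumptions.
    """
    n = len(frame)
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), n - 1)
    best_lag, best_score = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        m = n - lag  # number of overlapping sample pairs at this lag
        # Autocorrelation: large when the frame matches its shifted copy.
        acf = sum(frame[t] * frame[t + lag] for t in range(m)) / n
        # AMDF: small when the frame matches its shifted copy.
        amdf = sum(abs(frame[t] - frame[t + lag]) for t in range(m)) / n
        # Dividing by the AMDF sharpens the peak at the true pitch period.
        score = acf / (amdf + k)
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag
```

For example, a 50 ms frame of a clean 100 Hz sine sampled at 8 kHz has a period of 80 samples, and the score peaks at that lag.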
This paper compares system lifetimes in diverse scenarios based on our new energy dissipation model for body sensor networks, and maximizes the system lifetime of sensor nodes communicating between body sensors and a personal communication unit. Ultra-low energy consumption is a major challenge for medical applications. This work studies and analyzes transceiver energy consumption for different compression algorithms and, based on these calculations, selects the best technique, such as LPC, for energy saving. An important part of this work is the formulation of a linear programming problem that maximizes the system lifetime, defined as the time until the first node runs out of battery. Maximum system lifetimes are calculated with MATLAB optimization, with and without an efficient compression algorithm such as LPC, in various environments. Results show that the maximum system lifetimes in the different scenarios are longer with an efficient compression technique such as LPC than without compression.
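The lifetime definition used above (time until the first node runs out of battery) can be written down directly. The sketch below only evaluates that objective for fixed per-node power draws; it does not reproduce the paper's energy dissipation model or its linear program. The function name `system_lifetime` and its arguments are hypothetical.

```python
def system_lifetime(battery_joules, power_watts):
    """Return the time in seconds until the first node depletes its battery.

    battery_joules: initial energy of each node.
    power_watts: average power drawn by each node (must be positive).
    """
    if len(battery_joules) != len(power_watts):
        raise ValueError("one power value is needed per node")
    # Each node lasts energy / power seconds; the system lasts as long
    # as its shortest-lived node.
    return min(e / p for e, p in zip(battery_joules, power_watts))
```

Under this definition, any technique that lowers the power draw of the bottleneck node, such as compressing data before transmission, extends the system lifetime.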
An analysis and simulation results are presented comparing the performance of several types of high-order backward adaptive predictors with orders up to 100. Issues in high-order linear predictive coding (LPC) analysis, such as analysis methods, windowing, ill-conditioning, quantization noise effects, and computational complexity, are studied. The performance of the various analysis methods is compared with the conventional sequential formant-pitch predictor. The auto-correlation method (50th order) shows performance advantages over the sequential formant-pitch configurations. Several new backward high-order methods using covariance analysis and a lattice formulation show much better prediction gains than the auto-correlation method.
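For reference, the autocorrelation method and the prediction gain it yields can be sketched with the standard Levinson-Durbin recursion. This is a generic textbook formulation, not the backward adaptive, windowed, or lattice variants evaluated above. The function names and the sign convention (prediction error e[n] = x[n] + sum of a[j] * x[n-j]) are choices made for this example.

```python
import math

def autocorr(x, order):
    """Return autocorrelation values r[0..order] of the sequence x."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r):
    """Solve the autocorrelation normal equations.

    Returns (a, err): coefficients with a[0] = 1 and the residual energy.
    """
    p = len(r) - 1
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        if err <= 0.0:
            break  # zero or ill-conditioned signal energy: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for order i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # residual energy shrinks at each order
    return a, err

def prediction_gain_db(r):
    """Prediction gain in dB: signal energy over residual energy."""
    _, err = levinson_durbin(r)
    return 10.0 * math.log10(r[0] / err)
```

At high orders the residual energy can become very small relative to `r[0]`, which is one way the ill-conditioning mentioned in the abstract shows up numerically.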