ISBN: (print) 8790834100
We present a simplified derivation of the extended Baum-Welch procedure, which shows that it can be used for Maximum Mutual Information (MMI) estimation of a large class of continuous emission density hidden Markov models (HMMs). We use the extended Baum-Welch procedure for discriminative estimation of MLLR-type speaker adaptation transformations. The resulting adaptation procedure, termed Conditional Maximum Likelihood Linear Regression (CMLLR), is used successfully for supervised and unsupervised adaptation tasks on the Switchboard corpus, yielding an improvement over MLLR. The interaction of unsupervised CMLLR with segmental minimum Bayes risk lattice voting procedures is also explored, showing that the two procedures are complementary.
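For orientation, the MMI criterion mentioned above is commonly written in the following form. The notation is an assumption made here for illustration and is not taken from the paper: $O_r$ is the $r$-th training utterance, $W_r$ its reference transcription, $W'$ ranges over competing word sequences, $\theta$ denotes the HMM parameters, and $P(W)$ is the language model probability.

```latex
F_{\mathrm{MMI}}(\theta)
  = \sum_{r=1}^{R} \log
    \frac{p_{\theta}(O_r \mid W_r)\, P(W_r)}
         {\sum_{W'} p_{\theta}(O_r \mid W')\, P(W')}
```

Maximizing this quantity raises the likelihood of the reference transcription relative to its competitors, which is why the estimation is described as discriminative.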
Conventional wisdom says that incorporating more training data is the surest way to reduce the error rate of a speech recognition system. This, in turn, guarantees that speech recognition systems are expensive to train, because of the high cost of annotating training data. We propose an iterative training algorithm that seeks to improve the error rate of a speech recognizer without incurring additional transcription cost, by selecting a subset of the already available transcribed training data. We apply the proposed algorithm to an alpha-digit recognition problem and reduce the error rate from 10.3% to 9.4% on a particular test set.
The paper describes an architecture for multi-channel and multi-modal applications. First the design problem is explored, and a system that can handle multi-modal interaction and delivery of Internet content is proposed. The focus is on selected development aspects and the way they are addressed using state-of-the-art tools. The various components are defined and described in detail. Finally, conclusions and a view of future work on the evolution of such systems are given.
A hybrid support vector machine (SVM) and hidden Markov model (HMM) approach is proposed for designing continuous speech recognition systems. Using novel properties of SVMs and combining them with HMMs, one can obtain models that map easily to hardware and lead to a more modular and scalable design. The overall architecture of the proposed system is based on the MAP (maximum a posteriori) framework, which offers a direct, feedforward recognition model. The SVMs generate smooth estimates of local transition probabilities in the HMM, conditioned on the acoustic inputs. The transition probabilities are then used to estimate the global posterior probabilities of HMM state sequences. A parallel architecture that implements a simple speech recognition model in real time is presented.
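The feedforward use of input-conditioned transition probabilities described above can be illustrated with a forward recursion. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `forward_state_posteriors` is hypothetical, and the transition matrices are hand-written numbers standing in for the SVM outputs.

```python
import numpy as np


def forward_state_posteriors(initial, transitions):
    """Propagate state probabilities through input-conditioned transitions.

    initial:     shape (S,), probability of each state before the first frame.
    transitions: shape (T, S, S); transitions[t, i, j] is assumed to hold
                 P(state_t = j | state_{t-1} = i, acoustic input at frame t).
                 In the hybrid system these would come from the SVMs; here
                 they are placeholder values (an assumption of this sketch).
    Returns an array of shape (T, S) with the state probabilities per frame.
    """
    alpha = np.asarray(initial, dtype=float)
    history = []
    for step in np.asarray(transitions, dtype=float):
        # Each row of `step` sums to 1, so alpha stays normalized
        # and no separate rescaling pass is needed.
        alpha = alpha @ step
        history.append(alpha)
    return np.stack(history)


# Two states, two frames, with made-up transition estimates.
example_transitions = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
]
posteriors = forward_state_posteriors([1.0, 0.0], example_transitions)
```

Because each update is a single matrix-vector product per frame with no backward pass, the recursion is purely feedforward, which is consistent with the abstract's remark that such models map easily to parallel hardware.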
Identifying and classifying personal, geographic, institutional or other names in a text is an important task for numerous applications. This paper describes and evaluates a language-independent bootstrapping algorith...
This paper presents a novel method of generating and applying hierarchical, dynamic topic-based language models. It proposes and evaluates new cluster generation, hierarchical smoothing and adaptive topic-probability ...
This paper describes and extensively evaluates a system for the automatic routing of submitted papers to reviewers and area committees, without the need for any human annotation from the reviewers or the program chair...
Resnik and Yarowsky (1997) made a set of observations about the state-of-the-art in automatic word sense disambiguation and, motivated by those observations, offered several specific proposals regarding improved evalu...
The problem of blindly separating signal mixtures with fewer mixture components than independent signal sources is mathematically ill-defined, and requires suitable prior information on the nature of the sources. Recently, it has been shown that sparse methods for function approximation using a Laplacian prior can be effective, but the method fails to separate a single mixture without further prior information. Other techniques track harmonics, but assume separability in the time-frequency domain. We show that a measure of temporal and spectral coherence provides an effective cue for separating independent acoustical or sonar sources, in the absence of spatial cues in the monaural case. The technique is shown to successfully separate single mixtures of sources with significant spectral overlap.