Search Results - Inner Mongolia University Library

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Shantanu Chakrabartty; Gert Cauwenberghs. Affiliation: Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA

A forward decoding approach to kernel machine learning is presented. The method combines concepts from Markovian dynamics, large margin classifiers and reproducing kernels for robust sequence detection by learning inter-data dependencies. A MAP (maximum a posteriori) sequence estimator is obtained by regressing transition probabilities between symbols as a function of received data. The training procedure involves maximizing a lower bound of a regularized cross-entropy on the posterior probabilities, which simplifies into direct estimation of transition probabilities using kernel logistic regression. Applied to channel equalization, forward decoding kernel machines outperform support vector machines and other techniques by about 5 dB in SNR for a given BER, within 1 dB of theoretical limits.

Keywords: Support vector machines; Training; Decoding; Kernel; Equalizers

Forward-Decoding Kernel-Based Phone Sequence Recognition

Annual Conference on Neural Information Processing Systems

Authors: Shantanu Chakrabartty; Gert Cauwenberghs. Affiliation: Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218

ISBN: (print) 0262025507

Forward decoding kernel machines (FDKM) combine large-margin classifiers with hidden Markov models (HMM) for maximum a posteriori (MAP) adaptive sequence estimation. State transitions in the sequence are conditioned on observed data using a kernel-based probability model trained with a recursive scheme that deals effectively with noisy and partially labeled data. Training over very large datasets is accomplished using a sparse probabilistic support vector machine (SVM) model based on quadratic entropy, and an on-line stochastic steepest descent algorithm. For speaker-independent continuous phone recognition, FDKM trained over 177,080 samples of the TIMIT database achieves 80.6% recognition accuracy over the full test set, without use of a prior phonetic language model.

Keywords: Telephone modelling languages Support Vector Network observational data label data steepest descent algorithm speaker-independent

Tone recognition in Thai continuous speech based on coarticulation, intonation and stress effects

7th International Conference on Spoken Language Processing, ICSLP 2002

Authors: Thubthong, Nuttakorn; Kijsirikul, Boonserm; Luksaneeyanawin, Sudaporn. Affiliations: Department of Physics, Faculty of Science, Chulalongkorn University, Phayathai Rd., Bangkok 10330, Thailand; Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Phayathai Rd., Bangkok 10330, Thailand; Centre for Research in Speech and Language Processing, Faculty of Arts, Chulalongkorn University, Phayathai Rd., Bangkok 10330, Thailand

Tone recognition is a critical component for speech recognition in a tone language. One of the main problems of tone recognition in continuous speech is that several interacting factors affect the F0 realization of tones. In this paper, we focus on the coarticulatory, intonation, and stress effects. These effects are compensated for by the tone information of neighboring syllables, the adjustment of F0 heights and the stress acoustic features, respectively. The experiments, which compare all tone features, were conducted using feedforward neural networks. The highest recognition rates improve from 84.07% to 93.60% and from 82.48% to 92.67% for the Thai proper name and Thai animal story corpora, respectively.

Keywords: Feedforward neural networks

The TREC2001 video track: Information retrieval on digital video information

6th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2002

Authors: Smeaton, Alan F.; Over, Paul; Costello, Cash J.; de Vries, Arjen P.; Doermann, David; Hauptmann, Alexander; Rorvig, Mark E.; Smith, John R.; Wu, Lide. Affiliations: Centre for Digital Video Processing, Dublin City University, Dublin 9, Ireland; National Institute for Standards and Technology, Gaithersburg, MD, United States; Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States; CWI, Amsterdam, Netherlands; Laboratory for Language and Media Processing, University of Maryland, College Park, MD, United States; School of Computer Science, Carnegie Mellon University, United States; School of Library Information Sciences, University of North Texas, TX, United States; IBM T. J. Watson Research Center, Hawthorne, NY, United States; Dept. of Computer Science, Fudan University, Shanghai, China

ISBN: (print) 3540441786

The development of techniques to support content-based access to archives of digital video information has recently started to receive much attention from the research community. During 2001, the annual TREC activity, which has been benchmarking the performance of information retrieval techniques on a range of media for 10 years, included a "track" or activity which allowed investigation into approaches to support searching through a video library. This paper is not intended to provide a comprehensive picture of the different approaches taken by the TREC2001 video track participants but instead we give an overview of the TREC video search task and a thumbnail sketch of the approaches taken by different groups. The reason for writing this paper is to highlight the message from the TREC video track that there are now a variety of approaches available for searching and browsing through digital video archives, that these approaches do work, are scalable to larger archives and can yield useful retrieval performance for users. This has important implications in making digital libraries of video information attainable. © Springer-Verlag Berlin Heidelberg 2002.

Keywords: Information retrieval

Learning-based detection, segmentation and matching of objects

2nd International Conference on Advances in Pattern Recognition, ICAPR 2001

Authors: Duta, Nicolae; Jain, Anil K. Affiliations: Speech and Language Processing Department, BBN Technologies, Cambridge, United States; Department of Computer Science and Engineering, Michigan State University, United States

Discriminative speaker adaptation with conditional maximum likelihood linear regression

7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001

Authors: Gunawardana, Asela; Byrne, William. Affiliation: Department of Electrical and Computer Engineering, Center for Language and Speech Processing, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States

ISBN: (print) 8790834100

We present a simplified derivation of the extended Baum-Welch procedure, which shows that it can be used for Maximum Mutual Information (MMI) estimation of a large class of continuous emission density hidden Markov models (HMMs). We use the extended Baum-Welch procedure for discriminative estimation of MLLR-type speaker adaptation transformations. The resulting adaptation procedure, termed Conditional Maximum Likelihood Linear Regression (CMLLR), is used successfully for supervised and unsupervised adaptation tasks on the Switchboard corpus, yielding an improvement over MLLR. The interaction of unsupervised CMLLR with segmental minimum Bayes risk lattice voting procedures is also explored, showing that the two procedures are complementary.

Keywords: Hidden Markov models

Multimodal Error Correction for Speech User Interfaces

ACM Transactions on Computer-Human Interaction, 2001, vol. 8, no. 1, pp. 60-98

Authors: Suhm, Bernhard; Myers, Brad; Waibel, Alex. Affiliations: Speech and Language Processing, BBN Technologies, 70 Fawcett Street, Cambridge, MA 02138, United States; Human Computer Interaction Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3891, United States; Interactive Systems Laboratories, School of Computer Science, Carnegie Mellon University and Karlsruhe University (Germany), Pittsburgh, PA 15221, United States

Although commercial dictation systems and speech-enabled telephone voice user interfaces have become readily available, speech recognition errors remain a serious problem in the design and implementation of speech user interfaces. Previous work hypothesized that switching modality could speed up interactive correction of recognition errors. This article presents multimodal error correction methods that allow the user to correct recognition errors efficiently without keyboard input. Correction accuracy is maximized by novel recognition algorithms that use context information for recognizing correction input. Multimodal error correction is evaluated in the context of a prototype multimodal dictation system. The study shows that unimodal repair is less accurate than multimodal error correction. On a dictation task, multimodal correction is faster than unimodal correction by respeaking. The study also provides empirical evidence that system-initiated error correction (based on confidence measures) may not expedite error correction. Furthermore, the study suggests that recognition accuracy determines user choice between modalities: while users initially prefer speech, they learn to avoid ineffective correction modalities with experience. To extrapolate results from this user study, the article introduces a performance model of (recognition-based) multimodal interaction that predicts input speed including time needed for error correction. Applied to interactive error correction, the model predicts the impact of improvements in recognition technology on correction speeds, and the influence of recognition accuracy and correction method on the productivity of dictation systems. This model is a first step toward formalizing multimodal interaction. © 2001, ACM. All rights reserved.

Keywords: Design; dictation systems; Experimentation; Human Factors; interactive error correction; Measurement; Multimodal interfaces; pen input; performance model; speech input; speech user interfaces

Automatic selection of transcribed training material

IEEE Workshop on Automatic Speech Recognition and Understanding

Authors: T.M. Kamm; G.G.L. Meyer. Affiliation: Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA

Conventional wisdom says that incorporating more training data is the surest way to reduce the error rate of a speech recognition system. This, in turn, guarantees that speech recognition systems are expensive to train, because of the high cost of annotating training data. We propose an iterative training algorithm that seeks to improve the error rate of a speech recognizer without incurring additional transcription cost, by selecting a subset of the already available transcribed training data. We apply the proposed algorithm to an alpha-digit recognition problem and reduce the error rate from 10.3% to 9.4% on a particular test set.

Keywords: Speech recognition; Error analysis; Iterative algorithms; Training data; Costs; System testing; Natural languages; Speech processing; Data mining; Automatic speech recognition

Aspects of design and implementation of a multi-channel and multi-modal information system

International Conference on Software Maintenance (ICSM)

Authors: V. Demesticha; J. Gergic; J. Kleindienst; M. Mast; L. Polymenakos; H. Schulz; L. Seredi. Affiliations: IBM Hellas, Athens, Greece; IBM Praha, IBM Czech Republic Limited, Prague, Czech Republic; European Speech Research, IBM Germany Laboratory, Heidelberg, Germany; Computer Science Department, IBM T.J. Watson Research Center, Human Language Technologies, Yorktown Heights, NY, USA

The paper describes an architecture for multi-channel and multi-modal applications. First the design problem is explored, and a system that can handle multi-modal interaction and delivery of Internet content is proposed. The focus is on selected development aspects and the way they are addressed using state-of-the-art tools. The various components are defined and described in detail. Finally, conclusions and a view of future work on the evolution of such systems are given.

Keywords: Information systems

A comparison of data-derived and knowledge-based modeling of pronunciation variation

6th International Conference on Spoken Language Processing, ICSLP 2000

Authors: Wester, Mirjam; Fosler-Lussier, Eric. Affiliations: International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704, United States; A2RT, Dept. of Language and Speech, University of Nijmegen, Netherlands

ISBN: (print) 7801501144

This paper focuses on modeling pronunciation variation in two different ways: data-derived and knowledge-based. The knowledge-based approach consists of using phonological rules to generate variants. The data-derived approach consists of performing phone recognition, followed by various pruning and smoothing methods to alleviate some of the errors in the phone recognition. Using phonological rules led to a small improvement in WER, whereas using a data-derived approach in which the phone recognition was smoothed using simple decision trees (d-trees) prior to lexicon generation led to a significant improvement compared to the baseline. Furthermore, we found that 10% of variants generated by the phonological rules were also found using phone recognition, and this increased to 23% when the phone recognition output was smoothed by using d-trees. In addition, we propose a metric to measure confusability in the lexicon, and we found that employing this confusion metric to prune variants results in roughly the same improvement as using the d-tree method.

Keywords: Decision trees
