ISBN:
(print) 0262195348
In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing.
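The tree-growing recipe the abstract describes — bootstrap resampling of the training data, a random choice of splitting question at each node, and a vote across trees — can be sketched generically. The sketch below uses single-split "stumps" over binary features rather than the paper's syntactic questions; all names and the toy data are invented for illustration.

```python
import random
from collections import Counter

def train_stump(sample, n_features):
    """One randomly grown 'tree': ask a randomly selected question
    (here: the value of a random feature) and store the majority
    label on each side of the split."""
    feat = random.randrange(n_features)          # random question selection
    left = [y for x, y in sample if x[feat] == 0]
    right = [y for x, y in sample if x[feat] == 1]
    maj = lambda ys: Counter(ys).most_common(1)[0][0] if ys else 0
    return feat, maj(left), maj(right)

def train_forest(data, n_trees=25):
    """Each tree sees its own bootstrap sample: drawn independently,
    with replacement, from the same distribution for all trees."""
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [random.choice(data) for _ in data]
        forest.append(train_stump(sample, n_features))
    return forest

def predict(forest, x):
    """Aggregate the forest by majority vote over the trees."""
    votes = [l if x[f] == 0 else r for f, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

random.seed(0)
# toy data: every feature agrees with the label
data = [((i % 2, i % 2), i % 2) for i in range(40)]
forest = train_forest(data)
```

In the paper's setting the per-node "questions" are drawn from syntactic context rather than raw features, and the forest defines a smoothed conditional word distribution rather than a classifier vote.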
Edge detection is a cornerstone of any computer, robotic, or machine vision system. Real-time edge detection is a pre-process for many critical applications, such as assembly-line inspection and surveillance. Wavelet-based algorithms are replacing traditional algorithms, especially the Haar wavelet because of its simplicity. The Haar algorithm uses a multilevel decomposition to produce image edges corresponding to high-frequency wavelet coefficients. In this paper, a real-time edge detection algorithm based on Haar is analyzed and compared to conventional edge detectors. The other implemented and compared algorithms are the traditional Prewitt algorithm and, from a newer generation, the Canny algorithm. The real-time implementation of all algorithms is accomplished using a TI TMS320C6711 card. In the case of Haar, the multilevel decomposition improves the results obtained with noisy images. The results show that the Haar-based edge detector has a low execution time with accurate edge results, and thus represents a suitable algorithm for on-line vision system applications. Canny produced the thinnest edges, but is not suitable for real-time processing on the C6711, and falls short in edge results compared to the Haar results. Overall, the wavelet-based algorithm outperformed the other edge detectors.
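The core idea — a Haar decomposition whose high-frequency (detail) coefficients mark intensity jumps — can be shown in a minimal one-level, row-wise sketch. The threshold and the toy image are invented for illustration; the paper's detector works on full 2-D multilevel decompositions.

```python
def haar_1d(row):
    """One-level 1-D Haar transform: pairwise averages (low-pass)
    and pairwise differences (high-pass detail coefficients)."""
    avg = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
    det = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
    return avg, det

def haar_edges(image, thresh=10):
    """Mark positions whose detail coefficient magnitude exceeds
    `thresh` (a hypothetical value, not taken from the paper)."""
    edges = []
    for row in image:
        _, det = haar_1d(row)
        edges.append([1 if abs(d) > thresh else 0 for d in det])
    return edges

img = [
    [0, 255, 255, 0],   # step edges inside both pixel pairs
    [0, 0, 0, 0],       # flat row: no detail energy
]
```

Repeating the transform on the `avg` output gives the coarser levels that, per the abstract, improve robustness to noise.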
Presented in this paper is the theoretical basis, with simulation verification, for a sequential method of deconvolution of the glottal waveform from voiced speech. The technique is based upon a linear predictive model of the vocal tract, and the assumption of a “pseudo-closed phase” (PCP) (noisy closed phase) of the glottis during each pitch period. Existing techniques for closed phase glottal inverse filtering (CPIF) employ “batch”-type methods which are generally slow, highly user-interactive, and restricted to the use of one cycle of data in the analysis. The basic ideas underlying CPIF, and a brief review of existing methods, are presented in the first part of the paper. In the second part, the requisite theoretical results and the new method are developed. In particular, these include a unified theory of CPIF in which the selection of closed-phase points is viewed as a data weighting process. This viewpoint readily admits the use of more than one cycle of data in the analysis (advantageous when the data are noisy), and further leads to the use of optimal weighting of the accepted data. The theory of “membership set” identification is used as a basis for optimization of the weights. Novel weighting strategies employed in a conventional recursive least-squares algorithm form the basis of the improved technique. The last part of the paper contains simulation studies and computational considerations. The new method is shown to result in significant increases in both accuracy and computational efficiency.
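The "data weighting" view of closed-phase selection maps naturally onto a weighted recursive least-squares update: samples judged to lie in the open phase get weight 0 and are discarded, accepted samples get weight 1 (or an optimized value). The sketch below is a generic weighted RLS, not the paper's membership-set weighting; the data and names are invented for illustration.

```python
def wrls(xs, ys, ws, dim, lam=1e3):
    """Weighted recursive least squares: each sample's influence on the
    parameter estimate is scaled by its weight w (0 = rejected point,
    1 = accepted closed-phase point)."""
    theta = [0.0] * dim
    P = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for x, y, w in zip(xs, ys, ws):
        if w == 0.0:
            continue                      # zero-weight samples are skipped
        Px = [sum(P[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        denom = 1.0 / w + sum(x[i] * Px[i] for i in range(dim))
        K = [v / denom for v in Px]       # weighted gain vector
        err = y - sum(x[i] * theta[i] for i in range(dim))
        theta = [theta[i] + K[i] * err for i in range(dim)]
        P = [[P[i][j] - K[i] * Px[j] for j in range(dim)] for i in range(dim)]
    return theta

# fit y = 2*x0 + 3*x1; one corrupted sample is weighted to zero
xs = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (3, 1), (1, 1)]
ys = [2.0, 3.0, 5.0, 7.0, 8.0, 9.0, 100.0]
ws = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
theta = wrls(xs, ys, ws, dim=2)
```

The sequential form is what makes the method fast relative to batch CPIF: each new sample updates the vocal-tract estimate in O(dim²) time.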
We previously proposed [1] an iterative word-selective training method to cost-effectively utilize data preparation resources without compromising system performance. We continue this work and investigate the robustness of our active learning approach with respect to the starting conditions and further propose a stopping criterion that supports our objective to make effective use of transcription effort while minimizing system error. In particular, we demonstrate robustness to seven initial conditions, showing that we can select around 20 hours of training data and achieve a range of error rates between 8.6% and 9.0%, compared to an error rate of 10% when using all 50 hours of the training set. Additionally, we give empirical evidence that our proposed stopping criterion is in general a good predictor of when the minimum error rate is achieved, demonstrated for each of the initial conditions.
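The selection-with-stopping loop can be illustrated schematically: repeatedly transcribe the least-confident utterances and stop once the pool stops improving. The stopping statistic below (change in mean pool confidence) is a hypothetical stand-in, not the paper's criterion, and all data are invented.

```python
def select_until_stable(pool, batch=5, window=2, eps=0.01):
    """Iteratively pick the lowest-confidence items for transcription and
    stop once the mean confidence of the remaining pool improves by less
    than `eps` between rounds. `pool` is a list of (utt_id, confidence)."""
    selected, history = [], []
    pool = sorted(pool, key=lambda p: p[1])      # least confident first
    while pool:
        selected += pool[:batch]                 # send batch for transcription
        pool = pool[batch:]
        mean_conf = sum(c for _, c in pool) / len(pool) if pool else 1.0
        history.append(mean_conf)
        if len(history) >= window and history[-1] - history[-window] < eps:
            break                                # no further gain expected
    return [u for u, _ in selected]

# toy pool: 5 hard, 5 medium, 20 easy utterances
pool = [(i, c) for i, c in enumerate([0.1] * 5 + [0.5] * 5 + [0.9] * 20)]
sel = select_until_stable(pool)
```

In a real system the confidences would be re-estimated after each retraining round rather than fixed up front.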
Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65-84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with manually annotated boundary positions at a 0.35 false alarm rate. Categorization was considerably more challenging, with a nearest-neighbor technique yielding F = 0.3. This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of that difference. The paper concludes with a description of how these capabilities could be used together to search large collections of recorded oral histories.
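"Segmentation based on shifts in the recognized vocabulary" is in the family of lexical-cohesion methods: compare word distributions on either side of each candidate point and cut where similarity drops. The sketch below is a minimal TextTiling-style illustration with invented window size and threshold; the paper's segmenter is trained on annotated data rather than hand-tuned.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    num = sum(a[w] * b[w] for w in a if w in b)
    den = (math.sqrt(sum(v * v for v in a.values())) *
           math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def boundaries(words, win=4, thresh=0.2):
    """Place a topic boundary wherever adjacent windows of recognized
    words have cosine similarity below `thresh` (hypothetical values)."""
    cuts = []
    for i in range(win, len(words) - win + 1, win):
        left = Counter(words[i - win:i])
        right = Counter(words[i:i + win])
        if cosine(left, right) < thresh:
            cuts.append(i)
    return cuts

ws = ["war", "treaty", "army", "war", "farm", "cow", "barn", "farm"]
```

A false-alarm rate like the 0.35 reported above is then a matter of where the threshold sits on this similarity curve.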
ISBN:
(print) 0262025507
Forward decoding kernel machines (FDKM) combine large-margin classifiers with hidden Markov models (HMM) for maximum a posteriori (MAP) adaptive sequence estimation. State transitions in the sequence are conditioned on observed data using a kernel-based probability model trained with a recursive scheme that deals effectively with noisy and partially labeled data. Training over very large datasets is accomplished using a sparse probabilistic support vector machine (SVM) model based on quadratic entropy, and an on-line stochastic steepest descent algorithm. For speaker-independent continuous phone recognition, FDKM trained over 177,080 samples of the TIMIT database achieves 80.6% recognition accuracy over the full test set, without use of a prior phonetic language model.
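Once the kernel model has produced data-dependent transition probabilities, the MAP state sequence is found by a standard Viterbi-style forward pass. The sketch below takes those probabilities as given (here hard-coded) rather than computing them from kernels; the returned path includes the initial state.

```python
import math

def map_decode(trans_probs, init):
    """MAP state sequence by Viterbi over data-dependent transitions:
    trans_probs[t][i][j] = P(state j at step t | state i, observation x_t)."""
    n = len(init)
    score = [math.log(p) for p in init]
    back = []
    for P in trans_probs:
        new, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + math.log(P[i][j]))
            new.append(score[best_i] + math.log(P[best_i][j]))
            ptr.append(best_i)
        score, back = new, back + [ptr]
    state = max(range(n), key=lambda j: score[j])   # best final state
    path = [state]
    for ptr in reversed(back):                       # backtrack
        state = ptr[state]
        path.append(state)
    return path[::-1]

# two steps whose (hypothetical) observations favor the path 0 -> 1 -> 1
P1 = [[0.1, 0.9], [0.5, 0.5]]
P2 = [[0.5, 0.5], [0.1, 0.9]]
```

In FDKM, each `trans_probs[t]` row would come from the kernel-based transition model evaluated on the observation at step t.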
We previously proposed (Kamm and Meyer (2001, 2002)) a two-pronged approach to improve system performance by selective use of training data. We demonstrated a sentence-selective algorithm that, first, made effective use of the available human-transcribed training data and, second, focused future human transcription effort on data more likely to improve system performance. We now extend that algorithm to focus on word selection, and demonstrate that we can reduce the error rate from 10.3% to 9.3% on a simple 36-word corpus by selecting 30% (15 hours) of the 50 hours of training data available in this corpus, without knowledge of the true transcription. We also discuss application of our word-selection algorithm to the Wall Street Journal 5K-word task. Preliminary results show that we can select up to 60% (48 hours) of the training data, with minimal knowledge of the true transcription, and match or beat the error rate of a system built using the same amount of randomly selected training data.
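Word selection under a transcription budget can be illustrated schematically: rank recognized words by confidence and send the least confident ones for human transcription until the budget of audio hours is spent. The ranking statistic and data here are invented stand-ins, not the paper's selection criterion.

```python
def pick_words(hyps, budget):
    """Return the lowest-confidence recognized words whose total duration
    fits within `budget` seconds of transcription effort.
    `hyps` is a list of (word, confidence, duration_sec) tuples."""
    ranked = sorted(hyps, key=lambda h: h[1])     # least confident first
    chosen, used = [], 0.0
    for word, conf, dur in ranked:
        if used + dur > budget:
            break                                  # transcription budget spent
        chosen.append(word)
        used += dur
    return chosen

hyps = [("oh", 0.9, 1.0), ("two", 0.4, 1.0),
        ("four", 0.2, 1.0), ("six", 0.7, 1.0)]
```

The key contrast with sentence selection is the unit of cost: budgeting per word lets the transcriber skip confidently recognized stretches inside an utterance.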
A forward decoding approach to kernel machine learning is presented. The method combines concepts from Markovian dynamics, large margin classifiers and reproducing kernels for robust sequence detection by learning inter-data dependencies. A MAP (maximum a posteriori) sequence estimator is obtained by regressing transition probabilities between symbols as a function of received data. The training procedure involves maximizing a lower bound of a regularized cross-entropy on the posterior probabilities, which simplifies into direct estimation of transition probabilities using kernel logistic regression. Applied to channel equalization, forward decoding kernel machines outperform support vector machines and other techniques by about 5 dB in SNR for a given BER, within 1 dB of theoretical limits.
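The "kernel logistic regression" step — regressing a probability from received data through a kernel expansion — can be sketched in its simplest two-class form. This is a plain batch/per-sample gradient ascent on the log-likelihood with an RBF kernel and invented toy data; the paper's training adds regularization and the entropy-based sparsity discussed above.

```python
import math

def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two feature tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_klr(X, y, steps=200, lr=0.5):
    """Kernel logistic regression: P(y=1|x) = sigmoid(sum_i a_i K(x_i, x)),
    dual weights a fitted by per-sample gradient ascent on the likelihood."""
    a = [0.0] * len(X)
    for _ in range(steps):
        for j in range(len(X)):
            f = sum(a[i] * rbf(X[i], X[j]) for i in range(len(X)))
            p = 1 / (1 + math.exp(-f))
            g = y[j] - p                  # gradient w.r.t. the score f(x_j)
            for i in range(len(X)):
                a[i] += lr * g * rbf(X[i], X[j]) / len(X)
    return a

def predict_prob(a, X, x):
    f = sum(a[i] * rbf(X[i], x) for i in range(len(X)))
    return 1 / (1 + math.exp(-f))

# toy 1-D data: class 1 for inputs near 1.0, class 0 near 0.0
X = [(0.0,), (0.2,), (0.8,), (1.0,)]
y = [0, 0, 1, 1]
a = train_klr(X, y)
```

In the equalization setting, `x` would be a window of received samples and the regressed probabilities would feed the forward MAP decoder.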