In recent work, Lyu and Simoncelli [1] introduced radial Gaussianization (RG) as a very efficient procedure for transforming n-dimensional random vectors into Gaussian vectors with independent and identically distributed (i.i.d.) components. This entails transforming the norms of the data so that they become chi-distributed with n degrees of freedom. A necessary requirement is that the original data are generated by an isotropic distribution, that is, their probability density function (pdf) is constant over surfaces of n-dimensional spheres (or, more generally, n-dimensional ellipsoids). The case of biases in the data, which is of great practical interest, is studied here; as we demonstrate with experiments, there are situations in which even very small amounts of bias can cause RG to fail. This becomes evident especially when the data form clusters in low-dimensional manifolds. To address this shortcoming, we propose a two-step approach which entails (i) first discovering clusters in the data and removing the bias from each, and (ii) performing RG on the bias-compensated data. In experiments with synthetic data, the proposed bias compensation procedure results in significantly better Gaussianization than the non-compensated RG method.
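For concreteness, here is a minimal sketch of the RG transform itself, assuming a zero-mean, isotropic data matrix X of shape (samples, n) and using an empirical CDF of the norms; the function name and rank-based estimator are illustrative, not taken from [1]:

    import numpy as np
    from scipy.stats import chi

    def radial_gaussianize(X):
        n = X.shape[1]
        r = np.linalg.norm(X, axis=1)          # radial component of each sample
        ranks = np.argsort(np.argsort(r))      # empirical CDF of the norms
        F_r = (ranks + 0.5) / len(r)
        r_new = chi.ppf(F_r, df=n)             # make norms chi-distributed, n dof
        return X * (r_new / r)[:, None]        # rescale each sample radially

A nonzero mean (bias) breaks the isotropy assumption this transform relies on, which is exactly the failure mode that the cluster-wise bias compensation described above is designed to remove before RG is applied.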
Word Sense Disambiguation (WSD) is one of the fundamental natural language processing tasks. However, a lack of training corpora is a bottleneck to constructing a highly accurate all-words WSD system. Annotating a large-scal...
This work surveys existing evaluation methodologies for the task of sentence compression, identifies their shortcomings, and proposes alternatives. In particular, we examine the problems of evaluating paraphrastic com...
We describe a new approach for rescoring speech lattices - with long-span language models or wide-context acoustic models - that does not entail computationally intensive lattice expansion or limited rescoring of only an N-best list. We view the set of word-sequences in a lattice as a discrete space equipped with the edit-distance metric, and develop a hill climbing technique to start with, say, the 1-best hypothesis under the lattice-generating model(s) and iteratively search a local neighborhood for the highest-scoring hypothesis under the rescoring model(s); such neighborhoods are efficiently constructed via finite state techniques. We demonstrate empirically that to achieve the same reduction in error rate using a better estimated, higher order language model, our technique evaluates two orders of magnitude fewer utterance-length hypotheses than conventional N-best rescoring. For the same number of hypotheses evaluated, our technique results in a significantly lower error rate.
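The core search can be sketched as follows, assuming two illustrative helpers: neighborhood(hyp), which enumerates the word sequences at small edit distance from hyp that remain paths in the lattice (the paper builds these neighborhoods with finite-state techniques), and rescore(hyp), which scores a hypothesis under the long-span model:

    def hill_climb(one_best, neighborhood, rescore):
        current, current_score = one_best, rescore(one_best)
        while True:
            best, best_score = current, current_score
            for hyp in neighborhood(current):      # local edit-distance ball
                s = rescore(hyp)
                if s > best_score:
                    best, best_score = hyp, s
            if best == current:                    # local optimum reached
                return current
            current, current_score = best, best_score

Each iteration evaluates only the utterance-length hypotheses in the current neighborhood, which is what keeps the total number of rescoring calls far below that of N-best rescoring.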
We propose the use of a nonparametric Bayesian model, the Hierarchical Dirichlet Process (HDP), for the task of Word Sense Induction. Results are shown through comparison against Latent Dirichlet Allocation (LDA), a p...
We present a substitution-only approach to sentence compression which "tightens" a sentence by reducing its character length. Replacing phrases with shorter paraphrases yields paraphrastic compressions as sh...
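As a toy illustration of the idea (the paraphrase table and greedy replacement strategy below are assumptions for the sketch, not the paper's model):

    def tighten(sentence, paraphrases):
        # Greedily substitute phrases with strictly shorter paraphrases.
        out = sentence
        for phrase, shorter in paraphrases.items():
            if len(shorter) < len(phrase):
                out = out.replace(phrase, shorter)
        return out

    print(tighten("he made a decision to leave",
                  {"made a decision to": "decided to"}))
    # -> "he decided to leave"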
Decision trees have been applied to a variety of NLP tasks, including language modeling, for their ability to handle a variety of attributes and a sparse context space. Moreover, forests (collections of decision trees) ...
In the face of sparsity, statistical models are often interpolated with lower order (backoff) models, particularly in language modeling. In this paper, we argue that there is a relation between the higher order and th...
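For reference, the standard interpolated backoff form alluded to here is, in LaTeX notation (with history h, shortened backoff history h', and interpolation weight \lambda(h); the notation is ours, not necessarily the paper's):

    p(w \mid h) = \lambda(h)\, p_{\mathrm{hi}}(w \mid h) + \bigl(1 - \lambda(h)\bigr)\, p_{\mathrm{lo}}(w \mid h')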
Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural-language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of text-categorization tasks shows that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing.
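For concreteness, one widely used closed-form version of such an update, with a diagonal Gaussian N(mu, sigma) over the weights and a required correct-classification probability eta, can be sketched as below; the exact update rules evaluated in the paper may differ:

    import numpy as np
    from scipy.stats import norm

    def cw_update(mu, sigma, x, y, eta=0.9):
        # One online step for example x with label y in {-1, +1}.
        phi = norm.ppf(eta)              # confidence parameter
        M = y * np.dot(mu, x)            # mean margin
        V = np.dot(sigma, x * x)         # margin variance under Sigma
        disc = (1 + 2 * phi * M) ** 2 - 8 * phi * (M - phi * V)
        alpha = max(0.0, (-(1 + 2 * phi * M) + np.sqrt(disc)) / (4 * phi * V))
        mu = mu + alpha * y * sigma * x  # bigger steps on low-confidence weights
        sigma = 1.0 / (1.0 / sigma + 2 * alpha * phi * x * x)
        return mu, sigma

The per-weight variances in sigma act as the adaptive per-weight learning rates mentioned above: rarely seen (high-variance) features receive larger updates, after which their variance is shrunk.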
Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information-rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this prob...