Search Results - Inner Mongolia University Library

10th Conference on Computational Natural Language Learning, CoNLL-X

Authors: Dreyer, Markus; Smith, David A.; Smith, Noah A. Department of Computer Science/Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

We describe our entry in the CoNLL-X shared task. The system consists of three phases: a probabilistic vine parser (Eisner and N. Smith, 2005) that produces unlabeled dependency trees, a probabilistic relation-labeling model, and a discriminative minimum risk reranker (D. Smith and Eisner, 2006). The system is designed for fast training and decoding and for high precision. We describe sources of crosslingual error and ways to ameliorate them. We then provide a detailed error analysis of parses produced for sentences in German (much training data) and Arabic (little training data).

Keywords: Syntactics


Novel probabilistic finite-state transducers for cognate and transliteration modeling


7th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2006

Author: Schafer, Charles. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

We present and empirically compare a range of novel probabilistic finite-state transducer (PFST) models targeted at two major natural language string transduction tasks, transliteration selection and cognate translation selection. Evaluation is performed on 10 distinct language pair data sets, and in each case novel models consistently and substantially outperform a well-established standard reference algorithm. © 2006 The Association for Machine Translation in the Americas.

Keywords: Transducers


A fast finite-state relaxation method for enforcing global constraints on sequence decoding


2006 Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, HLT-NAACL 2006

Authors: Tromble, Roy W.; Eisner, Jason. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

We describe finite-state constraint relaxation, a method for applying global constraints, expressed as automata, to sequence model decoding. We present algorithms for both hard constraints and binary soft constraints. On the CoNLL-2004 semantic role labeling task, we report a speedup of at least 16x over a previous method that used integer linear programming. © 2006 Association for Computational Linguistics.

Keywords: Decoding


Minimum risk annealing for training log-linear models


21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING/ACL 2006

Authors: Smith, David A.; Eisner, Jason. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

When training the parameters for a natural language system, one would prefer to minimize 1-best loss (error) on an evaluation set. Since the error surface for many natural language problems is piecewise constant and riddled with local minima, many systems instead optimize log-likelihood, which is conveniently differentiable and convex. We propose training instead to minimize the expected loss, or risk. We define this expectation using a probability distribution over hypotheses that we gradually sharpen (anneal) to focus on the 1-best hypothesis. Besides the linear loss functions used in previous work, we also describe techniques for optimizing nonlinear functions such as precision or the BLEU metric. We present experiments training log-linear combinations of models for dependency parsing and for machine translation. In machine translation, annealed minimum risk training achieves significant improvements in BLEU over standard minimum error training. We also show improvements in labeled dependency parsing. © 2006 Association for Computational Linguistics

Keywords: Machine translation


Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies


2006 Workshop on Statistical Machine Translation, WMT 2006, collocated with the HLT-NAACL 2006

Authors: Smith, David A.; Eisner, Jason. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

Many syntactic models in machine translation are channels that transform one tree into another, or synchronous grammars that generate trees in parallel. We present a new model of the translation process: quasi-synchronous grammar (QG). Given a source-language parse tree T1, a QG defines a monolingual grammar that generates translations of T1. The trees T2 allowed by this monolingual grammar are inspired by pieces of substructure in T1 and aligned to T1 at those points. We describe experiments learning quasi-synchronous context-free grammars from bitext. As with other monolingual language models, we evaluate the cross-entropy of QGs on unseen text and show that a better fit to bilingual data is achieved by allowing greater syntactic divergence. When evaluated on a word alignment task, QG matches standard baselines. © HLT-NAACL *** right reserved.

Keywords: Syntactics


A weighted finite state transducer translation template model for statistical machine translation


Natural Language Engineering, 2006, vol. 12, no. 1, pp. 35-75

Authors: Kumar, Shankar; Deng, Yonggang; Byrne, William. Center for Language and Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States

We present a Weighted Finite State Transducer Translation Template Model for statistical machine translation. This is a source-channel model of translation inspired by the Alignment Template translation model. The model attempts to overcome the deficiencies of word-to-word translation models by considering phrases rather than words as units of translation. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard finite state machine operations involving these transducers. One of the benefits of using this framework is that it avoids the need to develop specialized search procedures, even for the generation of lattices or N-Best lists of bitext word alignments and translation hypotheses. We report and analyze bitext word alignment and translation performance on the Hansards French-English task and the FBIS Chinese-English task under the Alignment Error Rate, BLEU, NIST and Word Error-Rate metrics. These experiments identify the contribution of each of the model components to different aspects of alignment and translation performance. We finally discuss translation performance with large bitext training sets on the NIST 2004 Chinese-English and Arabic-English MT tasks. © 2005 Cambridge University Press.

Keywords: computer aided language translation


A Dialectal Chinese Speech Recognition Framework


Journal of Computer Science & Technology, 2006, vol. 21, no. 1, pp. 106-115

Authors: Li Jing (李净); Zheng Fang (郑方); William Byrne; Dan Jurafsky. Center for Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China; Machine Intelligence Laboratory, Cambridge University, U.K.; Center for Language and Speech Processing, The Johns Hopkins University, U.S.A.; Department of Linguistics, Stanford University, U.S.A.

A framework for dialectal Chinese speech recognition is proposed and studied, in which a relatively small dialectal Chinese (or in other words, Chinese influenced by the native dialect) speech corpus and dialect-related knowledge are adopted to transform a standard Chinese (or Putonghua, abbreviated as PTH) speech recognizer into a dialectal Chinese speech recognizer. Two kinds of knowledge sources are explored: one is expert knowledge and the other is a small dialectal Chinese corpus. These knowledge sources provide information at four levels: phonetic level, lexicon level, language level, and acoustic decoder level. This paper takes Wu dialectal Chinese (WDC) as an example target language. The goal is to establish a WDC speech recognizer from an existing PTH speech recognizer based on the Initial-Final structure of the Chinese language and a study of how dialectal Chinese speakers speak Putonghua. The authors propose to use context-independent PTH-IF mappings (where IF means either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings (obtained from either experts or data), and combine them with the supervised maximum likelihood linear regression (MLLR) acoustic model adaptation method. To reduce the size of the multi-pronunciation lexicon introduced by the IF mappings, which might also enlarge the lexicon confusion and hence lead to performance degradation, a Multi-Pronunciation Expansion (MPE) method based on the accumulated uni-gram probability (AUP) is proposed. In addition, some commonly used WDC words are selected and added to the lexicon. Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves 10-18% absolute Character Error Rate (CER) reduction when recognizing WDC, with only a 0.62% CER increase when recognizing PTH. The proposed framework and methods are expected to work not only for Wu dialectal Chinese but also for other dialectal Chinese languages and

Keywords: dialectal Chinese speech recognition; initial or final (IF); IF-mapping rule; pronunciation modeling; small quantity of speech data


Better informed training of latent syntactic features


11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006

Authors: Dreyer, Markus; Eisner, Jason. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, United States

ISBN: (print) 1932432736

We study unsupervised methods for learning refinements of the nonterminals in a treebank. Following Matsuzaki et al. (2005) and Prescher (2005), we may for example split NP without supervision into NP[0] and NP[1], which behave differently. We first propose to learn a PCFG that adds such features to nonterminals in such a way that they respect patterns of linguistic feature passing: each node's nonterminal features are either identical to, or independent of, those of its parent. This linguistic constraint reduces runtime and the number of parameters to be learned. However, it did not yield improvements when training on the Penn Treebank. An orthogonal strategy was more successful: to improve the performance of the EM learner by treebank preprocessing and by annealing methods that split nonterminals selectively. Using these methods, we can maintain high parsing accuracy while dramatically reducing the model size. © 2006 Association for Computational Linguistics.

Keywords: Forestry


Learning algorithms for online principal-agent problems (and selling goods online)


ICML 2006: 23rd International Conference on Machine Learning

Authors: Conitzer, Vincent; Garera, Nikesh. Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, United States; Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

ISBN: (print) 1595933832

In a principal-agent problem, a principal seeks to motivate an agent to take a certain action beneficial to the principal, while spending as little as possible on the reward. This is complicated by the fact that the principal does not know the agent's utility function (or type). We study the online setting where at each round, the principal encounters a new agent, and the principal sets the rewards anew. At the end of each round, the principal only finds out the action that the agent took, but not his type. The principal must learn how to set the rewards optimally. We show that this setting generalizes the setting of selling a digital good online. We study and experimentally compare three main approaches to this problem. First, we show how to apply a standard bandit algorithm to this setting. Second, for the case where the distribution of agent types is fixed (but unknown to the principal), we introduce a new gradient ascent algorithm. Third, for the case where the distribution of agents' types is fixed, and the principal has a prior belief (distribution) over a limited class of type distributions, we study a Bayesian approach.

Keywords: Learning algorithms


A bilingual corpus of novels aligned at paragraph level


5th International Conference on NLP, FinTAL 2006

Authors: Gelbukh, Alexander; Sidorov, Grigori; Vera-Félix, José Ángel. Natural Language and Text Processing Laboratory, Center for Research in Computer Science, National Polytechnic Institute, Av. Juan Dios Batiz s/n, Zacatenco, 07738 Mexico City, Mexico

ISBN: (print) 3540373349

The paper presents a bilingual English-Spanish parallel corpus aligned at the paragraph level. The corpus consists of twelve large novels found on the Internet and converted into text format, with manual correction of formatting problems and errors. We used a dictionary-based algorithm for automatic alignment of the corpus. An evaluation of the alignment results is given. Very few resources are available for parallel fiction texts, even though they are a non-trivial case of alignment of considerable size. Usually, approaches to automatic alignment that are based on linguistic data are applied to texts in restricted areas, such as laws, manuals, etc. It is not obvious that these methods are applicable to fiction texts, because these texts have many more cases of non-literal translation than texts in restricted areas. We show that the results of alignment for fiction texts using a dictionary-based method are good, namely, they produce a state-of-the-art precision value. © Springer-Verlag Berlin Heidelberg 2006.

Keywords: Artificial intelligence

