In this paper, we formalize the task of finding a knowledge base entry that a given named entity mention refers to, namely entity linking, by identifying the most "important" node among the graph nodes repre...
This paper proposes a novel method to resolve the coverage problem of SMT systems. The method generates paraphrases for source-side sentences of the bilingual parallel data, which are then paired with the target-side s...
The direct optimization of a translation metric is an integral part of building state-of-the-art SMT systems. Unfortunately, widely used translation metrics such as the BLEU score are non-smooth, non-convex, and non-trivi...
Word Sense Disambiguation (WSD) is one of the fundamental natural language processing tasks. However, the lack of training corpora is a bottleneck in constructing a highly accurate all-words WSD system. Annotating a large-scal...
Word alignment is a fundamental step in machine translation. Current statistical machine translation systems suffer from a major drawback: they only extract rules from 1-best alignments, which adversely affects the ru...
Word lists have become available for most of the world's languages, but only a small fraction of such lists contain cognate information. We present a machine-learning approach that automatically clusters words in ...
Annotating Named Entity Recognition (NER) training corpora is a costly process but necessary for supervised NER systems. This paper presents an approach to generate large-scale Chinese NER training data from an Englis...
Measuring the similarity of categorical data is a challenging task in data mining due to the poor structure of categorical data. This paper presents a dissimilarity measure for categorical data based on the relations...
Previous work has shown that considering the category distance in the taxonomy tree can improve the performance of text classifiers. In this paper, we propose a new approach that integrates more categorical infor...