The location of a passage is a kind of semantic information that may prove useful for a variety of applications dealing with inference over passages described in natural language texts. In this paper, we propose a method for the automatic discovery of pairs of an event and a place term related by location, such as wash clothes ⇒ laundry room. In contrast to previous approaches, which extract associations between particular actions and the locations where those actions occur using Dunning's likelihood ratio, the underlying assumption of our method is that the correlation between an event and a place term shows in the regular co-occurrence of a verb and a place noun within locally coherent text. By analogy with the problem of inferring semantic information from a text corpus, the statistical method of Dunning's likelihood ratio is used to score the extracted pairs for association in order to discover the correlation in each pair. In our experimental evaluation, we examine the effect that various statistical methods have on the accuracy of this model of inferring locations. We then carry out a direct evaluation of rare pairs under the different statistical methods.
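The scoring step described above can be illustrated with a minimal sketch of Dunning's log-likelihood ratio (the G² statistic) over a 2x2 contingency table of verb and place-noun co-occurrence counts. The function names `dunning_llr` and `score_pairs`, the toy observations, and the choice to keep only pairs that co-occur more often than expected are assumptions made for illustration; the abstract does not specify the authors' implementation.

```python
import math
from collections import Counter


def dunning_llr(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of counts.

    k11: verb and place together, k12: verb without the place,
    k21: place without the verb, k22: neither.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def score_pairs(cooccurrences):
    """Score each (verb, place) pair seen in a list of co-occurrences.

    Illustrative assumption: pairs that co-occur less often than
    expected under independence are dropped, because G^2 alone does
    not distinguish positive from negative association.
    """
    pair_counts = Counter(cooccurrences)
    verb_counts = Counter(verb for verb, _ in cooccurrences)
    place_counts = Counter(place for _, place in cooccurrences)
    n = len(cooccurrences)
    scores = {}
    for (verb, place), k11 in pair_counts.items():
        k12 = verb_counts[verb] - k11
        k21 = place_counts[place] - k11
        k22 = n - k11 - k12 - k21
        expected = verb_counts[verb] * place_counts[place] / n
        if k11 > expected:
            scores[(verb, place)] = dunning_llr(k11, k12, k21, k22)
    return scores


# Hypothetical toy data: one tuple per verb/place-noun co-occurrence
# found within a locally coherent stretch of text.
toy = (
    [("wash", "laundry room")] * 8
    + [("wash", "kitchen")] * 2
    + [("cook", "kitchen")] * 9
    + [("cook", "laundry room")] * 1
)
ranked = sorted(score_pairs(toy).items(), key=lambda kv: kv[1], reverse=True)
```

Here `ranked` lists the retained pairs from strongest to weakest association. A table whose cells match the independence expectation exactly scores zero, and the score grows as the observed counts depart from it.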
Mining bilingual parallel sentence pairs from Web data is the most effective way to obtain a large-scale bilingual corpus. In this paper, we put forward a set of methods and a series of processes for extracting parallel sentence pairs from nonspecific web data sources. Taking 1.1 billion pages as the web data input, through a sequence of steps we obtain sentence pairs with 81% recall and 85% precision. On this basis, we introduce a parameter for measuring the quality of a sentence pair. After filtering the sentence pairs by this parameter, we obtain 850 thousand unique sentence pairs; precision increases to 95%, while recall decreases by only 1%.
Primary question detection in online forums is a subtask of extracting question-answer pairs. In this paper, by surveying the forms of questions in Chinese online forums, a combination of textual and N-gram features ac...
Duplicate emails, which are widespread on the internet and are mainly caused by mailing lists, not only waste storage resources but also burden users with junk. In this paper, according to the structure and text features of ...
In the research and development of various natural language processing systems, such as Q&A systems and text-to-scene conversion systems, we realized that knowledge of text entailment helps a lot in improving the perf...
Traditional machine learning methods rely on strong assumptions, especially the assumption that training data and testing data lie in homogeneous feature spaces. However, this is not always true in reality. To break such assump...
Information Extraction is the task of identifying information in texts and converting it into a predefined format. In this paper, we build an information integration system which focuses on the information of computer...
In our study of Text-to-Scene conversion (TTS), which translates natural language into animations automatically, we realized that event entailment knowledge is useful in generating scenes since the main part of a sc...
This paper proposes a word-by-word model selection approach to domain adaptation for Word Sense Disambiguation. By this approach, the model for a target word is automatically selected from a candidate model set, which...
Although entities are named according to specific rules, the sheer number of different names and the complex variety of the rules make it impossible for computers to detect these entities in context. If we can create rules that computers can easily apply to detect these names automatically, this will substantially reduce cost, save time, and improve extraction efficiency. This paper therefore discusses these specific naming rules for entities and encodes them as methods that allow computers to obtain new term patterns automatically. These methods are represented by POS tags and indicated terms. One method is based on soft matching. The other is based on constituent extension. The constituent extension method recognizes new patterns according to the rules and logic among each entity's constituents, which means that each pattern can be extended and assembled logically. Patterns produced in this way are expected to be accurate. The experimental results for this method show that automatic recognition of new patterns increases the efficiency of entity extraction.