In our study of Text-to-Scene conversion (TTS), which automatically translates natural language into animations, we realized that event entailment knowledge is useful for generating scenes, since the main part of a sc...
Although entities are named according to specific rules, the sheer number of names and the complexity and variety of those rules make it impossible for computers to detect such entities in context. If these rules could be expressed in a form that computers can easily identify and apply to detect names automatically, this would substantially reduce cost, save time and improve extraction efficiency. This paper therefore examines the naming rules of entities and describes methods that let computers automatically acquire new term patterns. The patterns are represented by POS tags and indicator terms. Two methods are proposed: one based on soft matching and one based on constituent extension. The constituent extension method recognizes new patterns from the rules and logical relations among each entity's constituents, so that each pattern can be extended and assembled logically; patterns produced in this way are accurate. Experimental results with this method show that automatic recognition of new patterns increases the efficiency of entity extraction.
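The abstract describes patterns as POS-tag sequences ending in an indicator term, matched softly or extended from constituents, but gives no algorithmic details. The following is a minimal sketch under stated assumptions, not the authors' method: soft matching is modeled as Levenshtein distance over tag sequences with a hypothetical tolerance `max_dist`, and constituent extension as the Cartesian product of alternative tag sequences for each constituent slot. The tags `ns` (place name) and `n` (noun) follow common Chinese POS tagsets; `IND` is a hypothetical placeholder for the indicator term, and all function names are illustrative.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def soft_match(tags, pattern, max_dist=1):
    """Soft match (assumed form): accept tags within max_dist edits of a pattern."""
    return edit_distance(list(tags), list(pattern)) <= max_dist


def extend_patterns(slots):
    """Constituent extension (assumed form): assemble patterns slot by slot.

    `slots` is a list of constituent slots; each slot lists alternative tag tuples,
    and every combination of one alternative per slot yields a candidate pattern.
    """
    patterns = set()
    for combo in product(*slots):
        patterns.add(tuple(tag for part in combo for tag in part))
    return patterns
```

For example, `extend_patterns([[("ns",)], [("n",), ("n", "n")], [("IND",)]])` yields the two patterns `("ns", "n", "IND")` and `("ns", "n", "n", "IND")`, and the tag sequence `["ns", "n", "IND"]` soft-matches the longer one at distance 1.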
This paper proposes a Chinese semantic role classification approach based on feature combination. First, we define a set of effective basic features. Then a statistics-based feature combination method is develop...
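The abstract is truncated before it describes the combination statistic, so the following is only a hedged sketch of the general idea of combining basic features: it forms pairwise conjunctions of basic features and keeps those that occur in at least `min_count` training instances. The frequency threshold is an assumption standing in for the paper's unspecified statistic, and the feature names `pred` (predicate), `pt` (phrase type) and `pos` (position) are hypothetical examples in the style of common semantic role labeling features.

```python
from collections import Counter
from itertools import combinations


def basic(features):
    """Render basic features as 'name=value' strings in a fixed (sorted) order."""
    return sorted(f"{k}={v}" for k, v in features.items())


def combine(features):
    """All pairwise conjunctions of basic features, e.g. 'pred=eat&pt=NP'."""
    return [f"{a}&{b}" for a, b in combinations(basic(features), 2)]


def select_combinations(instances, min_count=2):
    """Keep conjunctions seen in at least min_count instances (assumed statistic)."""
    counts = Counter(c for inst in instances for c in set(combine(inst)))
    return {c for c, n in counts.items() if n >= min_count}


def featurize(features, selected):
    """Final feature list: basic features plus the selected conjunctions."""
    return basic(features) + [c for c in combine(features) if c in selected]
```

With two training instances that share only `pred=eat` and `pt=NP`, the single conjunction `pred=eat&pt=NP` survives selection and is appended to each instance's basic features.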
Duplicate emails, which are widespread on the Internet and are mainly caused by mailing lists, not only waste storage resources but also burden users with junk. In this paper, based on the structural and textual features of email, we put forward the concept of Mail-Duplicate-Degree and use it to define email duplication for the first time. Based on this definition, we develop a clustering-based algorithm to detect duplicate emails. By using a hash function provided by a TRIE tree to improve efficiency, the algorithm overcomes the slow processing speed of traditional clustering methods. Experimental results on a large-scale email collection show that the algorithm achieves high precision.
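The abstract does not give the formula for Mail-Duplicate-Degree or say exactly how the TRIE tree serves as a hash function, so the following is a minimal sketch under explicit assumptions. The duplicate degree is approximated by Jaccard similarity of word 3-gram shingles of the body (a hypothetical stand-in), the trie maps each normalized subject line to an integer bucket ID so that only emails in the same bucket are compared, and clustering is a simple greedy leader scheme with a hypothetical threshold of 0.8.

```python
import re


class Trie:
    """Character trie that maps each distinct key to a small integer ID.

    Used here (as an assumption) as the 'hash function' that buckets emails.
    """

    def __init__(self):
        self.root = {}
        self.next_id = 0

    def get_id(self, key):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        if "$id" not in node:  # single-character edges never collide with "$id"
            node["$id"] = self.next_id
            self.next_id += 1
        return node["$id"]


def normalize_subject(subject):
    """Lower-case, drop leading Re:/Fw:/Fwd: prefixes, collapse whitespace."""
    s = subject.lower().strip()
    s = re.sub(r"^((re|fw|fwd)\s*:\s*)+", "", s)
    return re.sub(r"\s+", " ", s)


def shingles(text, k=3):
    """Set of word k-grams (one tuple for texts shorter than k words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def duplicate_degree(a, b):
    """Hypothetical stand-in for Mail-Duplicate-Degree: Jaccard of shingles."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def cluster_duplicates(emails, threshold=0.8):
    """Group (subject, body) pairs into clusters of email indices."""
    trie = Trie()
    buckets = {}
    for idx, (subject, _) in enumerate(emails):
        buckets.setdefault(trie.get_id(normalize_subject(subject)), []).append(idx)

    clusters = []
    for members in buckets.values():
        groups = []  # the first index in each group acts as its representative
        for idx in members:
            body = emails[idx][1]
            for group in groups:
                if duplicate_degree(emails[group[0]][1], body) >= threshold:
                    group.append(idx)
                    break
            else:
                groups.append([idx])
        clusters.extend(groups)
    return clusters
```

Bucketing by subject is a speed/recall trade-off chosen for this sketch: it avoids comparing every pair of emails, but two identical bodies sent under unrelated subjects land in different buckets and are not merged.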
In researching and developing various natural language processing systems, such as Q&A systems and text-to-scene conversion systems, we found that knowledge of text entailment helps considerably in improving system performance. Systems with text entailment knowledge are smarter than those without it. Many research teams currently focus on text entailment, including its recognition, generation and extraction; among these, entailment extraction is the main method for building an entailment knowledge base. Because events are central to stories, finding a method for event entailment extraction is our main goal for the text-to-scene conversion system. In this paper, we propose a method for extracting event entailment from a corpus based on EM iteration, an approach that has not been used for this task before.
Selecting the wavelet type, decomposition level and fusion rule is a key problem when the wavelet transform is applied to image fusion. A total of 2916 different fusion methods (54×5×9, including 54 wavelet types, 5...
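To make the three choices concrete (wavelet type, decomposition level, fusion rule), the following is a minimal sketch of one point in that design space: a single-level Haar transform with one common fusion rule that averages the low-frequency (LL) band and keeps the larger-magnitude coefficient in the detail bands. The Haar wavelet, the averaging normalization and this particular rule are illustrative choices of ours; the abstract is truncated before it lists the wavelets, levels and rules actually compared. Images are assumed to be registered, equal-size lists of lists with even height and width.

```python
def haar_1d(v):
    """One level of the averaging Haar transform; len(v) must be even."""
    half = len(v) // 2
    approx = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    detail = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return approx + detail


def ihaar_1d(c):
    """Inverse of haar_1d: x0 = a + d, x1 = a - d."""
    half = len(c) // 2
    out = []
    for a, d in zip(c[:half], c[half:]):
        out.extend([a + d, a - d])
    return out


def transpose(m):
    return [list(row) for row in zip(*m)]


def haar_2d(img):
    """Single-level 2-D Haar: rows, then columns (LL band ends up top-left)."""
    rows = [haar_1d(r) for r in img]
    return transpose([haar_1d(c) for c in transpose(rows)])


def ihaar_2d(coef):
    """Undo the column transform, then the row transform."""
    rows = transpose([ihaar_1d(c) for c in transpose(coef)])
    return [ihaar_1d(r) for r in rows]


def fuse(img_a, img_b):
    """Fuse two registered images in the single-level Haar domain."""
    ca, cb = haar_2d(img_a), haar_2d(img_b)
    h, w = len(ca), len(ca[0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if i < h // 2 and j < w // 2:
                # Low-frequency (LL) band: average the two sources.
                fused[i][j] = (ca[i][j] + cb[i][j]) / 2
            else:
                # Detail bands: keep the coefficient with the larger magnitude.
                fused[i][j] = ca[i][j] if abs(ca[i][j]) >= abs(cb[i][j]) else cb[i][j]
    return ihaar_2d(fused)
```

Swapping the wavelet, repeating `haar_2d` on the LL band for deeper decomposition, or replacing the per-band rule are exactly the three axes the study varies.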
Information extraction is the task of identifying information in texts and converting it into a predefined format. In this paper, we build an information integration system that focuses on information about computer science teachers at Chinese universities. The goal of the system is to automatically extract useful information from heterogeneous sources and reorganize it into a structured format. The system consists of four main modules: a web page retrieval module, a web page structure classification module, an information extraction module and an information updating module. We have successfully applied the system to 107 universities in China, which demonstrates its effectiveness.
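The four-module architecture can be sketched as a small pipeline. This is a hedged illustration only: the abstract does not describe how each module works, so the structure classifier (counting `<li>` items to separate index pages from profile pages), the regex-based extractor, the title list and the merge-by-name update rule are all hypothetical. Page fetching is injected as a `fetch` callable so the sketch runs without network access.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class TeacherRecord:
    name: str
    email: Optional[str] = None
    title: Optional[str] = None


# Hypothetical title list; "Associate Professor" is checked before "Professor".
TITLES = ("Associate Professor", "Professor", "Lecturer")


def retrieve_pages(urls: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Module 1: web page retrieval (fetch is injected, e.g. an HTTP client)."""
    return {url: fetch(url) for url in urls}


def classify_structure(html: str) -> str:
    """Module 2: toy structure classifier; many list items suggest an index page."""
    return "list" if html.count("<li>") >= 3 else "profile"


def extract_record(html: str) -> Optional[TeacherRecord]:
    """Module 3: rule-based extraction from a profile page."""
    name = re.search(r"<h1>(.*?)</h1>", html)
    if not name:
        return None
    email = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", html)
    title = next((t for t in TITLES if t in html), None)
    return TeacherRecord(name.group(1).strip(), email.group(0) if email else None, title)


def update_database(db: Dict[str, TeacherRecord], rec: TeacherRecord) -> None:
    """Module 4: merge by name, keeping old values when new ones are missing."""
    old = db.get(rec.name)
    if old is None:
        db[rec.name] = rec
    else:
        old.email = rec.email or old.email
        old.title = rec.title or old.title


def run_pipeline(urls, fetch, db):
    """Chain the four modules over a batch of URLs."""
    for _, html in retrieve_pages(urls, fetch).items():
        if classify_structure(html) == "profile":
            rec = extract_record(html)
            if rec:
                update_database(db, rec)
    return db
```

Keeping each module behind a small function boundary mirrors the decomposition in the abstract: a real deployment could replace the toy classifier or extractor without touching retrieval or updating.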