Search Results - Inner Mongolia University Library

Inferring the Location of an Event in Large Corpus

International Journal of Computer Processing of Languages, 2011, Vol. 23, No. 3, pp. 255-271

Authors: Hanjing Li; Runzhi Dong; Tiejun Zhao. Affiliations: MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; Special Education School, Beijing Union University, Beijing 100075, China

The location of a passage is a kind of semantic information that may prove useful for a variety of applications that perform inference over passages described in natural language texts. In this paper, we propose a method for automatically discovering pairs of an event and a place term related by location, such as wash clothes ⇒ laundry room. In contrast to previous approaches, which extract associations between particular actions and the locations where those actions occur, the underlying assumption of our method is that the correlation between an event and a place term appears as the regular co-occurrence of a verb and a place noun within locally coherent text. By analogy with the problem of inferring semantic information from a text corpus, the statistical method of Dunning's likelihood ratio is used to score the extracted pairs for association, in order to discover the correlation in each pair. In our experimental evaluation, we examine the effect that various statistical methods have on the accuracy of this model of inferring locations. We then carry out a direct evaluation of rare pairs against different statistical methods.
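As a rough illustration of the scoring step, the sketch below computes Dunning's log-likelihood ratio (G²) for a single verb and place-noun pair from a 2×2 co-occurrence table. The function names, the table layout and the counts are assumptions made for illustration; the abstract does not describe the authors' implementation or data.

```python
import math


def _xlogx(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def _entropy_term(*counts):
    """Unnormalized entropy: xlogx(total) minus the sum of xlogx(count)."""
    return _xlogx(sum(counts)) - sum(_xlogx(c) for c in counts)


def dunning_llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    Assumed layout (hypothetical, for illustration only):
      k11: text windows containing both the verb and the place noun
      k12: windows containing the verb but not the place noun
      k21: windows containing the place noun but not the verb
      k22: windows containing neither
    """
    row = _entropy_term(k11 + k12, k21 + k22)
    col = _entropy_term(k11 + k21, k12 + k22)
    mat = _entropy_term(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


# Made-up counts for a pair such as ("wash", "laundry room").
associated = dunning_llr(30, 5, 5, 1000)
# A table whose rows and columns are statistically independent.
independent = dunning_llr(10, 10, 10, 10)
print(associated > independent)
```

A larger score indicates that the observed co-occurrence departs more strongly from what independence would predict; a table with independent rows and columns scores zero.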

Keywords: Semantic information inferring likelihood ratio rare event word association

A Feasible Process For Mining Corpus From Web

International Conference on Electronic and Mechanical Engineering and Information Technology (EMEIT)

Authors: Chao Wang; Dequan Zheng; Tiejun Zhao; Ji Guo. Affiliations: MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China; NLP Group, Aliyun Inc., Beijing, China

Mining bilingual parallel sentence pairs from Web data is the most effective way to obtain a large-scale bilingual corpus. In this paper, we put forward both a set of methods and a series of processes for extracting parallel sentence pairs from nonspecific web data sources. Taking 1.1 billion pages as the web data input, through a sequence of steps we obtain sentence pairs with 81% recall and 85% precision. On this basis, we introduce a parameter for measuring the quality of a sentence pair. After filtering the sentence pairs by this parameter, we obtain 850 thousand unique sentence pairs; the precision increases to 95%, while the recall decreases by only 1%.
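To put the reported trade-off on a single scale, the short sketch below compares the F1 score before and after filtering. F1 is not mentioned in the abstract and is used here only as a convenient summary; the sketch also assumes that "decreases by 1%" means one percentage point, that is, recall falling from 81% to 80%.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
f1_before = f1_score(0.85, 0.81)
# Assumption: recall drops by one percentage point after filtering.
f1_after = f1_score(0.95, 0.80)

print(f"F1 before filtering: {f1_before:.3f}")
print(f"F1 after filtering:  {f1_after:.3f}")
```

Under that reading, the gain in precision outweighs the small loss in recall.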

Keywords: Web pages; Data mining; Dictionaries; Accuracy; Patents; Radio access networks; HTML

A study of features on primary question detection in Chinese online forums

2010 7th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2010

Authors: Sun, Lin; Liu, Bingquan; Wang, Baoxun; Zhang, Deyuan; Wang, Xiaolong. Affiliation: MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China

ISBN: (print) 9781424459346

Primary question detection in online forums is a subtask of extracting question-answer pairs. In this paper, after surveying the forms of questions in Chinese online forums, we adopt a combination of textual and N-gram features, obtained via feature selection, to help detect primary questions. Treating primary question detection as a binary classification problem, we use a C4.5 decision tree classifier and a support vector machine, each on its own, to distinguish questions from non-questions. Experimental results across multiple datasets demonstrate that the mixture of textual and N-gram features performs better than either feature type alone under both C4.5 and the support vector machine. Computing the weight of each feature in the two classifiers shows that the top 6 features are the same apart from a slight difference in order, indicating that the combination of textual and N-gram features is universal and effective for detecting primary questions. ©2010 IEEE.
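A minimal sketch of what a combined feature vector might look like is given below. The abstract does not list the actual features, so the character n-gram scheme, the two textual features (question-mark presence and post length) and all names are hypothetical; the resulting dictionary is the kind of input a decision tree or support vector machine could consume after vectorization.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of the text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_features(post, n=2):
    """Combine n-gram counts with simple textual features (hypothetical)."""
    feats = Counter(f"ng:{gram}" for gram in char_ngrams(post, n))
    # Textual features: assumptions for illustration, not the paper's list.
    feats["has_question_mark"] = int("?" in post or "？" in post)
    feats["length"] = len(post)
    return feats


example = extract_features("怎么办？")
print(example["ng:怎么"], example["has_question_mark"], example["length"])
```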

Keywords: Classification (of information)

A clustering based fast detection algorithm for large scale duplicate emails

International Conference on Machine Learning and Cybernetics

Authors: Sun, Lin; Liu, Bing-Quan; Wang, Bao-Xun; Wang, Xiao-Long. Affiliation: MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China

ISBN: (print) 9781424465262

Duplicate emails, which are widespread on the Internet and are mainly caused by mailing lists, not only waste storage resources but also burden users with junk. In this paper, based on the structure and text features of email, we put forward the concept of Mail-Duplicate-Degree, which defines email duplication for the first time. Based on this definition, we develop a clustering-based algorithm to detect duplicate emails. By introducing a hash function provided by a TRIE tree to improve efficiency, the algorithm overcomes the slow processing speed of traditional clustering methods. Experimental results on large-scale email collections show that the algorithm has high precision. © 2010 IEEE.

Keywords: Hash functions

Event entailment extraction based on EM iteration

International Conference on Asian Language Processing

Authors: Li, Zhen; Li, Hanjing; Yu, Mo; Zhao, Tiejun; Li, Sheng. Affiliation: MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China

ISBN: (print) 9780769542881

In the research and development of various natural language processing systems, such as Q&A systems and text-to-scene conversion systems, we have found that knowledge of text entailment helps greatly in improving system performance. Systems with text entailment knowledge will be smarter than those without it. Currently many research teams are focusing on text entailment, including recognition, generation and extraction. Among these, entailment extraction is the main method for creating an entailment knowledge database. Meanwhile, for text-to-scene conversion systems, because of the importance of events in stories, finding a method for event entailment extraction is our main goal. In this paper, we propose a method for extracting event entailment from a corpus based on EM iteration, which has not been used before. © 2010 IEEE.

Keywords: Extraction

Research on domain-adaptive transfer learning method and its applications

International Conference on Asian Language Processing

Authors: Fei, Geli; Zheng, Dequan. Affiliation: MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China

ISBN: (print) 9780769542881

Traditional machine learning methods rely on strong assumptions, in particular that training data and testing data lie in homogeneous feature spaces. However, this is not always true in reality. To relax such assumptions, this paper proposes a domain-adaptive transfer learning method, which automatically learns knowledge from an existing knowledge bank by extracting linguistic information, such as part-of-speech and co-occurrence of keywords, and constructing a new domain-adaptive transfer knowledge bank. Through experiments on homogeneous and heterogeneous feature spaces, we verify the efficacy of our method. © 2010 IEEE.

Keywords: Learning systems

An information extraction system for heterogeneous Web source

International Conference on Machine Learning and Cybernetics

Authors: Zhou, Ting; Sun, Cheng-Jie; Lin, Lei; Liu, Bing-Quan. Affiliation: MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China

ISBN: (print) 9781424465262

Information extraction is the task of identifying information in texts and converting it into a predefined format. In this paper, we build an information integration system that focuses on information about computer science teachers in Chinese universities. The goal of the system is to automatically extract useful information from heterogeneous sources and reorganize it into a structured format. The system includes 4 main modules: a web page retrieval module, a web page structure classification module, an information extraction module and an information updating module. We have successfully applied the system to 107 universities in China, which shows the effectiveness of the proposed system. © 2010 IEEE.

Keywords: Websites

Event entailment chains extraction for Text-to-Scene conversion

International Conference on Machine Learning and Cybernetics

Authors: Li, Han-Jing; Li, Zhen; Xue, Xiao-Ping; Zhao, Tie-Jun. Affiliations: Department of Mathematics, Harbin Institute of Technology, Harbin 150001, China; MOE-MS Key Lab of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China

ISBN: (print) 9781424465262

In our study of Text-to-Scene conversion (TTS), which automatically translates natural language into animations, we realized that event entailment knowledge is useful in generating scenes, since the main part of a scene is to show an event. In this paper, we provide some results of our attempt to extract event entailment knowledge. We use entailment chains instead of traditional entailment rules, since a sequence of events is a process, which makes chains useful in TTS. The results show that the work is worth continuing. © 2010 IEEE.

Keywords: Chains

Semi-supervised domain adaptation for WSD: Using a word-by-word model selection approach

IEEE International Conference on Cognitive Informatics

Authors: Guo, Yuhang; Che, Wanxiang; Liu, Ting; Li, Sheng. Affiliation: MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, School of Computer Science and Technology, 150001, China

ISBN: (print) 9781424480401

This paper proposes a word-by-word model selection approach to domain adaptation for Word Sense Disambiguation. With this approach, the model for a target word is automatically selected from a candidate model set, which comprises improved self-training models and a supervised model. The improved self-training uses sense priors to prevent its iteration from converging to undesirable states. Experimental results on a domain-specific corpus show that: (1) our improved self-training model is effective for words that have senses linked to the target domain; (2) the selected models obtain higher accuracies than each single model and effectively improve performance compared to the state-of-the-art supervised model. © 2010 IEEE.

Keywords: Natural language processing systems

Research on automatic pattern acquisition based on construction extension

Journal of Convergence Information Technology, 2010, Vol. 5, No. 1, pp. 122-127

Authors: Chen, Yu; Zheng, Dequan; Zheng, Bowen; Zhao, Tiejun. Affiliation: MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China

Although entities are named according to specific rules, the sheer number of names and the complex variety of those rules make it impossible for computers to detect these entities in context. If we can create rules that computers can easily identify in order to detect these names automatically, this will substantially reduce cost, save time and improve extraction efficiency. This paper therefore discusses the specific naming rules for entities and encodes them as methods that let computers automatically obtain new term patterns. The patterns are represented by POS tags and indicated terms. One method is based on soft matching. The other is based on constituent extension. The constituent extension method recognizes new patterns according to the rules and logic among each entity's constituents, which means that each pattern can be extended and assembled logically. Patterns produced in this way are expected to be accurate. The experimental results for this method show that automatic recognition of new patterns increases the efficiency of entity extraction.

Keywords: Pattern matching
