Search Results - Inner Mongolia University Library

A feature combination method for semantic role classification

Journal of Information and Computational Science, 2010, Vol. 7, No. 1, pp. 127-133

Authors: Li, Shiqi; Zhao, Tiejun; Li, Hanjing; Liu, Shui (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China)

This paper proposes a Chinese semantic role classification (SRC) approach based on feature combination. First, we define a set of effective basic features. Then we develop a statistics-based feature combination method that constructs a combined feature set from the basic feature set. Based on the distribution of each combined feature over positive and negative instances, we use a new statistic to measure the feature's classification performance and add the features with a high ratio to the combined feature set. Finally, the semantic roles are classified by an SVM classifier using both the basic and combined feature sets. Experimental results on the Chinese Proposition Bank show that the method improves SRC performance by two percent. Copyright © 2010 Binary Information Press.

Keywords: Support vector machines

A clustering based fast detection algorithm for large scale duplicate emails

International Conference on Machine Learning and Cybernetics (ICMLC)

Authors: Lin Sun; Bing-Quan Liu; Bao-Xun Wang; Xiao-Long Wang (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China)

Duplicate emails, which are widespread on the Internet and are mainly caused by mailing lists, not only waste storage resources but also burden users with junk. In this paper, drawing on the structure and text features of email, we put forward the concept of Mail-Duplicate-Degree, which gives a first definition of email duplication. Based on this definition, we develop a clustering-based algorithm to detect duplicate emails. By introducing a hash function provided by a TRIE tree to improve efficiency, the algorithm overcomes the slow processing speed of traditional clustering methods. Experimental results on large-scale email collections show that the algorithm achieves high precision.

Keywords: Electronic mail; Clustering algorithms; Internet; Noise; Algorithm design and analysis; Feature extraction; Layout

Event Entailment Extraction Based on EM Iteration

International Conference on Asian Language Processing (IALP)

Authors: Zhen Li; Hanjing Li; Mo Yu; Tiejun Zhao; Sheng Li (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China)

In researching and developing various natural language processing systems, such as Q&A systems and text-to-scene conversion systems, we have found that knowledge of text entailment greatly improves system performance. Systems with text entailment knowledge are smarter than those without it. Many research teams currently focus on text entailment, including recognition, generation and extraction. Of these, entailment extraction is the main method for building an entailment knowledge database. For text-to-scene conversion systems, given the importance of events in stories, our main goal is to find a method for event entailment extraction. In this paper, we propose a method for extracting event entailment from a corpus based on EM iteration, which has not been used before.

Keywords: Context; Internet; Animation; Natural language processing; Libraries; Classification algorithms; Encyclopedias

Study on the optimal parameters of image fusion based on wavelet transform

Journal of Computational Information Systems, 2010, Vol. 6, No. 1, pp. 131-137

Authors: Zheng, Hong; Zheng, Dequan; Hu, Yanxiang; Li, Sheng (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China; Tianjin Normal University, Tianjin 300387, China)

Selecting the wavelet type, decomposition level and fusion rule is a key problem when the wavelet transform is applied to image fusion. In this paper, 2916 different fusion methods (54×5×9, including 54 wavelet types, 5 decomposition levels, and 9 fusion rules) are analyzed and compared in an experiment on fusing multi-focus images. Fusion performance is evaluated by calculating the comparability degree of the fused images. The experiment shows that the similarities between the results and the ideal pictures are all over 0.999, indicating good performance. Copyright © 2010 Binary Information Press.

Keywords: Image fusion

An information extraction system for heterogeneous Web source

International Conference on Machine Learning and Cybernetics (ICMLC)

Authors: Ting Zhou; Cheng-Jie Sun; Lei Lin; Bing-Quan Liu (MOE-MS Key Laboratory of Natural Language Processing and Speech, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China)

Information extraction is the task of identifying information in texts and converting it into a predefined format. In this paper, we build an information integration system that focuses on information about computer science teachers in Chinese universities. The goal of the system is to automatically extract useful information from heterogeneous sources and reorganize it into a structured format. The system includes four main modules: a web page retrieval module, a web page structure classification module, an information extraction module and an information updating module. We have successfully applied the system to 107 universities in China, which demonstrates the effectiveness of the proposed system.

Keywords: Web pages; Data mining; Classification algorithms; Educational institutions; Crawlers; Support vector machines; Search engines

Event entailment chains extraction for Text-to-Scene conversion

International Conference on Machine Learning and Cybernetics (ICMLC)

Authors: Han-Jing Li; Zhen Li; Xiao-Ping Xue; Tie-Jun Zhao (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China; Department of Mathematics, Harbin Institute of Technology, Harbin, China)

In our study of Text-to-Scene conversion (TTS), which automatically translates natural language into animations, we realized that event entailment knowledge is useful in generating scenes, since the main purpose of a scene is to show an event. In this paper, we present some results of our attempt to extract event entailment knowledge. We use entailment chains instead of traditional entailment rules, since a sequence of events forms a process, which makes chains useful in TTS. The results show that the work is worth pursuing further.

Keywords: Animation; Machine learning; Cybernetics; Joining processes; Generators; Natural language processing

AN EFFICIENT APPROACH TO COMMENT SPAM IDENTIFICATION

Journal of Electronics (China), 2009, Vol. 26, No. 5, pp. 644-650

Authors: Yang Yuhang; Zhao Tiejun; Zheng Dequan; Yu Hao (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China)

This paper proposes a novel approach to comment spam identification based on content analysis. Three main features are used: the number of links, content repetitiveness, and text similarity. In practice, content repetitiveness is determined by the length and frequency of the longest common substring, and text similarity is calculated using the vector space model. In preliminary experiments on Chinese and English comments, the precision of comment spam identification reaches 93% and 82% respectively. The results show the validity and language independence of this approach. Compared with conventional spam filtering approaches, our method requires no training, no rule sets and no link relationships. The proposed approach can also handle new comments as well as existing ones.
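Two of the features named in this abstract, the longest common substring and vector-space text similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenization (Chinese text would need a word segmenter), and the raw term-frequency weighting are all assumptions made for illustration.

```python
import math
from collections import Counter


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of term-frequency vectors (assumed whitespace tokens)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction in the vector space
    return dot / (norm_a * norm_b)
```

A long substring shared by many comments, or a low similarity between a comment and the post it is attached to, would each be a hint of spam under this kind of scheme; how the paper combines the features into a decision is not stated in the abstract.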

Keywords: Comment spam; Automatic identification; Content analysis; Blog

Automatic domain-ontology structure and example acquisition from semi-structured texts

6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009

Authors: Xiao, Cheng; Zheng, Dequan; Yang, Yuhang (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China)

ISBN: (print) 9780769537351

This paper presents a new method for acquiring Domain-Ontology structure and examples from semi-structured data sources. First, the Domain-Ontology structure is extracted: candidate attributes are extracted using certain patterns, and a statistical method filters out incorrect attributes. Second, using the Domain-Ontology structure as a clue, example extraction patterns are generated automatically. Finally, Ontology examples are acquired by taking advantage of the special structural features of the Web pages. Experiments are carried out in the film domain: the precision of Ontology structure extraction is 83.7%, and the highest recall of example extraction reaches 90%. Experimental results demonstrate that the method developed in this paper is fairly efficient. © 2009 IEEE.

Keywords: Ontology

Study on image classification based on SVM and the fusion of multiple features

ICEIS 2009 - 11th International Conference on Enterprise Information Systems

Authors: Dequan, Zheng; Tiejun, Zhao; Sheng, Li; Yufeng, Li (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China)

ISBN: (print) 9789898111845

In this paper, we propose an adaptive image classification method with adjustable feature weights, based on SVM and the fusion of multiple features. First, a classifier is constructed separately for each image feature; then the weight coefficient of each feature is learned automatically from the training data set and the constructed classifiers. Finally, a composite classifier is created by combining the separate classifiers with their corresponding weight coefficients. Experimental results show that our scheme improves image classification performance and is adaptive compared with the general approach. Moreover, the scheme is reasonably robust because it avoids the impact of the differing dimensionality of each feature.
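The general idea of fusing per-feature classifiers through learned weights can be sketched as follows. This is an illustrative assumption rather than the paper's method: the abstract does not say how the weights are learned, so the sketch simply weights each classifier by its accuracy on labeled samples, and plain Python callables stand in for the SVM decision functions. The names `accuracy_weights` and `fused_predict` are hypothetical.

```python
from typing import Callable, List, Sequence

# A scorer maps a sample to a signed score; a score > 0 means the positive class.
# In the setting of the abstract, each scorer would be an SVM trained on one feature.
Scorer = Callable[[Sequence[float]], float]


def accuracy_weights(scorers: List[Scorer],
                     samples: List[Sequence[float]],
                     labels: List[int]) -> List[float]:
    """Weight each scorer by its accuracy on labeled samples (assumed rule)."""
    accuracies = []
    for scorer in scorers:
        correct = sum((scorer(x) > 0) == (y > 0) for x, y in zip(samples, labels))
        accuracies.append(correct / len(samples))
    total = sum(accuracies)
    if total == 0:
        # No scorer got anything right: fall back to uniform weights.
        return [1.0 / len(scorers)] * len(scorers)
    return [a / total for a in accuracies]


def fused_predict(scorers: List[Scorer],
                  weights: List[float],
                  x: Sequence[float]) -> int:
    """Combine the scorers' outputs as a weighted sum and return +1 or -1."""
    score = sum(w * s(x) for w, s in zip(weights, scorers))
    return 1 if score > 0 else -1
```

Because each scorer sees only its own feature and contributes a single score, features of very different dimensionality never share one vector, which matches the robustness argument made in the abstract.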

Keywords: Image classification

Comparative study on hierarchical phrase structures and linguistic phrase structures

6th International Workshop on Natural Language Processing and Cognitive Science - NLPCS 2009, in conjunction with ICEIS 2009

Authors: Tiejun, Zhao; Yongliang, Ma; Dequan, Zheng; Sheng, Li (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Nangang Xidazhijie 92, 150001 Harbin, China)

ISBN: (print) 9789898111920

This paper proposes a framework for analyzing SMT translation output from a hierarchical phrase decoder. A tree display tool shows the translation process of the SMT model. An interactive operation tool provides an adjustment mechanism for improving translation quality. The work explores automatic or semi-automatic identification and correction of some translation errors based on a comparison between hierarchical phrase structures and linguistic phrase structures. Parts of the framework are implemented and preliminary results are presented.

Keywords: Linguistics
