Search Results - Inner Mongolia University Library

Recent advances on NLP research in Harbin Institute of Technology

Frontiers of Computer Science in China (中国高等学校学术文摘·计算机科学), 2007, Vol. 1, No. 4, pp. 413-428

Authors: ZHAO Tiejun, GUAN Yi, LIU Ting, WANG Qiang (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China)

In the 1960s, researchers at Harbin Institute of Technology (HIT) attempted to conduct research on natural language *** more than 40 years' effort, HIT has established three research laboratories for Chinese information processing, *** Machine Intelligence and Translation Laboratory (MI&T lab), the Intelligent Technology and Natural Language Processing Laboratory (ITNLP), and the Information Retrieval Laboratory (IR-lab). At present, HIT has a well-balanced research team of over 200 people, and its research interests have extended to language processing, machine translation, text retrieval, and other *** Institute of Technology has accumulated a number of key techniques and data resources and has won many prizes in technical evaluations at home and *** Institute of Technology has become one of the most important bases for natural language processing teaching and scientific research in China *** paper gives an introduction to the achievements in NLP at HIT.

Keywords: natural language processing; Harbin Institute of Technology; text analysis; machine translation

Chinese Information Processing and Its Prospects

Journal of Computer Science & Technology, 2006, Vol. 21, No. 5, pp. 838-846

Authors: Li Sheng (李生), Zhao Tiejun (赵铁军) (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, P.R. China)

The paper presents the main progress and achievements in Chinese information processing. It focuses on six aspects: Chinese syntactic analysis, Chinese semantic analysis, machine translation, information retrieval, information extraction, and speech recognition and synthesis. The important techniques of each branch, and the key problems each branch may face in the near future, are also discussed.

Keywords: Chinese information processing; natural language processing; computational linguistics

Query Expansion with Statistical Machine Translation

Chinese Journal of Electronics (电子学报(英文版)), 2008, Vol. 17, No. 1, pp. 48-52

Authors: LI Weijiang, ZHAO Tiejun, WANG Xiangang (MOE-MS Key Laboratory of Natural Language Processing and Speech, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)

In practical information retrieval applications, such as search engines, the query submitted by a user usually contains only a few keywords. This causes word mismatch between relevant documents and the user's query, which seriously degrades retrieval performance. Based on an analysis of how queries are produced, this paper proposes a new query expansion method based on a statistical machine translation model. The approach extracts related terms between documents and queries through the statistical machine translation model and then expands the query with them. Experiments on TREC data show that our method consistently improves on the language model method without expansion by 4-17%. Compared with pseudo-relevance feedback, our method achieves competitive average precision.

Keywords: information retrieval; machine translation; query expansion; language model
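
The abstract describes expanding a short query with terms that a statistical translation model links to it. The Python sketch below shows only the term-selection step, under stated assumptions: `t_table` is a hypothetical, precomputed table in which `t_table[q][w]` approximates P(w | q), and candidate terms are scored by summing those probabilities over the query words. How the paper trains the model and weights expansion terms inside its language-model retrieval is not covered, so this is an illustration rather than the authors' method.

```python
from collections import defaultdict


def expand_query(query_terms, t_table, k=3):
    """Append the k highest-scoring expansion terms to the query.

    t_table is a hypothetical, precomputed translation table:
    t_table[q][w] approximates P(w | q) for query word q and document word w.
    """
    scores = defaultdict(float)
    for q in query_terms:
        for w, p in t_table.get(q, {}).items():
            if w not in query_terms:  # skip words already in the query
                scores[w] += p  # sum the evidence from every query word
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [w for w, _ in ranked[:k]]


# Hypothetical toy table, for illustration only.
T_TABLE = {
    "car": {"automobile": 0.5, "vehicle": 0.25},
    "rental": {"hire": 0.75, "vehicle": 0.125},
}
print(expand_query(["car", "rental"], T_TABLE, k=2))  # ['car', 'rental', 'hire', 'automobile']
```

Summing probabilities favors terms that several query words agree on; a normalized or weighted score would be an equally plausible design choice.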

RM-structure alignment based statistical machine translation model

High Technology Letters, 2008, Vol. 14, No. 3, pp. 271-275

Authors: Sun Jiadong (孙加东), Zhao Tiejun (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, P.R. China)

This paper proposes a novel model for statistical machine translation based on structure alignments. Meta-structures and sequences of meta-structures are defined for a parse tree. During translation, a parse tree is decomposed to handle structural divergence, and alignments can be constructed at different levels of recombination of meta-structure (RM). This method can perform structure mapping across sub-tree structures between languages. As a result, we obtain not only the translation in the target language but also the sequence of meta-structures of its parse tree. Experiments show that the model, within the framework of a log-linear model, has better generative ability and significantly outperforms Pharaoh, a phrase-based system.

Keywords: statistical machine translation; recombination of meta-structure (RM); structure alignment; log-linear model
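
The abstract places the RM model inside a log-linear framework, where each candidate translation e of a source sentence f receives the score sum_i lambda_i * h_i(e, f) and the highest-scoring candidate is selected. The Python sketch below illustrates only that generic selection step; the feature names (`tm`, `lm`, `word_count`), their values, and the weights are hypothetical placeholders rather than the paper's RM features, and weight tuning is not shown.

```python
def log_linear_score(features, weights):
    """Log-linear score sum_i lambda_i * h_i(e, f) for one candidate translation."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Pick the candidate with the highest log-linear score.

    The normalizing constant of the log-linear distribution is the same for
    every candidate of a given source sentence, so the argmax can ignore it.
    """
    return max(candidates, key=lambda e: log_linear_score(candidates[e], weights))


# Hypothetical feature values (log-probabilities plus a word count) and weights.
WEIGHTS = {"tm": 1.0, "lm": 0.5, "word_count": 0.25}
CANDIDATES = {
    "translation A": {"tm": -2.0, "lm": -4.0, "word_count": 2.0},
    "translation B": {"tm": -3.0, "lm": -1.0, "word_count": 3.0},
}
print(best_candidate(CANDIDATES, WEIGHTS))  # translation B
```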

Two-stage approach to full Chinese parsing

High Technology Letters, 2005, Vol. 11, No. 4, pp. 359-363

Authors: Cao Hailong (曹海龙), Zhao Tiejun, Yang Muyun, Li Sheng (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, P.R. China)

Natural language parsing is a task of great importance and extreme difficulty. In this paper, we present a full Chinese parsing system based on a two-stage approach. Rather than identifying all phrases with a uniform model, we use a divide-and-conquer strategy. We propose an effective and fast method based on a Markov model to identify base phrases. We then make the first attempt to extend one of the best English parsing models, i.e., the head-driven model, to recognize Chinese complex phrases. Our two-stage approach is superior to the uniform approach in two respects. First, it creates synergy between the Markov model and the head-driven model. Second, it reduces the complexity of full Chinese parsing and makes the parsing system space- and time-efficient. Evaluated with PARSEVAL measures on the open test set, the parsing system achieves 87.53% precision and 87.95% recall.

Keywords: natural language processing systems; parsing; Markov model; pattern recognition
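
The reported figures are PARSEVAL labeled-bracket precision and recall. A minimal Python sketch of that computation follows, assuming constituents are represented as hypothetical (label, start, end) word-span tuples; standard scorers such as evalb apply further conventions (for example, punctuation handling) that this sketch does not reproduce.

```python
from collections import Counter


def parseval(gold, predicted):
    """Labeled-bracket precision and recall, PARSEVAL style.

    gold and predicted are lists of constituents written as (label, start, end)
    word spans. Counter intersection (&) matches duplicate brackets at most as
    many times as they occur in both lists.
    """
    matched = sum((Counter(gold) & Counter(predicted)).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Toy example: 2 of the 4 predicted brackets match 2 of the 3 gold brackets.
gold = [("NP", 0, 2), ("VP", 2, 5), ("IP", 0, 5)]
predicted = [("NP", 0, 2), ("VP", 3, 5), ("IP", 0, 5), ("NP", 3, 5)]
p, r = parseval(gold, predicted)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=50.00% recall=66.67%
```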

AN EFFICIENT APPROACH TO COMMENT SPAM IDENTIFICATION

Journal of Electronics (China), 2009, Vol. 26, No. 5, pp. 644-650

Authors: Yang Yuhang, Zhao Tiejun, Zheng Dequan, Yu Hao (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China)

This paper proposes a novel approach to comment spam identification based on content analysis. Three main features, namely the number of links, content repetitiveness, and text similarity, are used to identify comment spam. In practice, content repetitiveness is determined by the length and frequency of the longest common substring, and text similarity is calculated using the vector space model. In preliminary comment spam identification experiments on Chinese and English, precision reached 93% and 82%, respectively. The results show the validity and language independence of this approach. Compared with conventional spam filtering approaches, our method requires no training, no rule sets, and no link relationships. The proposed approach can also handle new comments as well as existing ones.

Keywords: Comment spam; Automatic identification; Content analysis; Blog
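
The three features named in the abstract can be sketched in Python as follows. This is an illustration under assumptions: links are detected with a simple URL regex, text is tokenized on whitespace (Chinese comments would need word segmentation first), and the abstract does not say which texts are compared or how the features are weighted or thresholded, so no spam decision rule is shown. The frequency of the longest common substring could then be counted, for instance, with `str.count`.

```python
import math
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")


def count_links(comment):
    """Feature 1: number of URLs in the comment text."""
    return len(URL_PATTERN.findall(comment))


def longest_common_substring(a, b):
    """Feature 2 helper: longest common substring via dynamic programming, O(len(a) * len(b))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common run ending at a[i-1], b[j-1]
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def cosine_similarity(a, b):
    """Feature 3: cosine similarity of term-frequency vectors (vector space model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


comment = "buy cheap pills at http://spam.example now"
other = "cheap pills for everyone"
print(count_links(comment), repr(longest_common_substring(comment, other)))
print(round(cosine_similarity(comment, other), 3))
```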

AN EFFICIENT APPROACH TO IMPORTANT BLOGGERS DISCOVERY

Journal of Electronics (China), 2008, Vol. 25, No. 2, pp. 218-225

Authors: Yang Yuhang, Yu Hao, Zhao Tiejun, Tan Hongye, Zheng Dequan (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China)

The popularity of blogs and the amount of information in the blogosphere are growing so fast that it is difficult for Internet users to find the information they care about. Compared with conventional web pages, links in the blogosphere are more abundant and conversations between bloggers are more frequent. This paper proposes a method of ranking bloggers based on link analysis, which reflects the characteristics of blogs and reduces the influence of link spamming. The method also makes it more convenient for users to read blogs and can provide a new methodology for information retrieval in the blogosphere. To ensure the reliability of the ranking results, several evaluation indicators for important bloggers are proposed, and the rankings produced by the proposed method are compared with those produced by the other indicators. Finally, correlation analysis is used to verify the consistency between the proposed method and the evaluation indicators.

Keywords: Important blogger; Link analysis; Evaluation indicator; Correlation analysis
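
The abstract does not give the paper's ranking formula, so the Python sketch below substitutes classic PageRank power iteration over a blogger link graph as a stand-in for link-analysis ranking. The graph format, the damping factor of 0.85, and the fixed iteration count are assumptions, and the paper's measures against link spamming are not modeled.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score bloggers with PageRank power iteration over the blogger link graph.

    links maps each blogger to the bloggers they link to. A blogger with no
    outgoing links (a dangling node) spreads its score evenly over everyone.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}  # random-jump share
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical blogger graph: alice and carol both link to bob.
ranks = pagerank({"alice": ["bob"], "carol": ["bob"], "bob": ["alice"]})
for blogger, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{blogger}: {score:.3f}")
```

A fixed iteration count keeps the sketch short; a convergence threshold on the change in scores is the usual alternative.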

Event recognition based on time series characteristics

2011 8th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2011, Jointly with the 2011 7th International Conference on Natural Computation, ICNC'11

Authors: Li, Fenghuan; Zheng, Dequan; Zhao, Tiejun (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China)

ISBN: 9781612841816 (print)

Event recognition and temporal information analysis are important subtasks of information extraction (IE). In this paper, event recognition based on time series characteristics is proposed. In the event recognition pipeline, a trigger-word table is extracted from the training corpus and extended based on the domain and a thesaurus; this table serves as a priori knowledge. Event recognition is then carried out using the trigger words and a support vector machine (SVM). When event time is recognized, temporal expressions are normalized first; in particular, time keywords and their priorities are taken into account. Finally, events are sorted by their time series characteristics. The results show that the methods proposed in this paper are valid and effective. © 2011 IEEE.

Keywords: Support vector machines
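
The pipeline in the abstract (trigger-word lookup, classification, time normalization, chronological ordering) can be sketched in Python as below. The trigger table is hypothetical, only numeric dates such as 2010-08-07 are normalized, the SVM confirmation step is omitted, and time keywords and their priorities are not modeled.

```python
import re
from datetime import date

# Hypothetical trigger-word table mapping trigger -> event type. The paper builds its
# table from a training corpus and extends it with domain knowledge and a thesaurus.
TRIGGERS = {"earthquake": "disaster", "flood": "disaster", "acquired": "acquisition"}

# Only numeric year-month-day dates such as 2010-08-07 or 2010/08/07 are handled here.
DATE_PATTERN = re.compile(r"(\d{4})[-/.](\d{1,2})[-/.](\d{1,2})")


def normalize_date(text):
    """Return the first numeric date in text as a datetime.date, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 40
        return None


def recognize_events(sentences):
    """Detect trigger-word candidates, attach normalized dates, and sort them by time.

    A full system would confirm each candidate with a trained classifier
    (the paper uses an SVM); this sketch accepts every trigger match.
    """
    events = []
    for sentence in sentences:
        for word in re.findall(r"\w+", sentence.lower()):
            if word in TRIGGERS:
                when = normalize_date(sentence)
                if when is not None:
                    events.append((when, TRIGGERS[word], sentence))
                break  # at most one candidate per sentence in this sketch
    events.sort(key=lambda event: event[0])
    return events


for when, kind, text in recognize_events([
    "A flood hit the valley on 2010/08/07.",
    "Company X acquired Company Y on 2009-05-01.",
    "Nothing notable happened on 2011-01-01.",
]):
    print(when.isoformat(), kind, text)
```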

Research on text categorization based on a weakly-supervised transfer learning method

13th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2012

Authors: Zheng, Dequan; Zhang, Chenghe; Fei, Geli; Zhao, Tiejun (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China)

ISBN: 9783642286001 (print)

This paper presents a text categorization method based on weakly-supervised transfer learning, which does not require labeling new training documents when facing classification tasks in a new domain. Instead, documents already labeled in other domains are used to accomplish the automatic categorization task. By extracting linguistic information such as part of speech, semantics, and keyword co-occurrence, we construct a domain-adaptive transfer knowledge base. Related experiments show that the presented method improved text categorization performance on a traditional corpus, and that its results were only about 5% lower than the baseline on cross-domain classification tasks, which demonstrates the effectiveness of the method. © 2012 Springer-Verlag.

Keywords: Semantics
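
The abstract names keyword co-occurrence as one source of transferable knowledge. The Python sketch below illustrates only that idea under simplifying, hypothetical assumptions: category keywords are the most frequent words in labeled source-domain documents, they are expanded with words that co-occur with them in unlabeled target-domain documents, and a target document is assigned to the category with the largest keyword overlap. Part-of-speech and semantic information are not used, so this is not the paper's knowledge base.

```python
from collections import Counter, defaultdict


def tokens(doc):
    """Whitespace tokenizer (Chinese text would need a word segmenter first)."""
    return doc.lower().split()


def build_keywords(labeled_docs, top_n=3):
    """Take the top_n most frequent words per category from labeled source-domain documents."""
    counts = defaultdict(Counter)
    for text, label in labeled_docs:
        counts[label].update(tokens(text))
    return {label: {w for w, _ in c.most_common(top_n)} for label, c in counts.items()}


def expand_keywords(keywords, target_docs, min_cooccur=2):
    """Add unlabeled target-domain words that co-occur with a category's keywords
    in at least min_cooccur documents."""
    expanded = {label: set(words) for label, words in keywords.items()}
    for label, words in keywords.items():
        cooccur = Counter()
        for doc in target_docs:
            doc_words = set(tokens(doc))
            if doc_words & words:
                cooccur.update(doc_words - words)
        expanded[label].update(w for w, c in cooccur.items() if c >= min_cooccur)
    return expanded


def classify(doc, keywords):
    """Assign the category whose keyword set overlaps the document the most."""
    doc_words = set(tokens(doc))
    return max(keywords, key=lambda label: len(doc_words & keywords[label]))
```

In practice a stop-word filter would be needed, since raw frequency otherwise favors function words.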

Search results optimization method combined with multi-features

2011 8th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2011, Jointly with the 2011 7th International Conference on Natural Computation, ICNC'11

Authors: Qin, Yanxia; Zheng, Dequan; Xu, Bing (MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China)

ISBN: 9781612841816 (print)

The optimization of search results has long been a research hotspot in the search engine field. In particular, partitioning results into topics by clustering has proved to be an effective approach. However, some clusters still contain many documents, which implicitly slows users down when locating information. Meanwhile, we find that document feature information benefits document ranking. To address this issue, we apply multiple features to the search results after clustering, making full use of their statistical and semantic information to re-rank the documents. Related experiments show that our approach substantially outperforms clustering alone, and the improvement in the evaluation indicators shows that the top-N results better satisfy users' needs. © 2011 IEEE.

Keywords: Search engines
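
The abstract re-ranks documents inside each cluster using multiple features but does not list them. The Python sketch below assumes hypothetical numeric features per document (here `query_term_overlap` and `title_match`), min-max normalizes each feature within the cluster, and sorts by a weighted sum with hand-picked weights.

```python
def min_max_normalize(values):
    """Scale numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rerank_cluster(docs, weights):
    """Re-rank one cluster's documents by a weighted sum of normalized features.

    docs: list of dicts holding an "id" plus one numeric value per feature name.
    weights: hypothetical mapping of feature name -> weight.
    """
    if not docs:
        return []
    normalized = {name: min_max_normalize([d[name] for d in docs]) for name in weights}
    scores = [sum(w * normalized[name][i] for name, w in weights.items()) for i in range(len(docs))]
    # sorted() is stable, so documents with equal scores keep their original order.
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return [docs[i]["id"] for i in order]


def rerank_clusters(clusters, weights):
    """Re-rank inside every cluster while keeping the cluster order unchanged."""
    return [rerank_cluster(docs, weights) for docs in clusters]


cluster = [
    {"id": "d1", "query_term_overlap": 0.2, "title_match": 0},
    {"id": "d2", "query_term_overlap": 0.9, "title_match": 1},
    {"id": "d3", "query_term_overlap": 0.5, "title_match": 1},
]
print(rerank_cluster(cluster, {"query_term_overlap": 0.5, "title_match": 0.5}))  # ['d2', 'd3', 'd1']
```

Normalizing within each cluster keeps features on comparable scales; learning the weights from relevance judgments would be a natural refinement.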
