Combination Approaches in Korean Information Retrieval: Words vs. n-grams, and Query Translation vs. Document Translation

Authors: In-Su Kang, Seung-Hoon Na, Jong-Hyeok Lee

Affiliations: Korea Institute of Science and Technology Information (KISTI), Korea; Division of Electrical and Computer Engineering, Pohang University of Science and Technology (POSTECH), Advanced Information Technology Research Center (AITrc), PIRL 323, Pohang University of Science and Technology, San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea

Publication: International Journal of Computer Processing of Languages

Year, volume, issue: 2006, Vol. 19, No. 2N03

Pages: 153-187

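Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees in engineering or science may be conferred)]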

Subjects: CJK information retrieval; Fusion strategies; Cross-language information retrieval; Combination of query translation and document translation

Abstract: Asian language information retrieval has been a challenge for information retrieval researchers, since Asian languages have different characteristics from Indo-European languages. For example, Chinese and Japanese lack word delimiters, and Korean does not allow spaces between words within its syntactic unit, called Eojeol. In addition, they employ large sets of characters originating from ideographic Chinese characters. Although much research has been conducted on these Asian languages in order to adapt or confirm existing information retrieval solutions that were developed primarily for English, only a few Korean-related works have been reported internationally, and most of them were done on small-scale document collections. Thus, this study presents large-scale retrieval evaluations on Korean to serve as a benchmark for further Korean-related information retrieval research. In particular, this article investigates the following issues regarding Korean: word-based retrieval vs. n-gram-based retrieval, and query translation vs. document translation. Our monolingual experiments confirmed that, in Korean, n-gram-based and word-based retrieval show different retrieval characteristics for many queries, and that their fusion achieves better performance than either one alone in the case of the probabilistic model. The same was observed for query translation and document translation in cross-lingual experiments. In addition, we observed that naive document translation performs slightly better than naive query translation, since the former performs query structuring similar to the Pirkola method.
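The fusion idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which fusion formula or n-gram size the authors used, so everything below is an assumption made for illustration: character bigrams as the n-gram unit, min-max score normalization, and a weighted linear combination of the two normalized score lists. The function names (`char_ngrams`, `min_max_normalize`, `fuse_scores`) and the `weight` parameter are hypothetical.

```python
def char_ngrams(eojeol: str, n: int = 2) -> list[str]:
    """Split one Eojeol into overlapping character n-grams (bigrams by default)."""
    if len(eojeol) < n:
        # Too short to form an n-gram: keep the unit itself, if non-empty.
        return [eojeol] if eojeol else []
    return [eojeol[i:i + n] for i in range(len(eojeol) - n + 1)]


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale retrieval scores to [0, 1] so two runs become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: assumed convention, give every document 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_scores(
    word_scores: dict[str, float],
    ngram_scores: dict[str, float],
    weight: float = 0.5,
) -> list[tuple[str, float]]:
    """Weighted linear fusion of a word-based run and an n-gram-based run.

    `weight` is the share given to the word-based run (an assumed parameter).
    A document missing from one run contributes 0.0 for that run.
    """
    w = min_max_normalize(word_scores)
    g = min_max_normalize(ngram_scores)
    fused = {
        doc: weight * w.get(doc, 0.0) + (1.0 - weight) * g.get(doc, 0.0)
        for doc in set(w) | set(g)
    }
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(char_ngrams("정보검색"))
    word_run = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
    ngram_run = {"d2": 3.0, "d3": 2.0, "d1": 1.0}
    print(fuse_scores(word_run, ngram_run))
```

In this toy input the two runs disagree on the top document, and the fused ranking promotes the document that scores reasonably well in both, which is the intuition behind combining runs with different retrieval characteristics. The same combination scheme could in principle be applied to a query-translation run and a document-translation run.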
