To enhance the effectiveness of Thai text retrieval systems, automatic indexing needs significant improvement. This paper applies statistical and natural language processing techniques to obtain multilevel content identifiers at three levels: phrasal, single-term, and conceptual. These multilevel indices are intended to support a very wide range of document retrieval requests without degrading system performance. Automatic multilevel indexing of Thai text requires three processes: lexical token identification, phrase identification and extraction, and multilevel index generation. The results show significant gains in both precision and recall.
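As a rough illustration of how the three processes fit together, here is a minimal Python sketch. It is not the paper's method: the greedy longest-matching segmenter, the bigram-frequency criterion for phrases, the `thesaurus` mapping used for the conceptual level, and all function names are assumptions chosen for illustration.

```python
from collections import Counter


def identify_tokens(text, lexicon, max_len=20):
    """Step 1 (assumed method): lexical token identification by greedy
    longest matching. Thai has no spaces between words, so the character
    stream is split using a dictionary of known words."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: keep it as its own token
            i += 1
    return tokens


def extract_phrases(token_lists, stopwords, min_count=2):
    """Step 2 (assumed criterion): adjacent non-stopword pairs that recur
    at least min_count times across the collection are treated as phrases."""
    counts = Counter()
    for tokens in token_lists:
        for a, b in zip(tokens, tokens[1:]):
            if a not in stopwords and b not in stopwords:
                counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def build_index(tokens, phrases, stopwords, thesaurus):
    """Step 3: multilevel index for one document. The conceptual level uses
    a hypothetical term-to-concept thesaurus."""
    terms = {t for t in tokens if t not in stopwords}
    return {
        "phrase": {a + b for a, b in zip(tokens, tokens[1:]) if (a, b) in phrases},
        "term": terms,
        "concept": {thesaurus[t] for t in terms if t in thesaurus},
    }


if __name__ == "__main__":
    lexicon = {"ระบบ", "การ", "ค้นคืน", "ข้อความ", "ภาษาไทย"}
    doc1 = identify_tokens("ระบบการค้นคืนข้อความภาษาไทย", lexicon)
    doc2 = ["ข้อความ", "ภาษาไทย", "ระบบ"]
    stop = {"การ"}
    phrases = extract_phrases([doc1, doc2], stop)
    print(build_index(doc1, phrases, stop, {"ค้นคืน": "retrieval"}))
```

Joining phrase components without a separator follows Thai orthography, where words inside a phrase are not space-delimited. A real system would replace the greedy segmenter with one that resolves ambiguous word boundaries.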
Thai text retrieval systems routinely handle documents containing loan words (words borrowed from foreign languages), especially in science and engineering. This paper describes an algorithmic approach to backward machine transliteration aimed at improving the retrieval process. The approach consists of two main steps: identifying loan words and back-transliterating them. The backward transliteration model succeeds in approximately 95% of cases when phonetically equivalent results are counted as correct.
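The two steps can be illustrated with a minimal Python sketch. It is not the paper's algorithm: treating any token missing from a native lexicon as a loan word, the small consonant table `THAI_CONSONANTS`, the consonant-skeleton comparison via `difflib.SequenceMatcher`, the 0.6 threshold, and all function names are hypothetical choices made for illustration.

```python
import difflib

# Hypothetical, partial Thai-consonant-to-Latin table (illustration only).
THAI_CONSONANTS = {
    "ก": "k", "ข": "k", "ค": "k", "ง": "ng", "จ": "j", "ช": "ch",
    "ซ": "s", "ด": "d", "ต": "t", "ถ": "t", "ท": "t", "น": "n",
    "บ": "b", "ป": "p", "ผ": "p", "พ": "p", "ฟ": "f", "ม": "m",
    "ย": "y", "ร": "r", "ล": "l", "ว": "w", "ส": "s", "ห": "h",
    "ฮ": "h",
}


def identify_loan_words(tokens, native_lexicon):
    """Step 1 (assumed heuristic): tokens absent from a native Thai lexicon
    are treated as loan words."""
    return [t for t in tokens if t not in native_lexicon]


def romanize_thai(word):
    """Keep consonant letters only. Vowel signs, tone marks, the carrier อ and
    the silencing mark ์ are dropped, but a consonant silenced by ์ is kept,
    since (by assumption) it often mirrors the source spelling."""
    return "".join(THAI_CONSONANTS.get(ch, "") for ch in word)


def consonant_skeleton(latin):
    """Reduce a Latin string to a rough consonant key used on both sides."""
    s = latin.lower().replace("ph", "f").replace("c", "k").replace("q", "k")
    return "".join(ch for ch in s if ch.isalpha() and ch not in "aeiouwyh")


def back_transliterate(word, english_vocab, threshold=0.6):
    """Step 2: return the English word whose consonant key best matches the
    Thai word's key, or None if no candidate reaches the threshold."""
    key = consonant_skeleton(romanize_thai(word))
    best, best_score = None, 0.0
    for candidate in english_vocab:
        score = difflib.SequenceMatcher(None, key, consonant_skeleton(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    loans = identify_loan_words(["ระบบ", "คอมพิวเตอร์"], {"ระบบ"})
    for w in loans:
        print(w, back_transliterate(w, ["keyboard", "computer", "printer"]))
```

Comparing consonant skeletons rather than full spellings is a crude stand-in for phonetic equivalence: Thai vowel signs are written before, after, above, or below consonants and rarely map one-to-one onto English vowel letters, so the sketch ignores vowels on both sides.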