Most example-based machine translation (EBMT) systems handle their translation examples using heuristic measures based on human intuition. However, such heuristic rules are hard to organize effectively and do not scale to incorporate the diverse features needed to cover more language phenomena and larger domains. In this paper, we use a machine learning approach to EBMT model design instead of human intuition. A maximum entropy (ME) model is introduced to incorporate the different kinds of features inherent in the translation examples. In addition, a multi-dimensional feature space is formally constructed to include features of different aspects. In the experiments, the proposed model shows a significant performance improvement.
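As a rough illustration of how a maximum entropy model combines several features into one score, the sketch below ranks candidate translation examples with a log-linear model, p(y|x) = exp(Σ λ_i f_i(x, y)) / Z(x). The feature names, weights, and candidate values are hypothetical and are not taken from the paper; learning the weights is not shown.

```python
import math


def maxent_probabilities(candidates, weights):
    """Return p(candidate | input) under a log-linear (maximum entropy) model.

    candidates: dict mapping a candidate name to its feature dict {feature: value}
    weights:    dict mapping a feature name to its weight (lambda)
    """
    # Unnormalized log-score: weighted sum of feature values.
    scores = {
        name: sum(weights.get(f, 0.0) * v for f, v in feats.items())
        for name, feats in candidates.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {name: math.exp(s - top) for name, s in scores.items()}
    z = sum(exp_scores.values())  # normalization constant Z(x)
    return {name: e / z for name, e in exp_scores.items()}


# Hypothetical features and weights, for illustration only.
weights = {"lexical_overlap": 1.5, "length_ratio": 0.5, "structure_match": 2.0}
candidates = {
    "example_a": {"lexical_overlap": 0.8, "length_ratio": 0.9, "structure_match": 1.0},
    "example_b": {"lexical_overlap": 0.6, "length_ratio": 1.0, "structure_match": 0.0},
}
probs = maxent_probabilities(candidates, weights)
best = max(probs, key=probs.get)
```

Here `best` is the candidate with the highest model probability. Adding a new kind of feature only requires a new key in the feature and weight dictionaries, which is the scaling property that motivates replacing hand-tuned heuristics.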
We propose a novel syntax-based model for statistical machine translation in which the meta-structure (ms) and meta-structure sequence (Sms) of a parse tree are defined. In this framework, a parse tree is decomposed into an Sms to deal with structural divergence, and the alignment can be reconstructed at different levels of recombination of ms (RM). The extracted RM pairs perform the mapping between sub-structures across languages. As a result, we obtain not only the translation in the target language but also an Sms of its parse tree. Experiments with the BLEU metric show that the model significantly outperforms Pharaoh, a state-of-the-art phrase-based system.
Automatic terminology extraction requires termhood verification for extracted terms in a specific domain. Chinese terminology extraction suffers from insufficient domain corpora for verification, even though there is an abundance of information in other languages. This paper presents a novel approach that overcomes this problem by using word translations and bilingual web resources to improve both coverage and precision. The proposed approach incorporates bilingual information from within the candidate terms themselves and from existing domain knowledge to calculate termhood. In contrast to previous research, this method is not confined to pre-determined corpora. Preliminary experiments show a 14.8% improvement in coverage and a 26.3% improvement in precision.
The paper presents the main progress and achievements in Chinese information processing. It focuses on six aspects: Chinese syntactic analysis, Chinese semantic analysis, machine translation, information retrieval, information extraction, and speech recognition and synthesis. The important techniques and possible key problems of each branch in the near future are also discussed.
The traditional English text chunking approach identifies phrases by using only one model and phrases with the same types of features. It has been shown that the limitations of using only one model are that: the use o...
In information retrieval, users hope to find more relevant information among the top-ranked documents. In this paper, a combination of ontology and statistical methods is presented to retrieve an initial document set and to improve the precision of the top N ranked documents by re-ranking that set. An experiment on the NTCIR-3 Chinese CLIR dataset shows that the proposed method improves retrieval precision.
In information retrieval, users hope to find more relevant information among the top N ranked documents. In this paper, a hybrid Chinese language model is presented, defined as a combination of ontology and statistical methods, to improve the precision of the top N ranked documents by reordering the initially retrieved documents. An experiment on the NTCIR-3 formal Chinese test collection shows that the proposed method improves precision at the top N ranked documents.
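The re-ranking idea can be sketched as follows, assuming (the abstract does not say so) that the statistical retrieval score and an ontology-based score are combined by linear interpolation. The function names, the weight `alpha`, and all scores are hypothetical. Precision at N is the fraction of the top N documents that are relevant.

```python
def rerank(initial, ontology_score, alpha=0.5):
    """Reorder an initial result list by an interpolated score.

    initial:        list of (doc_id, statistical_score) pairs
    ontology_score: dict mapping doc_id to an ontology-based score
    alpha:          weight of the statistical score (assumed, not from the paper)
    """
    combined = [
        (doc_id, alpha * stat + (1.0 - alpha) * ontology_score.get(doc_id, 0.0))
        for doc_id, stat in initial
    ]
    ordered = sorted(combined, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ordered]


def precision_at_n(ranked, relevant, n):
    """Fraction of the first n ranked documents that are in the relevant set."""
    return sum(1 for doc_id in ranked[:n] if doc_id in relevant) / n


# Hypothetical scores and relevance judgments, for illustration only.
initial = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7), ("d4", 0.6)]
ontology = {"d1": 0.1, "d2": 0.9, "d3": 0.2, "d4": 0.8}
relevant = {"d2", "d4"}

before = precision_at_n([doc_id for doc_id, _ in initial], relevant, 2)
reranked = rerank(initial, ontology)
after = precision_at_n(reranked, relevant, 2)
```

In this toy data the two relevant documents move to the top after re-ranking, so precision at 2 rises; real gains depend on how well the ontology score reflects relevance.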
ISBN (print): 1424406048
To improve the effectiveness of cross-lingual information retrieval (CLIR), a method based on domain ontology knowledge is presented and applied to C-E CLIR. In this study, domain ontology knowledge is acquired from both source-language user queries and target documents in order to select target translations and re-rank the initially retrieved document set. The C-E CLIR dataset from the NTCIR-4 Workshop is used to evaluate the effectiveness of this method. Unlike previous work, we make use of source-language user queries throughout C-E CLIR; compared with previous work, this method improves precision.
Text classification is becoming one of the key techniques for organizing and handling large amounts of text data. In this paper, a combination of ontology and statistical methods is presented to improve the precision of text classification. First, different kinds of linguistic ontology knowledge are acquired by learning from a training corpus, and these determine the text classifiers. For a given document, a semantic evaluation value is then computed from each kind of linguistic ontology knowledge, and the document is assigned to the category with the highest evaluation value. In the experiments, the proposed approach outperforms Bayes, k-nearest neighbor, and support vector machine classifiers.
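The decision rule described above (assign the category with the highest evaluation value) can be sketched as below. This is a minimal illustration that assumes each category's knowledge is a table of weighted terms and that the evaluation value is the sum of the weights of the terms found in the document. The profile format, the scoring function, and all values are hypothetical and not taken from the paper.

```python
def classify(tokens, category_profiles):
    """Return the best category and the evaluation value of every category.

    tokens:            list of terms in the document
    category_profiles: dict mapping category -> {term: weight}
                       (hypothetical stand-in for learned ontology knowledge)
    """
    # Evaluation value: sum of the weights of the document terms in each profile.
    scores = {
        category: sum(profile.get(tok, 0.0) for tok in tokens)
        for category, profile in category_profiles.items()
    }
    # Judge the category by the highest evaluation value.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical profiles, for illustration only.
profiles = {
    "sports": {"match": 1.0, "team": 0.8, "score": 0.6},
    "finance": {"market": 1.0, "stock": 0.9, "score": 0.1},
}
label, scores = classify(["team", "score", "match", "today"], profiles)
```

Terms absent from a profile contribute nothing, so the ambiguous term "score" counts toward both categories while "today" is ignored.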