The traditional GEP algorithm consumes considerable system resources during decoding and evaluation because of tree construction and the corresponding traversal. This paper aims to introduce a novel GEP algorithm to all...
Knowing the semantic links among documents is the basis for intelligent applications over large-scale document resources. Discovering these semantic links with little human intervention is a challenging issue. This paper pr...
Focused crawlers selectively retrieve Web documents that are relevant to a predefined set of topics. To intelligently make predictions and decisions about relevant URLs and web pages, different topic models have been ...
Bracketing Transduction Grammar (BTG) has been well studied and used in statistical machine translation (SMT) with promising results. However, there are two major issues for BTG-based SMT. First, there is no effective...
In this paper, we describe a new reranking strategy named word lattice reranking, for the task of joint Chinese word segmentation and part-of-speech (POS) tagging. As a derivation of the forest reranking for parsing (...
The traditional GEP algorithm consumes considerable system resources during decoding and evaluation because it must construct and traverse expression trees (ETs). This paper introduces a novel GEP algorithm to alleviate this drawback. The main contributions are: (1) presenting a new method for decoding and evaluating chromosomes (SGDE) and proposing the corresponding ET construction schema; (2) proving the relevant properties of SGDE-GEP; (3) showing experimentally that the average efficiency of SGDE-GEP can be raised from 18.94% to 23.11% compared with the traditional GEP.
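For orientation, the tree construction and traversal that the abstract identifies as the costly step can be sketched as follows. This is a minimal illustration of conventional Karva-style decoding, not the SGDE method, which the abstract does not describe. The names `ARITY`, `OPS`, `Node`, `build_tree`, and `evaluate`, and the restriction to binary arithmetic operators, are assumptions made for the example.

```python
from collections import deque
import operator

# Assumed function set: binary arithmetic operators only.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class Node:
    """One node of the expression tree (ET)."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []


def build_tree(gene):
    """Decode a gene into an ET by filling the tree level by level."""
    symbols = iter(gene)
    root = Node(next(symbols))
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Terminals have arity 0 and consume no further symbols.
        for _ in range(ARITY.get(node.symbol, 0)):
            child = Node(next(symbols))
            node.children.append(child)
            queue.append(child)
    return root


def evaluate(node, env):
    """Evaluate the ET recursively; env maps terminal names to values."""
    if node.symbol in OPS:
        left, right = (evaluate(child, env) for child in node.children)
        return OPS[node.symbol](left, right)
    return env[node.symbol]


# The gene "+*aabbb" decodes to (a * b) + a; the trailing symbols are unused.
tree = build_tree("+*aabbb")
result = evaluate(tree, {"a": 2, "b": 3})
```

Every fitness evaluation in this scheme allocates one node per expressed symbol and then walks the tree again, which is the overhead a tree-free decoding method would aim to avoid.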
We propose a cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. With a character-based perceptron as the core, combined with real-valued features such as language models, the cascaded...
Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction, being faster and simpler than its string-based counterpart. However, c...
Translation rule extraction is a fundamental problem in machine translation, especially for linguistically syntax-based systems that need parse trees from either or both sides of the bitext. The current dominant pract...
In this paper, we present our solutions for the WikipediaMM task at ImageCLEF 2008. The aim of this task is to investigate effective retrieval approaches in the context of a large-scale and heterogeneous collection of Wikipedia images that are searched by textual queries (and/or sample images and/or concepts) describing a user's information need. We first experimented with a text-based image retrieval approach with query expansion, where the expansion terms are automatically selected from a knowledge base that is (semi-)automatically constructed from Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to enhance the semantics of queries. Encouragingly, the experimental results ranked first among all submitted runs. The second approach we experimented with is content-based image retrieval (CBIR), in which we first train 1-vs-all classifiers for all query concepts using training images obtained through Yahoo! search, and then treat the retrieval task as visual concept detection in the given Wikipedia image set. This approach performs better than the other submitted CBIR runs. Finally, we experimented with a cross-media image retrieval approach that combines and re-ranks the text-based and content-based retrieval results. Although the final experimental results were not formally submitted before the deadline, this approach performs remarkably better than either the text-based retrieval or the CBIR approach.
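The abstract does not say how the text-based and content-based results are combined. As a rough illustration only, one common way to re-rank two result lists is a weighted sum of min-max normalized scores. The function names `min_max` and `fuse`, the weight `alpha`, and the sample scores below are all hypothetical and are not taken from the paper.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a top match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(text_scores, visual_scores, alpha=0.7):
    """Re-rank by a weighted sum of normalized text and visual scores.

    alpha is an assumed weight on the text-based score. A document that is
    missing from one list contributes 0 for that modality.
    """
    text = min_max(text_scores)
    visual = min_max(visual_scores)
    doc_ids = set(text) | set(visual)
    combined = {
        doc_id: alpha * text.get(doc_id, 0.0)
        + (1 - alpha) * visual.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Sort by descending combined score, breaking ties by document id.
    return sorted(combined, key=lambda doc_id: (-combined[doc_id], doc_id))


# Made-up scores for four images.
text_scores = {"img1": 2.0, "img2": 1.0, "img3": 0.0}
visual_scores = {"img2": 0.9, "img3": 0.1, "img4": 0.5}
ranking = fuse(text_scores, visual_scores)
```

Lowering `alpha` shifts weight toward the visual classifier scores, so an image that both modalities agree on can overtake one that only the text run ranks highly.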