We propose a structure called a dependency forest for statistical machine translation. A dependency forest compactly represents multiple dependency trees. We develop new algorithms for extracting string-to-dependency rules and training dependency language models. Our forest-based string-to-dependency system obtains significant improvements ranging from 1.36 to 1.46 BLEU points over the tree-based baseline on the NIST 2004/2005/2006 Chinese-English test sets.
As tokenization is usually ambiguous for many natural languages such as Chinese and Korean, tokenization errors can introduce translation mistakes in translation systems that rely on 1-best tokenizations. While using lattices to offer more alternatives to translation systems has elegantly alleviated this problem, we take a further step and tokenize and translate jointly. Taking as input a sequence of atomic units that can be combined to form words in different ways, our joint decoder produces a tokenization on the source side and a translation on the target side simultaneously. By integrating tokenization and translation features in a discriminative framework, our joint decoder significantly outperforms the baseline translation systems using 1-best tokenizations and lattices on both Chinese-English and Korean-Chinese tasks. Interestingly, as a tokenizer, our joint decoder achieves significant improvements over monolingual Chinese tokenizers.
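
To make the input described above concrete, the following minimal sketch enumerates every way a sequence of atomic units can be grouped into words. It only illustrates the space of candidate tokenizations that a lattice or a joint decoder would have to choose from; it is not the paper's decoder and it does no scoring. The function name `tokenizations`, the `lexicon` set, and the `max_len` limit are hypothetical names introduced for this illustration.

```python
from functools import lru_cache


def tokenizations(units, lexicon, max_len=4):
    """Return every grouping of `units` into words.

    A multi-unit word is allowed only if it appears in `lexicon`
    (a hypothetical word list); single units are always allowed,
    so at least one tokenization always exists.
    """
    units = tuple(units)

    @lru_cache(maxsize=None)
    def from_pos(i):
        # All tokenizations of the suffix units[i:].
        if i == len(units):
            return [[]]
        results = []
        for j in range(i + 1, min(len(units), i + max_len) + 1):
            word = "".join(units[i:j])
            if j == i + 1 or word in lexicon:
                for rest in from_pos(j):
                    results.append([word] + rest)
        return results

    return from_pos(0)


if __name__ == "__main__":
    for candidate in tokenizations("abc", {"ab", "bc"}):
        print(candidate)
```

A 1-best pipeline would commit to one of these candidates before translation; a joint approach keeps all of them available until translation features can help decide.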
We propose a novel method to improve the training efficiency and accuracy of boosted classifiers for object detection. The key step of the proposed method is a sample pre-mapping on the original space by referring to the ...
This paper presents a new method to detect pedestrians in still images using Sigma sets as image region descriptors in the boosting framework. A Sigma set encodes second-order statistics of an image region implicitly in t...
Active contours have been one of the most successful methods for image segmentation during the last two decades, but their inability to converge into concavities limits their effectiveness....
In this paper, we present a new method for the design of an n-bit synchronous binary up counter in quantum-dot cellular automata (QCA). This method is based on the JK flip-flop which almost always produces the simples...
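
As background for the abstract above, here is a minimal behavioral sketch of the textbook n-bit synchronous binary up counter built from JK flip-flops, where stage 0 has J = K = 1 and each higher stage has J = K equal to the AND of all lower-order outputs. It models logic values only, not a QCA cell layout, and the paper's actual design may differ. The function names `jk_next`, `step`, and `value` are hypothetical.

```python
def jk_next(q, j, k):
    """Next state of a JK flip-flop: toggle, set, reset, or hold."""
    if j and k:
        return 1 - q  # J = K = 1: toggle
    if j:
        return 1      # J = 1, K = 0: set
    if k:
        return 0      # J = 0, K = 1: reset
    return q          # J = K = 0: hold


def step(bits):
    """Advance the counter by one clock edge; bits[0] is the LSB.

    Every stage reads the current outputs, so all flip-flops
    update together, as in a synchronous design.
    """
    nxt = []
    enable = 1  # stage 0 has J = K = 1
    for q in bits:
        nxt.append(jk_next(q, enable, enable))
        enable &= q  # next stage toggles only if all lower bits are 1
    return nxt


def value(bits):
    """Interpret the bit list (LSB first) as an unsigned integer."""
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    state = [0, 0, 0]
    for _ in range(9):
        print(value(state))
        state = step(state)
```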
Parsing plays an important role in semantic role labeling (SRL) because most SRL systems infer semantic relations from 1-best parses. Therefore, parsing errors inevitably lead to labeling mistakes. To alleviate this p...
Traditional 1-best translation pipelines suffer from a major drawback: the errors of 1-best outputs, inevitably introduced by each module, propagate and accumulate along the pipeline. To alleviate this problem, we use compact structures, lattices and forests, in each module instead of 1-best results. We integrate both lattice and forest into a single tree-to-string system, and explore algorithms for lattice parsing, lattice-forest-based rule extraction, and decoding. More importantly, our model takes into account the probabilities of all the different steps, such as segmentation, parsing, and translation. The main advantage of our model is that it can make a global decision, searching for the best segmentation, parse tree, and translation in one step. Medium-scale experiments show an improvement of +0.9 BLEU points over a state-of-the-art forest-based baseline.
One fundamental problem in services computing is how to bridge the gap between business requirements and various heterogeneous IT services. This involves eliciting business requirements and building a solution accordi...
Hand segmentation is the basis of many vision-based hand gesture applications in human-computer interaction (HCI). This paper proposes a novel method of skin color weighted disparity competition to incorporate the ski...