ISBN: (print) 9781622761715
We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model and combines the merits of the two. With the help of shallow parsing, our model learns rules consisting of words and chunks while introducing syntactic cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields the target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese translation tasks.
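As a minimal sketch of how a synchronous rule over chunks yields the target string while the derivation is built, the toy example below applies one reordering rule. The rule, the chunk label, and the sub-translations are hypothetical illustrations, not rules learned by the model described above.

```python
# Hypothetical toy rule (an assumption, not from the paper):
# source side "NP_0 of NP_1" maps to target side "NP_1 的 NP_0".
RULE = {
    "source": [("NP", 0), "of", ("NP", 1)],
    "target": [("NP", 1), "的", ("NP", 0)],
}


def apply_target_side(rule, sub_translations):
    """Build the target string by substituting translated chunks into the rule."""
    output = []
    for symbol in rule["target"]:
        if isinstance(symbol, tuple):
            # Nonterminal: splice in the translation of the linked source chunk.
            _, index = symbol
            output.extend(sub_translations[index])
        else:
            # Terminal: copy the target word as-is.
            output.append(symbol)
    return output


# Assumed translations of the two source chunks "the capital" and "France".
sub_translations = [["首都"], ["法国"]]
translation = apply_target_side(RULE, sub_translations)
```

Because the source and target sides share the same indexed nonterminals, choosing a derivation on the source side fixes the target word order at the same time.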
We study visual learning models that can work efficiently with little ground-truth annotation and a mass of noisy unlabeled data for large-scale Web image applications, following the line of semi-supervised learning (SSL), which has been deeply investigated in various visual classification tasks. However, most previous SSL approaches are not able to incorporate multiple descriptions to enhance model capacity. Furthermore, sample selection on unlabeled data was not advocated in previous studies, which may lead to unpredictable risk from noisy real-world data corpora. We propose a learning strategy for solving these two problems. As a core contribution, we propose a scalable semi-supervised multiple kernel learning method (S3MKL) to deal with the first problem. The aim is to minimize an overall objective function composed of a log-likelihood empirical loss, conditional expectation consensus (CEC) on the unlabeled data, and group LASSO regularization on the model coefficients. We further adapt CEC into a group-wise formulation to better deal with the intrinsic visual properties of real-world images. We propose a fast block coordinate gradient descent method with several acceleration techniques to solve the model. Compared with previous approaches, our model makes better use of large-scale unlabeled images with multiple feature representations at lower time complexity. Moreover, to reduce the risk of using unlabeled data, we design a multiple kernel hashing scheme to identify an "informative" and "compact" subset of the unlabeled training data. Comprehensive experiments are conducted, and the results show that the proposed learning framework performs promisingly in real-world image applications, such as image categorization and personalized Web image re-ranking with very little user interaction.
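The group LASSO term in the objective above can be illustrated with a short sketch. It shows the standard group LASSO penalty and its block soft-thresholding operator, which zeroes out whole coefficient groups (for example, one group per kernel). This is not the authors' block coordinate gradient descent solver, and the grouping, the values, and the function names are assumptions for illustration.

```python
import math


def group_lasso_penalty(groups, lam):
    """Return lam times the sum of the Euclidean norms of the coefficient groups."""
    return lam * sum(math.sqrt(sum(w * w for w in g)) for g in groups)


def block_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2: shrink the whole block at once."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        # The entire group is switched off, which is how group sparsity arises.
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]


# Hypothetical coefficients: one strong group and one weak group.
groups = [[3.0, 4.0], [0.1, -0.2]]
penalty = group_lasso_penalty(groups, lam=0.5)
shrunk = [block_soft_threshold(g, 1.0) for g in groups]
```

With a threshold of 1.0, the strong group is only scaled down while the weak group is removed entirely, which is the selection effect a group-wise regularizer is meant to provide.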
With the success of the Internet, more and more companies have recently started to run web-based businesses. While running e-business sites, many companies have encountered unexpected degeneration of their web server applications ...
In this paper, we present a scalable implementation of a topic modeling (Adaptive Link-IPLSA) based method for online event analysis, which summarizes the gist of a massive amount of changing tweets and enables users to e...
This paper focuses on side information (SI) refinement in Wyner-Ziv video coding and exploits the intrinsic property of channel coding to improve joint decoding performance. We propose to use syndrome and information bits from the encoder to help the decoder refine the SI. We use extrinsic information transfer (EXIT) chart analysis to deduce the mutual information variation in LDPC iterative decoding during the SI refinement process. The objective is to obtain the same decoding quality at lower coding rates. Simulation results demonstrate the effectiveness of the proposed solution.
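To make the role of syndrome bits concrete, the sketch below computes a syndrome s = Hx (mod 2) for a small parity-check matrix and compares it with the syndrome of a decoder-side guess. The matrix and bit vectors are hypothetical toy values; this is neither an LDPC decoder nor the EXIT analysis described above.

```python
def syndrome(parity_check, bits):
    """Compute H * bits over GF(2), one syndrome bit per parity-check row."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in parity_check]


# Hypothetical 3x6 parity-check matrix (an assumption for illustration).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

source_bits = [1, 0, 1, 1, 0, 0]       # bits known only to the encoder
side_information = [1, 0, 0, 1, 0, 0]  # decoder's noisy estimate, third bit wrong

sent_syndrome = syndrome(H, source_bits)
si_syndrome = syndrome(H, side_information)

# A mismatch tells the decoder that its side information needs correction.
needs_refinement = sent_syndrome != si_syndrome
```

Only the three syndrome bits are transmitted instead of all six source bits, which is why syndrome-based coding can lower the rate when the side information is already close to the source.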
Accumulating commonsense knowledge (CK) has proven very useful for many natural language processing tasks. So far, the most reliable way to acquire CK still relies on knowledge contributors to offer it. Unfortunat...
Crowd sourcing (CS) systems offer a new way for businesses and individuals to leverage the power of mass collaboration to accomplish complex tasks in a divide-and-conquer manner. Existing CS systems provide no facility for analyzing the trustworthiness of workers or for supporting decisions about allocating tasks to workers. As shown in this paper, this makes the quality of work highly dependent on the behavior of workers. To address this problem, trust management mechanisms are urgently needed. Traditional trust management techniques focus on identifying the most trustworthy service providers (SPs) as accurately as possible. Little thought has been given to the question of how to utilize these SPs, due to two common assumptions: 1) an SP can serve an unlimited number of requests in one time unit, and 2) a service consumer (SC) only needs to select one SP for interaction to complete a task. In CS systems, however, these two assumptions are no longer valid, so existing models cannot be used directly for trust management. This paper takes the first step towards a systematic investigation of trust management in CS systems by extending existing trust management models and conducting extensive experiments to study and analyze the performance of various trust management models in crowd sourcing. The key contributions are as follows. We 1) propose extensions to existing trust management approaches to enable them to operate in CS systems, 2) design a simulation test-bed based on the system characteristics of Amazon's Mechanical Turk (AMT) to bring the evaluation close to practical CS systems, 3) discuss the effect of incorporating trust management into CS systems on the overall social welfare, and 4) identify the challenges and opportunities for future trust management research in CS systems.
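The two invalid assumptions can be illustrated with a minimal sketch: when each worker has limited capacity per time unit, a batch of tasks cannot all go to the single most trusted worker. The greedy policy, trust scores, and capacities below are hypothetical and are not the extensions proposed in the paper.

```python
def allocate(tasks, workers):
    """Greedily assign each task to the most trusted worker with spare capacity.

    `workers` maps a worker name to a (trust score, capacity per time unit) pair.
    Tasks that cannot be placed in this time unit are left out of the result.
    """
    remaining = {name: capacity for name, (_, capacity) in workers.items()}
    ranked = sorted(workers, key=lambda name: workers[name][0], reverse=True)
    assignment = {}
    for task in tasks:
        for name in ranked:
            if remaining[name] > 0:
                assignment[task] = name
                remaining[name] -= 1
                break
    return assignment


# Hypothetical workers: w1 is more trusted but can only take one task.
workers = {"w1": (0.9, 1), "w2": (0.6, 2)}
assignment = allocate(["t1", "t2", "t3", "t4"], workers)
```

Here the most trusted worker is saturated after one task, the remaining tasks fall to a less trusted worker, and one task stays unassigned, which is the kind of trade-off a capacity-unaware trust model does not capture.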
Two-dimensional Fisher linear discriminant analysis (2DFLD or 2DLDA) has recently attracted much attention from researchers for its advantages regarding the singularity problem and the computational cost. Recent research o...
Some extended fuzzy rough set models have been investigated to handle fuzzy databases with uncertain, imprecise, and incomplete real-valued information. In this paper, we further study fuzzy rough set models in a fuzzy environment, and we generalize the rough fuzzy set model based on a covering to a fuzzy rough set model based on a fuzzy covering. The lower and upper approximations of fuzzy subsets are defined based on a fuzzy covering, and their basic properties are investigated. Then, an axiomatic definition of the lower approximation operator is given. It is shown that the rough fuzzy set model based on a covering is a special instance of the fuzzy rough set model based on a fuzzy covering.
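The following sketch shows one possible pair of lower and upper approximation operators over a fuzzy covering, built from min, max, and the standard negation 1 - t. These formulas are an assumption chosen for illustration and may differ from the definitions used in the paper. The example runs them on a crisp covering, where they reduce to taking the minimum or maximum membership over the blocks containing each element.

```python
def lower_approx(cover, fuzzy_set):
    """Assumed lower approximation: min over blocks C of max(1 - C(x), inf_y max(1 - C(y), A(y)))."""
    universe = list(fuzzy_set)
    return {
        x: min(
            max(1.0 - C[x], min(max(1.0 - C[y], fuzzy_set[y]) for y in universe))
            for C in cover
        )
        for x in universe
    }


def upper_approx(cover, fuzzy_set):
    """Assumed upper approximation: max over blocks C of min(C(x), sup_y min(C(y), A(y)))."""
    universe = list(fuzzy_set)
    return {
        x: max(
            min(C[x], max(min(C[y], fuzzy_set[y]) for y in universe))
            for C in cover
        )
        for x in universe
    }


# A crisp covering {a, b}, {b, c} written as membership functions (toy values).
cover = [
    {"a": 1.0, "b": 1.0, "c": 0.0},
    {"a": 0.0, "b": 1.0, "c": 1.0},
]
A = {"a": 0.25, "b": 0.75, "c": 0.5}

lower = lower_approx(cover, A)
upper = upper_approx(cover, A)
```

In this toy case the lower approximation never exceeds A and the upper approximation never falls below it, since every element belongs fully to at least one block of the covering.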
The relational database model (RDM) has proven to be a very useful data-storage technique. As information is stored as data in relational databases, the induction of concepts from data is a pivotal topic in the data ...