In the theory of compressive sensing, the selection of the basis functions directly affects the sparse transformation, observation number and reconstruction accuracy. In this paper, we introduce the structure of three...
In this paper, we present a scalable implementation of a topic modeling (Adaptive Link-IPLSA) based method for online event analysis, which summarizes the gist of a massive amount of changing tweets and enables users to e...
Traditional machine-learning algorithms are struggling to handle the exceedingly large amount of data being generated by the internet. In real-world applications, there is an urgent need for machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing involves the delivery of computing and storage as a service to a heterogeneous community of recipients. Recently, it has aroused much interest in industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system that includes a parallel web crawler; parallel text extract, transform and load (ETL) and modeling; and parallel text mining and application subsystems. The complete system enables various real-world web-mining applications for mass data.
Cross-media data is a defining characteristic of the big-data era, marked by large scale and complicated processing tasks. This article presents five issues and briefly summarizes the research progress of cross-media knowledge discovery. Furthermore, we propose a framework for cross-media semantic understanding that contains discriminative modeling, generative modeling and cognitive modeling. In cognitive modeling, a new model called CAM is proposed, which is suitable for cross-media semantic understanding. Moreover, a Cross-Media intelligent Retrieval System (CMIRS) is illustrated. Finally, the research directions and the problems encountered are presented.
Web Question Answering (WQA) and Web Service (WS) are parallel fields in intelligent web computing. Both are widely used in network services, but they are rarely combined. For many intelligent web applicatio...
ISBN: 9781622761715 (print)
We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model that combines the merits of both. With the help of shallow parsing, our model learns rules consisting of words and chunks while introducing syntactic cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields the target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese translation tasks.
We study visual learning models that can work efficiently with little ground-truth annotation and a mass of noisy unlabeled data for large-scale Web image applications, following the semi-supervised learning (SSL) approach that has been deeply investigated in various visual classification tasks. However, most previous SSL approaches cannot incorporate multiple descriptions to enhance model capacity. Furthermore, sample selection on unlabeled data was not advocated in previous studies, which may lead to unpredictable risk from noisy real-world data corpora. We propose a learning strategy for solving these two problems. As a core contribution, we propose a scalable semi-supervised multiple kernel learning method (S3MKL) to deal with the first problem. The aim is to minimize an overall objective function composed of a log-likelihood empirical loss, conditional expectation consensus (CEC) on the unlabeled data, and group LASSO regularization on the model coefficients. We further adapt CEC into a group-wise formulation to better deal with the intrinsic visual properties of real-world images. We propose a fast block coordinate gradient descent method with several acceleration techniques for solving the model. Compared with previous approaches, our model makes better use of large-scale unlabeled images with multiple feature representations, at lower time complexity. Moreover, to reduce the risk of using unlabeled data, we design a multiple kernel hashing scheme to identify an "informative" and "compact" subset of the unlabeled training data. Comprehensive experiments are conducted, and the results show that the proposed learning framework is promising for real-world image applications, such as image categorization and personalized Web image re-ranking with very little user interaction.
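As a rough illustration of one term in the objective described above, the following sketch computes a plain group LASSO penalty: the sum of the Euclidean norms of each coefficient group, scaled by a weight. It is not the authors' S3MKL objective; the function name `group_lasso_penalty`, the weight `lam`, and the example coefficients are all assumptions made for illustration.

```python
import math


def group_lasso_penalty(groups, lam=1.0):
    """Return lam * sum of the L2 norms of each coefficient group.

    `groups` is a list of lists of floats, one inner list per group
    (for example, one group per kernel). Hypothetical helper, not taken
    from the paper.
    """
    return lam * sum(math.sqrt(sum(w * w for w in group)) for group in groups)


# Illustrative coefficients: three groups, one of which is entirely zero.
coeffs = [[3.0, 4.0], [0.0, 0.0], [1.0]]
penalty = group_lasso_penalty(coeffs, lam=0.5)
print(penalty)
```

Because the penalty is a sum of unsquared group norms, minimizing it tends to drive whole groups to zero together, which is the usual reason for choosing it when selecting among feature groups or kernels.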
Concept learning in information systems is actually performed in the knowledge granular space of those systems. However, little attention has been paid to this knowledge granular space so far, and its structural characteristics are still poorly understood. In this paper, the granular space is first topologized and decomposed into granular worlds. It is then modeled as a bounded lattice. Finally, using graph theory, the resulting bounded lattice is expressed as a Hasse diagram, so that the mechanism of concept learning in information systems can be explained visually. With the related properties of topological spaces, bounded lattices and graph theory, the "mysterious" granular space can be explored more deeply. This work can form a basis for designing concept learning algorithms and can enrich the theoretical system of granular computing.
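The last step of this abstract, drawing a bounded lattice as a graph, can be sketched generically by keeping only the covering relations of the order. The example below is an assumption-laden toy, not the paper's construction: it uses the subsets of a two-element set ordered by inclusion as a stand-in bounded lattice, and the helper name `hasse_edges` is hypothetical.

```python
from itertools import combinations


def hasse_edges(elements, leq):
    """Return the covering pairs (x, y): x < y with no z strictly between.

    `leq(x, y)` must implement a partial order on `elements`.
    """
    edges = []
    for x in elements:
        for y in elements:
            if x == y or not leq(x, y):
                continue
            # Keep (x, y) only if no third element sits strictly between them.
            has_between = any(
                z != x and z != y and leq(x, z) and leq(z, y) for z in elements
            )
            if not has_between:
                edges.append((x, y))
    return edges


# Toy bounded lattice: all subsets of {"a", "b"} ordered by inclusion.
universe = ("a", "b")
subsets = [
    frozenset(c)
    for r in range(len(universe) + 1)
    for c in combinations(universe, r)
]
edges = hasse_edges(subsets, lambda s, t: s <= t)
for lower, upper in edges:
    print(sorted(lower), "->", sorted(upper))
```

The empty set and the full set play the roles of the lattice's bottom and top; the pair connecting them directly is dropped because other elements lie between them.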
ISBN: 9781622765928; 1622765923 (print)
Chiang's hierarchical phrase-based (HPB) translation model advances the state of the art in statistical machine translation by expanding conventional phrases to hierarchical phrases - phrases that contain sub-phrases. However, the original HPB model is prone to over-generation due to a lack of linguistic knowledge: the grammar may suggest more derivations than appropriate, many of which may lead to ungrammatical translations. On the other hand, limitations of glue grammar rules in the original HPB model may actually prevent systems from considering some reasonable derivations. This paper presents a simple but effective translation model, called the Head-Driven HPB (HD-HPB) model, which incorporates head information in translation rules to better capture syntax-driven information in a derivation. In addition, unlike the original glue rules, the HD-HPB model allows improved reordering between any two neighboring non-terminals to explore a larger reordering search space. An extensive set of experiments on Chinese-English translation on four NIST MT test sets, using both a small and a large training set, shows that our HD-HPB model consistently and statistically significantly outperforms Chiang's model as well as a source-side SAMT-style model.
ISBN: 9781622761715 (print)
This paper presents an extension of Chiang's hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space. Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang's model with average gains of 1.91 points absolute in BLEU.