ISBN: (print) 9781622765034
In this paper we first describe the technology of automatic annotation transformation, which is based on the annotation adaptation algorithm (Jiang et al., 2009). It can automatically transform a human-annotated corpus from one annotation guideline to another. We then propose two optimization strategies, iterative training and predict-self reestimation, to further improve the accuracy of annotation guideline transformation. Experiments on Chinese word segmentation show that the iterative training strategy together with predict-self reestimation brings significant improvement over the simple annotation transformation baseline, and leads to classifiers with significantly higher accuracy and several times faster processing than annotation adaptation does. On the Penn Chinese Treebank 5.0, it achieves an F-measure of 98.43%, significantly outperforming previous work while using a single classifier with only local features.
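The F-measure reported above is the harmonic mean of precision and recall over predicted words. As a minimal sketch, assuming words are scored as exact character-span matches against the gold segmentation (the usual convention, though the abstract does not spell it out), it can be computed as follows. The function names `to_spans` and `segmentation_f1` are illustrative and not taken from the paper.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_f1(gold, pred):
    """Return (precision, recall, F-measure) for one segmented sentence.

    Assumption: a predicted word is correct only if both of its
    boundaries match a gold word exactly.
    """
    g, p = to_spans(gold), to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical example: gold has three words, the prediction merges two.
gold = ["ab", "c", "de"]
pred = ["ab", "cde"]
print(segmentation_f1(gold, pred))
```

In this example only one of the two predicted words matches a gold span, so precision is 1/2 and recall is 1/3.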
After years of research, Chinese word segmentation has achieved quite high precision for formal-style text. However, segmentation performance is still unsatisfactory on MicroBlog corpora. In this paper we desc...
To explore the association relations among disease, pathogenesis, physician, symptoms and drug, we adapt a variational Apriori algorithm for discovering association rules on a dataset of the Qing Court Medical Records...
Most previous work on web video topic detection (e.g., graph-based co-clustering methods) struggles with real-time topic detection because of its high computational complexity. Therefore, fast topic detection is needed to meet users' or administrators' requirements in real-world scenarios. Along this line, we propose a fast and effective topic detection framework, in which video streams are first partitioned into buckets using a time-window function, then an incremental hierarchical clustering algorithm is applied, and finally a video-based fusion strategy is used to integrate information from multiple modalities. Furthermore, a series of novel similarity metrics is defined in the framework. Experimental results on three months of YouTube videos demonstrate the effectiveness and efficiency of the proposed method.
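The first stage of the framework, partitioning a video stream into buckets with a time-window function, can be illustrated with a short sketch. The abstract does not give the window function, so this assumes fixed-width, non-overlapping windows keyed by upload timestamp; `bucket_by_time` and the `(video_id, timestamp)` record shape are hypothetical.

```python
from collections import defaultdict


def bucket_by_time(videos, window_seconds):
    """Group (video_id, timestamp) pairs into fixed-width time windows.

    Assumption: windows are non-overlapping and aligned to timestamp 0;
    the paper's actual time-window function may differ.
    Returns the buckets in chronological order.
    """
    buckets = defaultdict(list)
    for video_id, timestamp in videos:
        buckets[timestamp // window_seconds].append(video_id)
    return [buckets[key] for key in sorted(buckets)]


# Hypothetical stream: timestamps in seconds, one-hour windows.
stream = [("a", 10), ("b", 3700), ("c", 50), ("d", 7300)]
print(bucket_by_time(stream, 3600))
```

Each bucket could then be passed to the incremental clustering stage in order, so that only the newest window has to be processed as videos arrive.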
Local learning approaches are especially easy to parallelize, so they are very important for cloud computing. In 1997, Lotfi A. Zadeh proposed the concept of Granular computing (GrC). Zadeh proposed that three basic concepts underlie human cognition: granulation, organization and causation, where a granule is a clump of points (objects) drawn together by indistinguishability, similarity, proximity or functionality. In this paper, we present a novel local learning approach based on the concept of Granular computing, named "nested local learning NGLL". The experiment shows that the novel NGLL approach performs better than probabilistic latent semantic analysis (PLSA).
With the success of the Internet, more and more companies have recently started to run web-based businesses. While running e-business sites, many companies have encountered unexpected degeneration of their web server applications ...
Multi-task learning has proven useful for boosting the learning of multiple related but different tasks. Meanwhile, latent semantic models such as LSA and LDA are popular and effective methods for extracting discriminative semantic features from high-dimensional dyadic data. In this paper, we present a method that combines these two techniques by introducing a new matrix tri-factorization based formulation for semi-supervised latent semantic learning, which can incorporate labeled information into traditional unsupervised learning of latent semantics. Our inspiration for multi-task semantic feature learning comes from two facts: 1) multiple tasks generally share a set of common latent semantics, and 2) a semantic usually has a stable indication of categories no matter which task it is from. Thus, to make multiple tasks learn from each other, we wish to share the associations between categories and those common semantics among tasks. Along this line, we propose a novel joint nonnegative matrix tri-factorization framework in which these associations are shared among tasks in the form of a semantic-category relation matrix. Our new formulation for multi-task learning can simultaneously learn (1) discriminative semantic features of each task, (2) the predictive structure and categories of unlabeled data in each task, and (3) common semantics shared among tasks and specific semantics exclusive to each task. We give an alternating iterative algorithm to optimize our objective and theoretically show its convergence. Finally, extensive experiments on text data, along with comparisons against various baselines and three state-of-the-art multi-task learning algorithms, demonstrate the effectiveness of our method.
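For orientation, the generic single-task form of nonnegative matrix tri-factorization is shown below. This is the textbook objective, not the paper's joint formulation, which the abstract does not state; all symbols here are assumptions introduced for illustration: $X$ is an $m \times n$ data matrix, $F$ is an $m \times k$ factor over $k$ latent semantics, $G$ is an $n \times c$ factor over $c$ categories, and $S$ is the $k \times c$ matrix relating semantics to categories.

```latex
\min_{F \ge 0,\; S \ge 0,\; G \ge 0} \; \left\| X - F S G^{\top} \right\|_F^2 ,
\qquad
X \in \mathbb{R}^{m \times n},\;
F \in \mathbb{R}^{m \times k},\;
S \in \mathbb{R}^{k \times c},\;
G \in \mathbb{R}^{n \times c}
```

One plausible reading of the abstract is that each task has its own $F$ and $G$ while a common $S$ plays the role of the shared semantic-category relation matrix, but that reading is an assumption rather than something the abstract confirms.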
There are a number of leaf recognition methods, but most of them are based on Euclidean space. In this paper, we introduce a new feature description for leaf image recognition, which represents the leaf co...
In delay tolerant networks (DTNs), message delivery operates opportunistically through store-carry-and-forward relaying, and every DTN node expects cooperation from other nodes in data forwarding. U...
In this paper, we present a scalable implementation of a topic modeling (Adaptive Link-IPLSA) based method for online event analysis, which summarizes the gist of a massive amount of changing tweets and enables users to e...