As an important resource for machine translation and cross-language information retrieval, large-scale parallel corpora have received wide attention. With the development of the Internet, researchers begin t...
Image segmentation is a fundamental task in computer vision and image processing applications. It is well known that segmentation performance is mainly influenced by two factors: the segmentation approach and the feature representation. Among segmentation methods, clustering is one of the most popular approaches. However, most current clustering-based segmentation methods have problems: for example, the number of image regions has to be given in advance, and different initial cluster centers produce different segmentation results. In this paper, we present a novel image segmentation approach based on the DP clustering algorithm. Compared with current methods, our method has the following advantages: 1) the algorithm directly gives the number of clusters in the image based on the decision graph; 2) the cluster centers can be identified correctly; 3) hierarchical segmentation can be achieved simply, according to the application's requirements. Extensive experiments demonstrate the validity of this segmentation algorithm.
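The decision-graph idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that "DP clustering" refers to density-peaks clustering, in which each point gets a local density rho and a distance delta to the nearest denser point, and cluster centers are the points where both are large. The function names, the cutoff `d_c`, and the thresholds `rho_min` and `delta_min` are hypothetical and are not taken from the paper.

```python
import math

def decision_graph(points, d_c):
    """Return (rho, delta) per point: the two axes of a density-peaks decision graph."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # distances to all points with strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # by convention, a point with no denser point gets its largest distance
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

def pick_centers(rho, delta, rho_min, delta_min):
    """Centers are points that are both dense and far from any denser point."""
    return [i for i in range(len(rho)) if rho[i] >= rho_min and delta[i] >= delta_min]

# Toy feature vectors (assumed data): two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta = decision_graph(points, d_c=0.8)
centers = pick_centers(rho, delta, rho_min=2, delta_min=5.0)
```

In this sketch the number of clusters is simply `len(centers)`, so it does not have to be supplied in advance; how the paper maps image pixels to feature vectors and reads the thresholds off the graph is not stated in the abstract.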
DBpedia is a central hub of Linked Open Data (LOD). Being based on crowd-sourced contents and heuristic extraction methods, it is not free of errors. In this paper, we study the application of unsupervised numerical o...
The capability of building a model that can be understood and interpreted by humans is one of the main selling points of symbolic machine learning algorithms, such as rule or decision tree learners. However, those alg...
In this paper, by rigorous mathematical reasoning, we discover the relation between the similarity relation and the lower approximation. Based on this relation, we design a fast algorithm to build a rule-based fuzzy rough cla...
To deal with the challenge of information overload, in this paper we propose a financial news recommendation algorithm which helps users find articles that are interesting to read. To resolve the ambiguity problem, a newly presented OF-IDF method is employed to represent the unstructured text data in the form of key concepts, synonyms and synsets, which are all stored in the domain ontology. For users, the recommendation algorithm builds profiles based on their behaviors to detect their genuine interests, and predicts current interests automatically and in real time by applying the idea of relevance feedback. Finally, an experiment conducted on a financial news dataset demonstrates that the proposed algorithm significantly outperforms a traditional recommender.
ISBN: (print) 1595930361
An emerging topic in multimedia retrieval is to detect a complex event in video using only a handful of video examples. Unlike existing work, which learns a ranker from positive video examples and hundreds of negative examples, we aim to query web video for events using zero or only a few visual examples. To that end, we propose in this paper a tag-based video retrieval system which propagates tags from a tagged video source to an unlabeled video collection without the need for any training examples. Our algorithm is based on weighted frequency neighbor voting using concept vector similarity. Once tags are propagated to unlabeled video, we can rely on off-the-shelf language models to rank these videos by tag similarity. We study the behavior of our tag-based video event retrieval system by performing three experiments on web videos from the TRECVID multimedia event detection corpus, with zero, one and multiple query examples, and show that it beats a recent alternative. Copyright 2014 ACM.
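The tag propagation step described above can be sketched roughly as follows. This is a simplified illustration, assuming cosine similarity between concept vectors and plain similarity-weighted votes from the k nearest tagged videos; the abstract does not give the exact frequency weighting, and the function names, the parameter `k`, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two concept vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def propagate_tags(query_vec, tagged, k=2):
    """Rank tags for an unlabeled video by similarity-weighted votes of its k nearest tagged videos."""
    neighbors = sorted(tagged, key=lambda item: cosine(query_vec, item[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for vec, tags in neighbors:
        weight = cosine(query_vec, vec)
        for tag in tags:
            votes[tag] += weight  # each neighbor votes for its own tags
    return sorted(votes, key=lambda t: votes[t], reverse=True)

# Assumed toy data: (concept vector, tags) pairs for the tagged source collection.
tagged = [([1.0, 0.0, 0.0], ["dog", "park"]),
          ([0.9, 0.1, 0.0], ["dog"]),
          ([0.0, 0.0, 1.0], ["cooking"])]
ranked = propagate_tags([1.0, 0.0, 0.0], tagged, k=2)
```

No training is involved in this sketch: the unlabeled video only needs a concept vector, which matches the training-free setting the abstract describes.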
The 2014 edition of the Linked Data Mining Challenge, conducted in conjunction with Know@LOD 2014, was the third edition of this challenge. The underlying data came from two domains: public procurement and researcher collaboration. As in the previous year, when the challenge was held at the Data Mining on Linked Data workshop co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2013), the response to the challenge appeared lower than expected, with only one solution submitted for the predictive task this year. We have tried to track the reasons for the continuously low participation in the challenge via a questionnaire survey, and have distilled principles that could help organizers of similar future challenges.
Anomaly detection algorithms face several challenges, including computational complexity and resiliency to noise in input data. In this paper, we propose a fast and noise-resilient cluster-based anomaly detection method using a collective labelling approach. In the proposed Collective Probabilistic Anomaly Detection method, first, instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered and then labelled. This collective labelling mitigates the negative impact of noise by relying on group behaviour rather than individual characteristics of incoming samples. Second, since grouping and labelling new samples may be time-consuming, we summarize clusters using a Gaussian Mixture Model (GMM). Not only does the GMM offer faster processing speed; it also facilitates summarizing clusters with arbitrary shape and, consequently, reduces the memory space requirement. Finally, a modified distance measure, based on the Kullback-Leibler divergence, is proposed to calculate the similarity among clusters represented by GMMs. We evaluate the proposed method on various datasets by measuring its false alarm rate, detection rate and memory requirement. We also add different levels of noise to the input datasets to demonstrate the performance of the proposed collective anomaly detection method in the presence of noise. The experimental results confirm the superior performance of the proposed method compared to individual labelling techniques in terms of memory usage, detection rate and false alarm rate.
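As a rough illustration of comparing cluster summaries rather than individual samples, the sketch below uses the closed-form Kullback-Leibler divergence between two univariate Gaussians, symmetrized so that it behaves like a distance. This is a stand-in, not the paper's modified measure: the abstract does not define that measure, each cluster here is summarized by a single (mean, variance) pair instead of a full GMM, and the names `kl_gauss`, `sym_kl`, `label_cluster` and the threshold value are hypothetical.

```python
import math

def kl_gauss(mu0, var0, mu1, var1):
    """Closed-form KL(N(mu0, var0) || N(mu1, var1)) for univariate Gaussians."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def sym_kl(a, b):
    """Symmetrized KL divergence between two (mean, variance) summaries."""
    return kl_gauss(*a, *b) + kl_gauss(*b, *a)

def label_cluster(new_summary, normal_summaries, threshold):
    """Label a whole cluster of new samples at once by its distance to known normal clusters."""
    nearest = min(sym_kl(new_summary, s) for s in normal_summaries)
    return "normal" if nearest <= threshold else "anomaly"

# Assumed summaries of previously seen normal clusters: (mean, variance).
normal = [(0.0, 1.0), (5.0, 1.0)]
close_label = label_cluster((0.1, 1.0), normal, threshold=0.5)
far_label = label_cluster((20.0, 1.0), normal, threshold=0.5)
```

Because only the summaries are stored and compared, the raw samples of each cluster can be discarded, which is the memory argument the abstract makes for the GMM representation.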
News articles often reflect an opinion or point of view, with certain topics evoking more diverse opinions than others. For analyzing and better understanding public discourses, identifying such contested topics const...