As knowledge bases (KBs) are applied ever more widely, completing them has become a hot topic in the Semantic Web community. Big data brings many problems, however, and event matching is one of them: it means identifying entities that refer to the same real-world thing, and it is a key step in extending a KB. To enrich the emergency knowledge base (E-SKB) we constructed previously, we need to filter news from several web pages and identify reports of the same news to avoid data redundancy. In this paper, we propose a hierarchical blocking method that reduces the number of comparisons and narrows the search scope by extracting news properties as blocking keys. The method transforms the event matching problem into a clustering problem. Experimental results show that the proposed method outperforms the existing text clustering algorithm, achieving high precision with fewer comparisons.
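The blocking idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which news properties serve as blocking keys, so the `type` and `date` fields, the function names, and the records below are all hypothetical; this is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def hierarchical_blocks(records, keys):
    """Group records level by level: the first key gives the coarse block,
    each later key splits it further. The keys are hypothetical properties."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[tuple(rec[k] for k in keys)].append(rec)
    return dict(blocks)


def candidate_pairs(blocks):
    """Return only the pairs of records that share a finest-level block."""
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Made-up records for illustration only.
news = [
    {"id": 1, "type": "flood", "date": "day-1"},
    {"id": 2, "type": "flood", "date": "day-1"},
    {"id": 3, "type": "flood", "date": "day-2"},
    {"id": 4, "type": "fire", "date": "day-1"},
]

blocks = hierarchical_blocks(news, ["type", "date"])
pairs = candidate_pairs(blocks)
all_pairs = list(combinations(news, 2))
print(len(pairs), "comparisons with blocking versus", len(all_pairs), "without")
```

On these four records, only records 1 and 2 share a block, so one pairwise comparison replaces the six that an exhaustive comparison would need.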
Link-based similarity measures play a significant role in many graph-based applications. Consequently, measuring node similarity in a graph is a fundamental problem in graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. Recently, a novel link-based similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR, and P-Rank scores are calculated by iterative methods, and the computational overhead grows with the number of iterations. Ideally, similarity would be computed with the minimum number of iterations sufficient to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, this paper focuses on designing accurate and tight upper bounds for PPR, SR, and P-Rank. Our upper bounds are based on the following intuition: the smaller the difference between two consecutive iteration steps, the smaller the difference between the theoretical and iterative similarity scores. Furthermore, we demonstrate the effectiveness of our upper bounds in the scenario of top-k similar-node queries, where they help accelerate the query. We also run a comprehensive set of experiments on real-world data sets to verify the effectiveness and efficiency of our upper bounds.
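The intuition about consecutive iterations can be sketched for PPR with power iteration. The stopping rule below uses the standard contraction-mapping bound, error ≤ (1 − α)/α × (L1 difference between consecutive iterates), which is an assumption for illustration and not necessarily the bound derived in the paper. The function name, the restart probability `alpha`, and the toy graph are hypothetical.

```python
def personalized_pagerank(out_links, source, alpha=0.15, tol=1e-8, max_iter=1000):
    """Power iteration for PPR with restart probability alpha.

    out_links maps every node to the list of nodes it points to; every
    target must also appear as a key. Stops once the contraction bound on
    the remaining L1 error drops below tol.
    """
    nodes = list(out_links)
    rank = {v: 1.0 if v == source else 0.0 for v in nodes}
    for it in range(1, max_iter + 1):
        # Restart mass goes to the source node.
        nxt = {v: alpha if v == source else 0.0 for v in nodes}
        # Remaining mass follows out-links uniformly.
        for u, targets in out_links.items():
            if not targets:
                continue  # mass at dangling nodes is dropped in this sketch
            share = (1.0 - alpha) * rank[u] / len(targets)
            for v in targets:
                nxt[v] += share
        diff = sum(abs(nxt[v] - rank[v]) for v in nodes)
        rank = nxt
        # Each step shrinks the L1 distance by a factor of at most (1 - alpha),
        # so the distance to the fixed point is bounded by a geometric series.
        bound = (1.0 - alpha) / alpha * diff
        if bound < tol:
            break
    return rank, it, bound


# Toy graph for illustration only.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores, iterations, bound = personalized_pagerank(graph, "a")
print(iterations, bound)
```

A tighter bound lets the loop stop earlier for the same accuracy target, which is the saving the abstract is after.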
This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retri...
Sentiment analysis in the tourism domain has drawn much attention in the past few years, which calls for a more precise sentiment word embedding method. This article proposes a kernel optimization function for sentiment word embedding. The method aims to integrate semantic, statistical, and sentiment information while maintaining the similarity between sentiment words in terms of sentiment orientation. The experimental results show that the optimal sentiment vectors successfully extract features of sentiment information and capture the difference between the concretization and abstraction of sentiment words.
Join is one of the most important operations in data analytics systems. Prior works focus mainly on join optimization using GPUs, but little is known about performance impact on the MICs. In order to investigate poten...
Attribute reduction is an inevitable problem in machine learning and statistical learning. To improve the traditional rough set reduction, statistical rough sets is then proposed by introducing random sampling into th...
Semantic association represents group relationship among objects in linked data. Searching semantic associations is complicated, which involves the search of multiple objects and the search of their group relationship...
ISBN: 9781509006809 (print)
Recently, listwise ranking-oriented collaborative filtering (CF) algorithms have achieved great success in recommender systems. However, the ranked preference lists may compromise the privacy of individuals. A notable paradigm offering strong privacy guarantees is differential privacy. In this paper, we propose DPListCF, a differentially private algorithm based on ListCF (a state-of-the-art listwise CF algorithm). The main idea of DPListCF is to make both the similarity calculation phase and the rank prediction phase of ListCF satisfy differential privacy, using an input perturbation method in the former and an output perturbation method in the latter. Extensive experiments on two real datasets evaluate the performance of DPListCF and demonstrate that the proposed algorithm outperforms state-of-the-art approaches.
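Input and output perturbation can be illustrated with the Laplace mechanism, a common way to achieve differential privacy. The abstract does not state which noise mechanism DPListCF uses, so the Laplace choice, the function names, the sensitivity values, and the even split of the privacy budget are all assumptions in this sketch.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(values, sensitivity, epsilon, rng):
    """Laplace mechanism: add noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)
total_epsilon = 1.0

# Input perturbation: noise is added to the raw data before the first phase.
# The ratings and the sensitivity of 4.0 (a 1-to-5 scale) are made up.
ratings = [5.0, 3.0, 4.0]
noisy_ratings = perturb(ratings, sensitivity=4.0, epsilon=total_epsilon / 2, rng=rng)

# Output perturbation: noise is added to the result of the second phase.
# The predicted scores and the sensitivity of 1.0 are hypothetical.
predicted_scores = [0.7, 0.2, 0.1]
noisy_scores = perturb(predicted_scores, sensitivity=1.0, epsilon=total_epsilon / 2, rng=rng)
```

Giving each phase half of the budget relies on sequential composition, under which the privacy losses of the two phases add up to `total_epsilon`.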
ISBN: 9781509060689 (print)
This paper proposes a novel method for cross-modal retrieval. Unlike the vector (text)-to-vector (image) framework of traditional cross-modal methods, we adopt a vector (text)-to-matrix (image) framework. We assume that, compared with vectors, matrices can directly represent images and characterize the structure of the feature space. Furthermore, we propose a Metric based on Multi-order spaces (MMs). Multi-order statistical features are used to represent images to enrich the semantic information, and metrics among the multiple spaces are jointly learned to measure the similarity between two different modalities. Specifically, MMs consists of three steps. First, we jointly use the bag of visual features (zero-order), mean (first-order), and covariance (second-order) to characterize each image. Second, considering that covariance matrices and vectors lie on a Riemannian manifold and in a Euclidean space respectively, we embed the multi-order spaces into their corresponding Hilbert spaces to reduce the heterogeneity among the original spaces. Finally, the similarity between two different modalities is measured by learning multiple transformations from the different Hilbert spaces to a common subspace. Experiments on two public datasets demonstrate the advantage of the proposed method over the state of the art.
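The first step, characterizing an image by zero-, first-, and second-order statistics of its local descriptors, can be sketched as follows. The descriptor values, the two-word codebook, and the function names are hypothetical, and the Hilbert space embedding and metric learning steps are not covered here.

```python
def bag_of_features(descriptors, codebook):
    """Zero-order: normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])),
        )
        counts[nearest] += 1
    return [c / len(descriptors) for c in counts]


def mean_vector(descriptors):
    """First-order: per-dimension mean of the local descriptors."""
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]


def covariance_matrix(descriptors):
    """Second-order: sample covariance matrix of the local descriptors."""
    n = len(descriptors)
    mu = mean_vector(descriptors)
    dim = len(mu)
    return [
        [
            sum((d[i] - mu[i]) * (d[j] - mu[j]) for d in descriptors) / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


# Made-up 2-D local descriptors of one image and a made-up codebook.
descriptors = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
codebook = [[0.0, 0.0], [2.0, 1.0]]

histogram = bag_of_features(descriptors, codebook)
mean = mean_vector(descriptors)
covariance = covariance_matrix(descriptors)
```

The histogram and the mean are ordinary vectors, while the covariance is a symmetric matrix, which is why the abstract treats the second-order feature as lying in a different space from the other two.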