With the extensive application of knowledge bases (KBs), how to complete them is a hot topic in the Semantic Web community. Big data brings many problems, however, and event matching is one of them: it means finding the entities that refer to the same thing in the real world, and it is a key step in extending a KB. To enrich the emergency knowledge base (E-SKB) we constructed previously, we need to filter news from several web pages and identify reports of the same news to avoid data redundancy. In this paper, we propose a hierarchical blocking method that reduces the number of comparisons and narrows the search scope by using extracted news properties as blocking keys. The method transforms the event matching problem into a clustering problem. Experimental results show that the proposed method outperforms the existing text clustering algorithm, achieving high precision with fewer comparisons.
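The blocking idea can be illustrated with a minimal sketch. It is not the authors' method: the record fields (`date`, `place`), the toy records, and the function names are hypothetical, and the sketch only shows how grouping records by a sequence of blocking keys restricts comparisons to records within the same block.

```python
from collections import defaultdict
from itertools import combinations


def hierarchical_blocks(records, keys):
    """Group records by a tuple of blocking keys, applied in order.

    `keys` is a hypothetical list of property names; records that differ
    on any key land in different blocks and are never compared.
    """
    blocks = defaultdict(list)
    for rec in records:
        blocks[tuple(rec[k] for k in keys)].append(rec)
    return dict(blocks)


def candidate_pairs(records, keys):
    """Yield only the record pairs that share every blocking key."""
    for block in hierarchical_blocks(records, keys).values():
        yield from combinations(block, 2)


# Toy records, invented for illustration only.
news = [
    {"id": 1, "date": "day-1", "place": "city-A"},
    {"id": 2, "date": "day-1", "place": "city-A"},
    {"id": 3, "date": "day-1", "place": "city-B"},
    {"id": 4, "date": "day-2", "place": "city-A"},
]

blocked = [(a["id"], b["id"]) for a, b in candidate_pairs(news, ["date", "place"])]
all_pairs = list(combinations(news, 2))
print(len(blocked), "comparisons with blocking;", len(all_pairs), "without")
```

A similarity function would then be applied only to the pairs in `blocked`; the choice of keys and their order is the design decision that determines how many true matches survive blocking.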
On September 5, 2015, the State Council of Chinese Government, China’s cabinet formally announced its Action Framework for Promoting Big data (***, 2015). This is the milestone for China to catch up the global wave o...
Link-based similarity measures play a significant role in many graph-based applications. Consequently, measuring node similarity in a graph is a fundamental problem of graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. Recently, a novel link-based similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR and P-Rank scores are calculated by iterative methods. As the number of iterations increases, so does the overhead of the calculation. Ideally, the similarity would be computed in the minimum number of iterations sufficient to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, in this paper we focus on designing accurate and tight upper bounds for PPR, SR, and P-Rank. Our upper bounds are designed based on the following intuition: the smaller the difference between two consecutive iteration steps, the smaller the difference between the theoretical and iterative similarity scores. Furthermore, we demonstrate the effectiveness of our upper bounds in the scenario of top-k similar-node queries, where they help accelerate query processing. We also run a comprehensive set of experiments on real-world data sets to verify the effectiveness and efficiency of our upper bounds.
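The intuition about consecutive iterations can be sketched for PPR as follows. This is not the paper's bound: it uses the standard contraction argument, under which the L1 error after step k is at most c/(1-c) times the L1 difference between steps k and k-1, where c is the damping factor. The graph, the function name, and the parameter defaults are assumptions made for illustration.

```python
def personalized_pagerank(out_links, source, c=0.85, tol=1e-8, max_iter=1000):
    """Power iteration for PPR with a consecutive-difference stopping rule.

    out_links: dict mapping each node to a list of its out-neighbours
               (every neighbour must also be a key of the dict).
    Stops once c / (1 - c) * ||x_k - x_{k-1}||_1 <= tol, which bounds the
    L1 distance to the fixed point by the contraction argument.
    Returns (scores, number of iterations performed).
    """
    nodes = list(out_links)
    x = {v: 0.0 for v in nodes}
    x[source] = 1.0
    for k in range(1, max_iter + 1):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += 1.0 - c  # restart mass returns to the source node
        for u, targets in out_links.items():
            if targets:
                share = c * x[u] / len(targets)
                for v in targets:
                    nxt[v] += share
        diff = sum(abs(nxt[v] - x[v]) for v in nodes)
        x = nxt
        if c / (1.0 - c) * diff <= tol:
            return x, k
    return x, max_iter


# Hypothetical three-node graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores, iterations = personalized_pagerank(graph, "a")
print(iterations, scores)
```

The stopping rule needs only the two most recent iterates, which is why a tighter bound of this kind translates directly into fewer iterations for a given accuracy.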
Join is one of the most important operations in data analytics systems. Prior works focus mainly on join optimization using GPUs, but little is known about performance impact on the MICs. In order to investigate poten...
Sentiment analysis in the tourism domain has drawn much attention in the past few years, which calls for a more precise sentiment word embedding method. This article proposes a kernel optimization function for sentiment word embedding. The method aims to integrate semantic, statistical, and sentiment information while maintaining the similarity between sentiment words in terms of sentiment orientation. The experimental results show that the optimized sentiment vectors successfully capture features related to sentiment information and to the difference between concrete and abstract sentiment words.
Semantic association represents group relationship among objects in linked data. Searching semantic associations is complicated, which involves the search of multiple objects and the search of their group relationship...
ISBN: 9781509020218 (print)
We study the problem of constructing a reverse nearest neighbor (RNN) heat map by finding the RNN set of every point in a two-dimensional space. Based on the RNN set of a point, we obtain a quantitative influence (i.e., heat) for the point. The heat map provides a global view on the influence distribution in the space, and hence supports exploratory analyses in many applications such as marketing and resource management. To construct such a heat map, we first reduce it to a problem called Region Coloring (RC), which divides the space into disjoint regions within which all the points have the same RNN set. We then propose a novel algorithm named CREST that efficiently solves the RC problem by labeling each region with the heat value of its containing points. In CREST, we propose innovative techniques to avoid processing expensive RNN queries and greatly reduce the number of region labeling operations. We perform detailed analyses on the complexity of CREST and lower bounds of the RC problem, and prove that CREST is asymptotically optimal in the worst case. Extensive experiments with both real and synthetic data sets demonstrate that CREST outperforms alternative algorithms by several orders of magnitude.
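For orientation, a brute-force baseline for the heat of a single point might look like the sketch below. It is not CREST and does not solve Region Coloring. It assumes a bichromatic setting (clients and existing facilities, both hypothetical here) and assumes the heat of a point is simply the size of its RNN set; the abstract does not fix either choice.

```python
import math


def rnn_set(q, clients, facilities):
    """Clients that would be strictly closer to q than to any existing facility."""
    result = []
    for c in clients:
        nearest_existing = min(math.dist(c, f) for f in facilities)
        if math.dist(c, q) < nearest_existing:
            result.append(c)
    return result


def heat_grid(clients, facilities, xs, ys):
    """Heat (RNN set size, an assumed measure) sampled on a grid of query points."""
    return [[len(rnn_set((x, y), clients, facilities)) for x in xs] for y in ys]


# Toy coordinates, invented for illustration only.
clients = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
facilities = [(3.0, 0.0)]

print(rnn_set((0.5, 0.0), clients, facilities))
print(heat_grid(clients, facilities, xs=[0.5, 5.0], ys=[0.0, 5.0]))
```

Each grid cell costs one RNN query here, and a sampled grid only approximates the map; that per-point cost is what a region-based approach is meant to avoid.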
ISBN: 9781509006809 (print)
Recently, listwise ranking-oriented collaborative filtering (CF) algorithms have achieved great success in recommender systems. However, the ranked preference list may compromise the privacy of individuals. A notable paradigm for offering a strong privacy guarantee is differential privacy. In this paper, we propose DPListCF, a differentially private algorithm based on ListCF (a state-of-the-art listwise CF algorithm). The main idea of DPListCF is to make both the similarity calculation phase and the rank prediction phase of ListCF satisfy differential privacy, by using an input perturbation method in the former and an output perturbation method in the latter. Extensive experiments on two real datasets evaluate the performance of DPListCF and demonstrate that the proposed algorithm outperforms state-of-the-art approaches.
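Output perturbation in general can be sketched with the standard Laplace mechanism, shown below. This is not DPListCF: the function names, the example scores, and the sensitivity and epsilon values are hypothetical, and the abstract does not say which noise distribution the authors use.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_output(values, sensitivity, epsilon, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to each value.

    `sensitivity` must be the L1 sensitivity of the whole output vector
    for the usual epsilon-differential-privacy guarantee to apply.
    """
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)             # fixed seed so the sketch is repeatable
predicted = [0.2, 0.5, 0.9]        # hypothetical prediction scores
noisy = perturb_output(predicted, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy)
```

Input perturbation applies the same kind of noise to the data before the computation instead of to its result; a smaller epsilon means a larger noise scale and therefore stronger privacy at the cost of accuracy.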
Although Social Network Services (SNSs) continuance usage has recently emerged as an important issue in information systems adaption, the research into older adults' continuance intention towards SNS is still very...