ISBN: 9781509020218 (print)
We study the problem of constructing a reverse nearest neighbor (RNN) heat map by finding the RNN set of every point in a two-dimensional space. Based on the RNN set of a point, we obtain a quantitative influence (i.e., heat) for the point. The heat map provides a global view of the influence distribution in the space, and hence supports exploratory analyses in many applications such as marketing and resource management. To construct such a heat map, we first reduce it to a problem called Region Coloring (RC), which divides the space into disjoint regions within which all the points have the same RNN set. We then propose a novel algorithm named CREST that efficiently solves the RC problem by labeling each region with the heat value of the points it contains. In CREST, we propose innovative techniques to avoid processing expensive RNN queries and greatly reduce the number of region labeling operations. We perform detailed analyses of the complexity of CREST and of lower bounds for the RC problem, and prove that CREST is asymptotically optimal in the worst case. Extensive experiments with both real and synthetic data sets demonstrate that CREST outperforms alternative algorithms by several orders of magnitude.
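To make the notion of heat concrete, the following is a minimal brute-force sketch, not the CREST algorithm. It assumes a bichromatic setting that the abstract does not spell out: a set of clients and a set of existing facilities, where the RNN set of a location q contains every client that is strictly closer to q than to any facility, and the heat of q is the size of that set. The names `rnn_set` and `heat_grid` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def rnn_set(q, clients, facilities):
    """Indices of clients strictly closer to q than to every facility.

    Assumption: bichromatic RNN with Euclidean distance.
    """
    return frozenset(
        i
        for i, c in enumerate(clients)
        if all(dist(c, q) < dist(c, f) for f in facilities)
    )


def heat_grid(clients, facilities, xs, ys):
    """Heat (RNN set size) sampled at every (x, y) on a grid.

    This issues one RNN query per grid cell, which is exactly the
    expensive baseline that a region-based method aims to avoid.
    """
    return [
        [len(rnn_set((x, y), clients, facilities)) for x in xs]
        for y in ys
    ]
```

Sampling on a grid only approximates the heat map; the Region Coloring formulation instead labels whole regions in which the RNN set is constant.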
In search engines, different users may search for different information by issuing the same query. To satisfy more users with limited search results, search result diversification re-ranks the results to cover as many user intents as possible. Most existing intent-aware diversification algorithms recognize user intents as subtopics, each of which is usually a word, a phrase, or a piece of description. In this paper, we leverage query facets to understand user intents in diversification, where each facet contains a group of words or phrases that explain an underlying intent of a query. We generate subtopics based on query facets and propose faceted diversification approaches. Experimental results on the public TREC 2009 dataset show that our faceted approaches outperform state-of-the-art diversification models.
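The idea of re-ranking to cover as many intents as possible can be illustrated with a simple greedy coverage heuristic. This is only a sketch under stated assumptions, not the faceted models proposed in the paper: each result is assumed to be already annotated with the set of facets it addresses, and the function name `greedy_diversify` and the `facets_of` mapping are hypothetical.

```python
def greedy_diversify(ranked, facets_of, k):
    """Pick up to k results, each time taking the one that covers the
    most facets not covered yet. Ties keep the original ranking order,
    because max() returns the first maximal element it encounters.
    """
    covered = set()
    remaining = list(ranked)
    picked = []
    while remaining and len(picked) < k:
        best = max(remaining, key=lambda d: len(facets_of[d] - covered))
        picked.append(best)
        covered |= facets_of[best]
        remaining.remove(best)
    return picked
```

A practical system would also weigh relevance scores and facet importance; this sketch considers coverage alone.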
The ongoing research and development in the field of Natural Language Processing has led to a great number of technologies in its context. There have been major benefits when it comes to bringing together the worlds ...
Researchers have devoted considerable effort to object-oriented and UML model metrics from different perspectives. They have proposed numerous metrics and carried out a series of theoretical and experimental verifications of understandability, analyzability, maintainability, fault-proneness, change-proneness, and reuse. However, there is no formal semantic specification for UML model metrics, which may lead to semantic inconsistency and ambiguity. To solve this problem, this paper provides a formalization of UML model metrics at the level of UML metamodels. This formalization not only helps people understand the meaning of UML model metrics, but also allows the metrics to be applied in a more rigorous way.
ISBN: 9781509003914 (print)
Attribute reduction is an inevitable problem in machine learning and statistical learning. To improve traditional rough set reduction, statistical rough sets were proposed, which introduce random sampling into the rough approximation. Random sampling is the main contribution of statistical rough sets, so it is necessary to analyze their randomness. In this paper, we analyze and demonstrate the influence of this randomness on the process of attribute reduction through a large number of experiments that test the effectiveness and stability of the random sampling.
This paper presents exploratory subgroup analytics on ubiquitous data: We propose subgroup discovery and assessment approaches for obtaining interesting descriptive patterns and provide a novel graph-based analysis app...
Online customer reviews are a significant information resource for both potential customers and product manufacturers. As a result, mining customer reviews automatically and providing users with an opinion summary has become one of the most challenging tasks. Product features and opinion words play the most important roles in customer opinion mining. In this paper, we focus on opinion word mining. We propose an approach for opinion word identification based on an association rule mining algorithm. The method makes full use of the syntactic co-occurrence characteristics between product features and opinion words. First, product features are identified by a two-stage filtering scheme; second, opinion words are extracted through association rule mining. The experimental results show that the proposed method not only obtains product features related to domain characteristics, but also identifies opinion words effectively. Our approach also achieves much higher precision and recall than Hu's work.
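The association-rule step can be sketched with plain support and confidence counts over sentence-level co-occurrence. This is an illustrative simplification, not the authors' pipeline: it assumes the product features are already known, treats every other word in a sentence as a candidate opinion word (a real system would filter candidates, for example by part of speech), and uses the hypothetical function name `mine_rules`.

```python
from collections import Counter
from itertools import product


def mine_rules(sentences, features, min_support, min_confidence):
    """Return {(feature, word): (support, confidence)} for rules
    feature -> word that meet both thresholds.

    support    = sentences containing both / all sentences
    confidence = sentences containing both / sentences containing feature
    """
    n = len(sentences)
    feat_count = Counter()
    pair_count = Counter()
    for tokens in sentences:
        words = set(tokens)           # count each word once per sentence
        present = words & features    # known product features
        others = words - features     # candidate opinion words (assumption)
        feat_count.update(present)
        pair_count.update(product(present, others))
    rules = {}
    for (feat, word), c in pair_count.items():
        support = c / n
        confidence = c / feat_count[feat]
        if support >= min_support and confidence >= min_confidence:
            rules[(feat, word)] = (support, confidence)
    return rules
```

Raising `min_support` removes rare pairs, while `min_confidence` removes words that appear with a feature only incidentally.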
The volume of RDF data has increased dramatically in recent years, and cloud computing platforms like Hadoop are considered a good choice for processing queries over huge data sets because of their scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned can also affect system performance. Specifically, a good partitioning solution can greatly reduce or even totally avoid cross-node joins, and significantly cut the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines like RDF-3X store the data and execute join operations. Based on an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution for physically placing the partitions in order to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of the proposed approaches have been evaluated by extensive experiments over large RDF data sets.
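Why partitioning affects cross-node joins can be seen with a simple baseline: hashing each triple by its subject. This is not the workload-driven algorithm proposed in the paper, only a common baseline used here for illustration. It assumes triples are plain `(subject, predicate, object)` string tuples, and the function name `partition_by_subject` is hypothetical.

```python
import zlib
from collections import defaultdict


def partition_by_subject(triples, num_nodes):
    """Assign each triple to a node by hashing its subject.

    All triples that share a subject land on the same node, so joins
    on a common subject (star-shaped patterns) need no cross-node
    communication. zlib.crc32 is used because it is deterministic
    across runs, unlike Python's built-in str hash.
    """
    parts = defaultdict(list)
    for s, p, o in triples:
        node = zlib.crc32(s.encode("utf-8")) % num_nodes
        parts[node].append((s, p, o))
    return dict(parts)
```

Joins that follow a path from one subject to another may still cross nodes under this scheme, which is the kind of cost a workload-aware partitioning tries to reduce.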
Data-driven decision making in the big data era is becoming ubiquitous in the electric grid. In particular, daily collected power consumption records enable workload-aware device clustering, which is crucial for critical domain ap...
With the increasing proliferation of Mobile Social Networks (MSN) and Location-Based Services (LBS), location privacy has attracted broad attention in recent years. Most research has been done with the assum...