There are hundreds or thousands of web data sources providing data of relevance to a particular domain on the Web, so how to find a suitable set of sources quickly to integrate from a number of sources is becoming mor...
Recommender systems have been accepted as a vital application on the web by offering product advice or information that users might be interested in. Despite its success, similarity-based collaborative filtering suffe...
MBR (Minimum Bounding Rectangle) has been widely used to represent multimedia data objects in multimedia indexing techniques. In kNN search, MINDIST and MINMAXDIST were the most popular pruning metrics employed by MBR...
In recent years, large amounts of uncertain data have emerged with the widespread adoption of new technologies such as wireless sensor networks, RFID, and privacy protection. According to the features of the unce...
ISBN (print): 9783642235344; 9783642235351
Duplicate detection is well recognized as a crucial task for improving data quality. Related work on this problem mainly proposes efficient approaches that run on a single machine. However, as data volumes increase, the performance of duplicate identification is still far from satisfactory. Hence, we handle the problem of duplicate detection over MapReduce, a shared-nothing paradigm. We argue that the performance of MapReduce-based duplicate detection depends mainly on the number of candidate record pairs. In this paper, we propose a new signature scheme with a new pruning strategy over MapReduce to minimize the number of candidate record pairs. Our experimental results over both real and synthetic datasets demonstrate that the proposed signature-based method is efficient and scalable.
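To make the role of candidate pairs concrete, here is a minimal single-process sketch of signature-based candidate generation in a map/reduce style. The abstract does not describe the signature scheme itself, so this sketch swaps in generic prefix filtering under a Jaccard threshold; the function names, the whitespace tokenizer, and the threshold are all illustrative assumptions, not the authors' method.

```python
import math
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Hypothetical tokenizer: lowercase, split on whitespace."""
    return set(text.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b)


def map_phase(records, rank, threshold):
    """Emit (signature, record_id); a signature is one token of the record's prefix."""
    for rid, text in records.items():
        ordered = sorted(tokens(text), key=lambda t: rank[t])
        # Prefix filtering: two records with Jaccard >= threshold must share
        # at least one token among their first len - ceil(t * len) + 1 tokens.
        prefix_len = len(ordered) - math.ceil(threshold * len(ordered)) + 1
        for tok in ordered[:prefix_len]:
            yield tok, rid


def reduce_phase(pairs):
    """Group record ids by signature and emit candidate pairs within each group."""
    groups = defaultdict(set)
    for signature, rid in pairs:
        groups[signature].add(rid)
    candidates = set()
    for rids in groups.values():
        candidates.update(combinations(sorted(rids), 2))
    return candidates


def detect_duplicates(records, threshold=0.5):
    # Global token order, rarest first, so prefixes hold the most selective tokens.
    freq = defaultdict(int)
    for text in records.values():
        for tok in tokens(text):
            freq[tok] += 1
    rank = {tok: (count, tok) for tok, count in freq.items()}
    candidates = reduce_phase(map_phase(records, rank, threshold))
    # Only candidate pairs are verified; fewer candidates means less work.
    return sorted(
        (a, b) for a, b in candidates
        if jaccard(tokens(records[a]), tokens(records[b])) >= threshold
    )
```

Only records that share a signature are ever compared, which is why the number of candidate pairs drives the overall cost.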
On the Internet, comprehensive lawyer information is scattered across separate information sources, which prevents web users from acquiring information effectively. In order to build a unified view of separated, heterogeneous, ...
The performance of online analytical processing (OLAP) is critical for meeting the increasing requirements of massive-volume analytical applications. Typical techniques, such as in-memory processing, column storage, and join indexes, focus on high-performance storage media, efficient storage models, and reduced query processing. While they serve OLAP applications effectively, there is a vital limitation: main-memory database based OLAP (MMOLAP) cannot provide high performance for large data sets. In this paper, we propose a novel memory dimension table model, in which the primary keys of the dimension table can be directly mapped to dimensional tuple addresses. To achieve higher performance of dimensional tuple access, we optimize our storage model for dimension tables based on OLAP query workload features. We present the direct dimensional tuple accessing (DDTA) based join (DDTA-JOIN), a technique that optimizes query processing on the memory dimension table by direct dimensional tuple access. We also propose an optimization of the predicate tree that shortens predicate operation length by pruning useless predicate processing. Our experimental results show that the DDTA-JOIN algorithm is superior to both simulated row-store main-memory query processing and the open-source column-store main-memory database MonetDB, thanks to the reduced join cost and simple yet efficient query processing.
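The core idea of mapping dimension primary keys directly to tuple addresses can be sketched as follows. This is a simplified illustration under stated assumptions: the dimension table is a Python list whose index is the surrogate key, fact rows are `(dim_key, measure)` pairs, and `ddta_join_sum` is a hypothetical name. It does not reproduce the paper's storage layout or its predicate-tree optimization.

```python
def ddta_join_sum(fact_rows, dim_table, predicate, group_attr):
    """Aggregate a measure per dimension attribute using direct key-to-address access.

    Assumption: dim_table is a list and each fact row's foreign key is the
    list index of its dimension tuple, so no hash table or index is probed.
    """
    totals = {}
    for dim_key, measure in fact_rows:
        dim_tuple = dim_table[dim_key]  # direct dimensional tuple access
        if predicate(dim_tuple):
            group = dim_tuple[group_attr]
            totals[group] = totals.get(group, 0) + measure
    return totals


# Hypothetical data: a small dimension table and a fact table.
dim_table = [
    {"region": "ASIA", "year": 1997},
    {"region": "EUROPE", "year": 1997},
    {"region": "ASIA", "year": 1998},
]
fact_rows = [(0, 10), (1, 5), (2, 7), (0, 3)]

result = ddta_join_sum(fact_rows, dim_table, lambda d: d["year"] == 1997, "region")
```

Each fact row reaches its dimension tuple through a single positional lookup, which is where the reduction in join cost described in the abstract would come from.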
Subjective logic provides a means to describe trust relationships in the real world. However, the fusion operations it offers treat fused opinions equally, which makes it impossible to deal with weighted opinions effectively. A. Jøsang presents a solution to this problem that combines the discounting operator and the fusion operator to produce the consensus. In this paper, we prove that this approach is unsuitable for weighted opinions because it increases the uncertainty of the consensus. To address the problem, we propose two novel fusion operators that fuse opinions fairly according to their weights; one of their strengths is that they improve the trust expressiveness of subjective logic. Furthermore, we justify their definitions using the mapping between the evidence space and the opinion space. Comparisons between the existing operators and the proposed ones show the effectiveness of our new fusion operations.
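The uncertainty increase mentioned above can be illustrated with the standard discounting and consensus operators of subjective logic for binomial opinions (belief, disbelief, uncertainty). The sketch below covers only those existing operators as commonly defined by Jøsang; the paper's two new weighted operators are not specified in the abstract and are not reproduced here. The example opinions are made-up values.

```python
from typing import NamedTuple


class Opinion(NamedTuple):
    b: float  # belief
    d: float  # disbelief
    u: float  # uncertainty, with b + d + u == 1


def consensus(x: Opinion, y: Opinion) -> Opinion:
    """Standard consensus operator (assumes not both u values are zero)."""
    k = x.u + y.u - x.u * y.u
    return Opinion(
        (x.b * y.u + y.b * x.u) / k,
        (x.d * y.u + y.d * x.u) / k,
        (x.u * y.u) / k,
    )


def discount(trust: Opinion, x: Opinion) -> Opinion:
    """Standard discounting operator: weaken opinion x by the trust placed in its source."""
    return Opinion(
        trust.b * x.b,
        trust.b * x.d,
        trust.d + trust.u + trust.b * x.u,
    )


# Illustrative values only.
op_a = Opinion(0.6, 0.2, 0.2)
op_b = Opinion(0.4, 0.2, 0.4)
half_weight = Opinion(0.5, 0.0, 0.5)  # models "weight" as partial trust in B

plain = consensus(op_a, op_b)
weighted = consensus(op_a, discount(half_weight, op_b))
```

With these values, `weighted.u` is larger than `plain.u`: expressing a weight through discounting moves belief mass into uncertainty, which is the effect the abstract criticizes.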
A routing protocol for VANETs based on position service information is proposed. It combines micro-scale routing within the limits of a single road with macro-scale routing between roads. An optimized Dijkstra algorithm is employed to calculate an oriented optimum route between source and destination. The simulation results show that the proposed protocol increases throughput and reduces end-to-end delay.
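For orientation, the macro-scale step can be pictured as a shortest-path search over a graph of road intersections. The sketch below is plain Dijkstra with a binary heap; the abstract does not say what the authors' optimization consists of, so none is shown. The graph, the edge weights, and the name `shortest_road_path` are hypothetical.

```python
import heapq


def shortest_road_path(graph, source, destination):
    """Return (path, cost) between two intersections, or (None, inf) if unreachable.

    graph maps each intersection to {neighbor: non-negative road cost}.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry
        settled.add(node)
        if node == destination:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if destination not in dist:
        return None, float("inf")
    # Walk predecessor links back from the destination.
    path = [destination]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[destination]


# Hypothetical road graph: intersections A-D with travel costs.
roads = {
    "A": {"B": 2, "C": 5},
    "B": {"C": 1, "D": 4},
    "C": {"D": 1},
    "D": {},
}
```

In a VANET setting the edge weights would presumably reflect road length or vehicle density, but that choice is outside what the abstract states.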
The skyline of a set of multi-dimensional points (tuples) consists of those points for which no clearly better point exists in the given set, using component-wise comparison on domains of interest. Skyline queries, i....