ISBN: 9783642008863 (print)
SimRank is a well-known algorithm for similarity calculation based on object-to-object relationships. However, it suffers from high computation cost. In this paper, we find that different object pairs converge at different rates when SimRank is used to compute object similarity. Many similarity scores converge quickly, while others take longer to converge. Based on this observation, we propose an adaptive method called Adaptive-SimRank to speed up similarity calculation. With this method, the similarity of pairs that have already converged does not need to be recalculated. Experiments conducted on web datasets and a synthetic dataset show that the new method can reduce the running time by nearly 35%.
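The idea of skipping converged pairs can be illustrated with a minimal sketch. This is not the authors' Adaptive-SimRank: the freezing rule (a pair is frozen once its score changes by at most `tol` after `warmup` iterations), the function name `simrank_adaptive`, and the decay factor `c = 0.8` are all assumptions made for illustration. Such a rule is a heuristic and can freeze a pair too early on graphs where similarity propagates slowly.

```python
from itertools import product


def simrank_adaptive(in_neighbors, c=0.8, max_iter=20, tol=1e-4, warmup=2):
    """Naive SimRank that stops recomputing pairs judged to have converged.

    in_neighbors: dict mapping every node to the list of its in-neighbors.
    Returns (scores, updates), where updates counts pair recomputations.
    The freezing rule is an illustrative assumption, not the paper's criterion.
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    # Pairs (a, a) are fixed at 1.0, so only distinct pairs are ever updated.
    active = {(a, b) for a, b in product(nodes, nodes) if a != b}
    updates = 0
    for it in range(max_iter):
        if not active:
            break
        new_sim = dict(sim)
        still_active = set()
        for a, b in active:
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                continue  # a node without in-neighbors keeps similarity 0
            total = sum(sim[(x, y)] for x in ia for y in ib)
            score = c * total / (len(ia) * len(ib))
            updates += 1
            new_sim[(a, b)] = score
            # Keep the pair active during warm-up or while it is still moving.
            if it < warmup or abs(score - sim[(a, b)]) > tol:
                still_active.add((a, b))
        sim, active = new_sim, still_active
    return sim, updates


# Small example graph given as in-neighbor lists (hypothetical data).
graph = {
    "Univ": ["StudentA"],
    "ProfA": ["Univ"],
    "ProfB": ["Univ"],
    "StudentA": ["ProfA"],
    "StudentB": ["ProfB"],
}
scores, adaptive_updates = simrank_adaptive(graph)
# Setting tol below zero never freezes a pair, which gives the plain baseline.
_, full_updates = simrank_adaptive(graph, tol=-1.0)
```

Comparing `adaptive_updates` with `full_updates` shows how many pair recomputations the freezing rule avoids on this toy graph; the 35% figure in the abstract refers to the authors' own experiments, not to this sketch.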
In recent years MapReduce has risen to be the de facto tool for big data processing. MapReduce is a disruptive innovation. It has changed the landscape of the database market, the landscape of technologies, as well as the...
ISBN: 9783540717027 (print)
Advances in wireless networks and positioning technologies (e.g., GPS) have enabled new data management applications that monitor moving objects. In such applications, real-time data analysis such as clustering analysis is becoming one of the most important requirements. In this paper, we present the problem of clustering moving objects in spatial networks and propose a unified framework to address it. Because the positions of moving objects change continuously, the clustering results change dynamically. By exploiting the unique features of road networks, our framework first introduces the notion of a cluster block (CB) as the underlying clustering unit. We then divide the clustering process into the continuous maintenance of CBs and the periodic construction of clusters with different criteria based on CBs. Algorithms for efficiently maintaining and organizing the CBs to construct clusters are proposed. Extensive experimental results show that our clustering framework achieves high efficiency for clustering moving objects in real road networks.
In data management systems, query processing on GPUs or distributed clusters has proven to be an effective method for high efficiency. However, the high PCIe data transfer overhead between CPUs and GPUs, and the comm...
OS-level virtualization incurs smaller start-up and run-time overhead than HAL-based virtualization and thus forms an important building block for developing fault-tolerant and intrusion-tolerant applications. A compl...
ISBN: 9781605586502 (print)
Database-as-a-Service (DAS) is an emerging database management paradigm in which a partition-based index is an effective way to query encrypted data. However, previous research either focuses on one-dimensional partitioning or ignores the distribution characteristics of multidimensional data, especially sparsity and locality. In this paper, we propose Cluster based Onion Partition (COP), which is designed to decrease both false positives and dead space at the same time. COP is composed of two steps. First, it partitions the covered space level by level, which is like peeling an onion; second, at each level, a clustering algorithm based on local density is used to achieve a locally optimal secure partition. Extensive experiments on real and synthetic datasets show that COP is a secure multidimensional partition with much less efficiency loss than previous top-down or bottom-up counterparts. Copyright 2009 ACM.
For ontology-based applications, the efficiency of ontology query is vital. Different from existing approaches, the paper improves performance of ontology query by materializing some derived relations. Experimental re...
This paper considers the problem of constructing data aggregation trees in wireless sensor networks (WSNs) for a group of sensor nodes to send collected information to a single sink node. The data aggregation tree contains the sink node, all the source nodes, and some other non-source nodes. The goal of constructing such a data aggregation tree is to minimize the number of non-source nodes to be included in the tree so as to save energy. We prove that the data aggregation tree problem is NP-hard and then propose an approximation algorithm with a performance ratio of four and a greedy algorithm. We also give a distributed version of the approximation algorithm. Simulations are performed to study the performance of the proposed algorithms. The results show that the proposed algorithms can find a tree that is a good approximation to the optimal tree and have a high degree of scalability.
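To make the objective concrete, the sketch below builds an aggregation tree with a generic nearest-source-first heuristic: it repeatedly attaches the source that is fewest hops from the current tree and counts the non-source relay nodes that end up in it. This is a standard shortest-path heuristic swapped in for illustration; it is neither the paper's approximation algorithm nor its greedy algorithm, and it carries no performance-ratio guarantee. The function name `aggregation_tree` and the toy topology are hypothetical.

```python
from collections import deque


def aggregation_tree(adj, sink, sources):
    """Connect every source to the sink, attaching the nearest source first.

    adj: dict mapping each node to a list of neighboring nodes (undirected).
    Returns (tree_edges, relays), where relays are the non-source, non-sink
    nodes included in the tree. Illustrative heuristic only.
    """
    tree_nodes = {sink}
    tree_edges = set()
    remaining = set(sources) - tree_nodes
    while remaining:
        # Breadth-first search started from all current tree nodes at once.
        parent = {n: None for n in tree_nodes}
        queue = deque(sorted(tree_nodes))
        hit = None
        while queue:
            u = queue.popleft()
            if u in remaining:
                hit = u
                break
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        if hit is None:
            raise ValueError("some source is not connected to the sink")
        # Walk back from the reached source to the tree, adding the path.
        node = hit
        while parent[node] is not None:
            tree_edges.add((parent[node], node))
            tree_nodes.add(node)
            node = parent[node]
        remaining -= tree_nodes
    relays = tree_nodes - set(sources) - {sink}
    return tree_edges, relays


# Toy topology (hypothetical): sources a and b can share the relay r1.
network = {
    "s": ["r1", "r2"],
    "r1": ["s", "a", "b"],
    "r2": ["s", "c"],
    "a": ["r1"],
    "b": ["r1"],
    "c": ["r2"],
}
edges, relays = aggregation_tree(network, sink="s", sources=["a", "b"])
```

On this toy network the tree uses the single relay `r1`; in general the heuristic minimizes hop counts rather than the number of relays, so it can include more non-source nodes than an optimal tree would.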
This paper presents a reference framework, called BUD, to manage a large shared bank of unstructured data. This paper lists several important issues on managing or maintaining the unstructured data in BUD. BUD stores ...
ISBN: 9783642235344; 9783642235351 (print)
Duplicate detection has been well recognized as a crucial task for improving data quality. Related work on this problem mainly proposes efficient approaches that run on a single machine. However, as data volumes increase, the performance of duplicate identification is still far from satisfactory. Hence, we handle the problem of duplicate detection over MapReduce, a shared-nothing paradigm. We argue that the performance of MapReduce-based duplicate detection mainly depends on the number of candidate record pairs. In this paper, we propose a new signature scheme with a new pruning strategy over MapReduce to minimize the number of candidate record pairs. Our experimental results over both real and synthetic datasets demonstrate that the proposed signature-based method is efficient and scalable.
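How a signature scheme limits candidate pairs can be sketched with a well-known technique, prefix filtering for Jaccard similarity, simulated here as map, shuffle, and reduce steps in plain Python. This is a stand-in for illustration and not the signature scheme or pruning strategy proposed in the paper; the function names, the whitespace tokenizer, and the sample records are assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def prefix_signatures(tokens, threshold):
    """Return the first p tokens in a global (lexicographic) order.

    With p = |r| - ceil(threshold * |r|) + 1, two records whose Jaccard
    similarity reaches the threshold share at least one prefix token.
    """
    ordered = sorted(tokens)
    p = len(ordered) - math.ceil(threshold * len(ordered)) + 1
    return ordered[:p]


def find_duplicates(records, threshold=0.6):
    """records: dict of record id -> text. Returns (candidates, matches)."""
    token_sets = {rid: set(text.lower().split()) for rid, text in records.items()}
    # Map + shuffle: emit (signature, record id) and group by signature.
    groups = defaultdict(list)
    for rid, tokens in token_sets.items():
        for sig in prefix_signatures(tokens, threshold):
            groups[sig].append(rid)
    # Reduce: only records sharing a signature become candidate pairs.
    candidates = set()
    for rids in groups.values():
        candidates.update(combinations(sorted(rids), 2))
    # Verify each candidate pair with the exact similarity.
    matches = {
        (a, b)
        for a, b in candidates
        if jaccard(token_sets[a], token_sets[b]) >= threshold
    }
    return candidates, matches


# Hypothetical sample records.
records = {
    "r1": "John Smith 12 Main Street Boston",
    "r2": "John Smith 12 Main St Boston",
    "r3": "Alice Jones 99 Oak Avenue Denver",
}
candidates, matches = find_duplicates(records)
```

Here only one of the three possible record pairs reaches the verification step, which is the kind of reduction in candidate pairs that the abstract identifies as the main performance factor.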