Due to the increasing complexity of current digital data, similarity search has become a fundamental computational task in many applications. Unfortunately, its cost remains high and grows linearly on single-server structures, which prevents their efficient application to large data volumes. In this paper, we briefly describe four recent scalable distributed techniques for similarity search and study their performance in executing queries on three different datasets. Though all the methods employ parallelism to speed up query execution, the experiments identify different advantages for different objectives. The reported results should help in choosing the best implementation for a specific application. They can also be used for designing new and better indexing structures in the future. (c) 2007 Elsevier B.V. All rights reserved.
The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the 'shared-nothing' architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
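For orientation, the density-based notion of clusters that DBSCAN relies on can be sketched as follows. This is a minimal sequential sketch, not the PDBSCAN method or the dR*-tree from the abstract: the neighborhood query is a plain linear scan where a spatial index would normally be used, and the names `region_query`, `dbscan`, `eps`, and `min_pts` are illustrative assumptions.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (linear scan, no index)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels
```

In this sketch every call to `region_query` scans the whole dataset, which is the step a spatial index, and a distributed one in a shared-nothing setting, is meant to speed up.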
ISBN: 0769520235 (print)
Sensor networks, which consist of potentially several thousand nodes, each with sensing (heat, sound, light, magnetism, etc.) and wireless communication capabilities, provide great opportunities for monitoring spatial information about a region of interest. Although spatial query execution has been studied extensively in the context of database systems (e.g., indexing technologies), these solutions are not directly applicable to sensor networks because of their decentralized nature and the limited computational power and energy of individual sensor nodes. In this paper, we present a peer-to-peer indexing structure, the peer-tree, to address the problem of energy- and time-efficient execution of spatial queries (such as nearest-neighbor queries) in sensor networks. Loosely speaking, the peer-tree can be interpreted as a peer-to-peer version of the centralized R-tree index structure. Using the peer-tree as a building block, we present a peer-to-peer query processing model in which a query can be posed at any node of the network without the need for a central server. To minimize energy consumption and response time, our query processing model ensures that only the nodes relevant to the correct execution of a query are involved in its execution.
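The idea of involving only relevant nodes can be illustrated with the bounding-rectangle test that R-tree-style indexes use. The sketch below is not the peer-tree protocol itself. It assumes each group of sensor nodes is summarized by a minimum bounding rectangle (MBR) given as `(xmin, ymin, xmax, ymax)`, and the names `min_dist` and `relevant_nodes` are hypothetical.

```python
from math import hypot


def min_dist(point, mbr):
    """Smallest Euclidean distance from a point to an axis-aligned rectangle."""
    x, y = point
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - x, 0.0, x - xmax)  # 0 when x lies inside [xmin, xmax]
    dy = max(ymin - y, 0.0, y - ymax)  # 0 when y lies inside [ymin, ymax]
    return hypot(dx, dy)


def relevant_nodes(query, radius, node_mbrs):
    """Names of the MBRs that can contain a point within radius of the query."""
    return [name for name, mbr in node_mbrs.items()
            if min_dist(query, mbr) <= radius]


# Hypothetical layout: three groups of sensors, each covered by one MBR.
mbrs = {"a": (0, 0, 10, 10), "b": (20, 0, 30, 10), "c": (0, 20, 10, 30)}
print(relevant_nodes((12, 5), 3, mbrs))
```

Groups whose rectangle lies farther from the query point than the search radius cannot contribute to the answer, so they need not receive the query at all.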