Today, the advent of networking technologies and computer hardware has enabled more and more inexpensive PCs, mobile devices, smartphones, PDAs, sensors, and cameras to be linked to the Internet with better connectivity. In recent years, we have witnessed the emergence of several distributed applications that provide infrastructures for social interaction over large-scale wide-area networks and facilitate the ways users share and publish data. User-generated data today range from simple text files to (semi-)structured documents and multimedia content. With the emergence of the Semantic Web, the number of features associated with a piece of content and used to index these large amounts of heterogeneous data is growing dramatically. The feature set associated with each content type can grow continuously as we discover new ways of describing content in formulated […]. As the number of dimensions in the feature data grows (as high as 100 to 1000), it becomes increasingly hard to search a dataset because of the curse of dimensionality, and naive search methods are no longer appropriate because their performance degrades to that of a linear search. As an alternative, we can distribute the content and the query-processing load over a set of peers in a distributed peer-to-peer (P2P) network and incorporate high-dimensional distributed search techniques to attack the […]. A large percentage of Internet traffic consists of video and music files shared and exchanged over P2P networks. In most current services, music search relies on keyword search and naive string-matching algorithms, using collaborative filtering techniques that are mostly tag-based. In music information retrieval (MIR) systems, the main goal is to recommend music similar to the music the user listens to. In these systems, techniques based on acoustic feature extraction can be employed to achieve content-based music si…
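The claim that naive search degrades as dimensionality grows can be illustrated with a small experiment. The sketch below is not from the paper; it assumes uniformly distributed synthetic vectors and measures the relative contrast (d_max - d_min) / d_min between the farthest and nearest point from a random query. As the dimension rises, this contrast shrinks, so distance-based pruning has little to exploit and a search ends up examining most of the data, much like a linear scan.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query to n_points random points in the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:4d}  relative contrast={relative_contrast(dim):.3f}")
```

In two dimensions the nearest point is far closer than the farthest one; by a few hundred dimensions all points sit at nearly the same distance from the query.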
The retrieval facilities of most peer-to-peer (P2P) systems are limited to queries based on a unique identifier or a small set of keywords. The techniques used for this purpose are hardly applicable to content-based image retrieval (CBIR) in a P2P network. Furthermore, we argue that the curse of dimensionality and the high communication overhead prevent the adaptation of multidimensional search trees or fast sequential scan techniques for P2P CBIR. In this paper, we propose two compact data representations that can be distributed in a P2P network and used as the basis for source selection. This allows query processing to communicate with only a small fraction of all peers without significantly deteriorating result quality. We also present experimental results confirming our approach.
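The abstract does not describe its two compact representations. As a hypothetical illustration of summary-based source selection only, the sketch below assumes each peer compresses its image feature vectors into a few k-means centroids; the querying peer ranks peers by the distance from the query to each peer's closest centroid and contacts only the top m. The names `summarize`, `select_peers`, and `blob` are invented for this example.

```python
import math
import random


def summarize(points, k, iters=10, seed=0):
    """Compress a peer's feature vectors into k centroids with a few rounds of
    Lloyd's k-means; the centroids are what the peer would publish."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def select_peers(query, summaries, m):
    """Score each peer by the distance from the query to its closest centroid
    and return the ids of the m most promising peers."""
    score = {
        peer: min(math.dist(query, c) for c in cents)
        for peer, cents in summaries.items()
    }
    return sorted(score, key=score.get)[:m]


rng = random.Random(42)


def blob(center, n=50):
    """Synthetic feature vectors scattered around a centre (test data only)."""
    return [[c + rng.gauss(0, 0.1) for c in center] for _ in range(n)]


peers = {"A": blob([0.0, 0.0]), "B": blob([5.0, 5.0]), "C": blob([10.0, 0.0])}
summaries = {pid: summarize(pts, k=2) for pid, pts in peers.items()}
print(select_peers([5.1, 4.9], summaries, m=1))  # ['B']
```

Each peer ships only k vectors instead of its whole collection, which is the kind of trade-off between summary size and selection accuracy that source selection has to balance.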
Spatio-temporal trajectory analytics are useful in diverse applications such as urban planning, infrastructure development, and vehicular networks. Trajectory similarity measurement, which evaluates the distance between two trajectories, is a fundamental functionality of trajectory analytics. In this paper, we present a comprehensive survey of the most common and representative spatio-temporal trajectory measures. First, we provide an overview of spatio-temporal trajectory measures from three hierarchical perspectives: Non-learning versus Learning, Free Space versus Road Network, and Standalone versus Distributed. Next, we present an evaluation benchmark by designing five real-world transformation scenarios. Based on this benchmark, we conduct extensive experiments to study the effectiveness, robustness, efficiency, and scalability of each measure, which offer guidelines for selecting trajectory measures across multiple techniques and applications such as trajectory data mining, deep learning, and distributed processing. Specifically: (i) Effectiveness: in terms of trajectory length, DFD and Seg-Frechet are length-sensitive, while OWD and Hausdorff always return the same results when the query trajectory length varies; in terms of trajectory shape, LCRS and LORS effectively find similar trajectories for query trajectories with different shapes; (ii) Robustness: learning-based measures are more robust than non-learning-based ones, and among non-learning-based measures, DFD, Hausdorff, OWD, and Seg-Frechet are relatively insensitive to noise and to different sampling rates; and (iii) Efficiency and scalability: learning-based and distributed measures are more efficient and scalable than non-learning-based ones.
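To make two of the surveyed measures concrete, the sketch below implements textbook definitions of the Hausdorff distance and the discrete Fréchet distance (DFD) for 2-D point sequences; it is not the survey's benchmark code. It highlights one structural difference: Hausdorff treats a trajectory as an unordered point set, so reversing a trajectory leaves the distance unchanged, whereas DFD couples points in order along both trajectories.

```python
import math


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences (order ignored)."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def discrete_frechet(a, b):
    """Discrete Fréchet distance via the standard dynamic program:
    ca[i][j] is the best achievable max coupling distance for a[:i+1], b[:j+1]."""
    n, m = len(a), len(b)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(a[i], b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 1), (1, 1), (2, 1)]
b_rev = list(reversed(b))
print(hausdorff(a, b), hausdorff(a, b_rev))  # 1.0 1.0
print(discrete_frechet(a, b), discrete_frechet(a, b_rev))  # 1.0, then sqrt(5) ≈ 2.236
```

Both measures see two parallel tracks one unit apart, but only DFD registers that `b_rev` travels in the opposite direction.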
ISBN: (print) 9789811007408; 9789811007385
In this paper, we consider load balancing and maintenance of distributed similarity search systems that use locality sensitive hashing (LSH) in a structured peer-to-peer (P2P) network based on a Distributed Hash Table (DHT). LSH has been proven efficient for K-Nearest Neighbor (KNN) search in high dimensions. Recently, a number of schemes have been proposed to implement LSH over DHT-based P2P systems to process distributed similarity searches. We provide an efficient structure that uses virtual nodes to manage the multi-dimensional LSH bucket space in DHT peers, together with a maintenance algorithm, which improves load balancing compared with state-of-the-art techniques such as the virtual node algorithm. We demonstrate the effectiveness of the proposed method through experiments.
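The abstract uses the virtual node algorithm as a baseline. That baseline idea can be sketched with consistent hashing: each physical peer owns several positions (virtual nodes) on an identifier ring, and an LSH bucket key is stored at the first virtual node clockwise from the key's hash. This is a generic illustration, not the proposed structure; the MD5-based ring, the `bucket-<n>` strings standing in for LSH bucket IDs, and the names `VirtualNodeRing` and `imbalance` are hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(key: str) -> int:
    """Map a string to a position on a 2**32 identifier ring (MD5-based, illustrative)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class VirtualNodeRing:
    """Consistent-hashing ring in which each physical peer owns `vnodes` positions."""

    def __init__(self, peers, vnodes):
        self._ring = sorted(
            (ring_hash(f"{peer}#{i}"), peer) for peer in peers for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, bucket_key: str) -> str:
        """Return the peer owning the first virtual node clockwise from the key."""
        idx = bisect.bisect_right(self._positions, ring_hash(bucket_key))
        return self._ring[idx % len(self._ring)][1]


def imbalance(peers, vnodes, n_buckets=20000):
    """Max peer load divided by mean peer load (1.0 means perfectly balanced)."""
    ring = VirtualNodeRing(peers, vnodes)
    load = Counter(ring.lookup(f"bucket-{b}") for b in range(n_buckets))
    return max(load.values()) / (n_buckets / len(peers))


peers = [f"peer{i}" for i in range(10)]
for v in (1, 10, 100):
    print(f"vnodes={v:3d}  max/mean load={imbalance(peers, v):.2f}")
```

With one position per peer, arc lengths on the ring vary widely, so some peer typically carries several times the average load; more virtual nodes per peer average these arcs out and push the ratio toward 1. Note that this assumes bucket keys hash uniformly; skewed LSH bucket populations are a separate source of imbalance.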
ISBN: (print) 9781450350228
Locality Sensitive Hashing (LSH) algorithms are widely adopted to index similar items in high-dimensional space for approximate nearest neighbor search. As real-world datasets keep growing in volume, it has become necessary to develop distributed LSH solutions. Implementing a distributed LSH algorithm from scratch incurs high development costs, so most existing solutions are built on general-purpose platforms such as Hadoop and Spark. However, we argue that these platforms are both hard to use for programming LSH algorithms and inefficient for LSH computation. We propose LoSHa, a distributed computing framework that reduces development cost through a tailor-made, general programming interface and achieves high efficiency through LSH-specific system implementation and optimizations. We show that many LSH algorithms can be easily expressed in LoSHa's API. We evaluate LoSHa and compare it with general-purpose platforms on the same LSH algorithms. Our results show that LoSHa can be an order of magnitude faster, while implementations on LoSHa are even more intuitive and require few lines of code.
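For readers unfamiliar with the algorithms such a framework targets, the sketch below shows a single-machine version of the widely used p-stable (Gaussian) LSH scheme for Euclidean distance, often called E2LSH. It does not reproduce LoSHa's API; the class name and the parameter values (`k`, `n_tables`, `w`) are illustrative assumptions. A distributed version would typically spread the tables or buckets across workers, while the hashing and re-ranking logic stays the same.

```python
import math
import random
from collections import defaultdict


class E2LSHIndex:
    """Illustrative p-stable LSH index for Euclidean distance.

    Each of `n_tables` tables concatenates `k` hash functions
    h(v) = floor((a . v + b) / w), with a ~ N(0, I) and b ~ U[0, w].
    """

    def __init__(self, dim, k=4, n_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.funcs = [
            [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, w)) for _ in range(k)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def _key(self, t, v):
        """Bucket key of v in table t: the tuple of its k hash values."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[t]
        )

    def insert(self, v):
        pid = len(self.points)
        self.points.append(v)
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(pid)
        return pid

    def query(self, q, n_neighbors=1):
        """Gather candidates from q's bucket in every table, then rank them exactly."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), []))
        return sorted(candidates, key=lambda pid: math.dist(q, self.points[pid]))[:n_neighbors]


rng = random.Random(1)
data = [[rng.uniform(0, 10) for _ in range(16)] for _ in range(1000)]
index = E2LSHIndex(dim=16)
for v in data:
    index.insert(v)
near = [x + 1e-6 for x in data[123]]
print(index.query(near))
```

A slightly perturbed copy of `data[123]` lands in the same buckets as the original in practically every table, so the query should return id 123 after examining only a small candidate set rather than all 1000 points.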
ISBN: (print) 9780769547497
Constructing effective and efficient indexes for explosively growing multimedia data is a very challenging problem. To address it, Haghani et al. provide a distributed similarity search method in high dimensions using Locality Sensitive Hashing. However, their method must estimate a global parameter on the whole dataset beforehand, which is impractical for large-scale, dynamic datasets. This paper proposes a novel method for constructing distributed LSH that does not require any prior knowledge of the dataset. By generating hash functions with a consistent output distribution, we obtain a prediction model that is data-independent in theory and guarantees good load balance even when the dataset changes dynamically. Furthermore, we modify the query algorithm of basic LSH to make the proposed model more practical. Experimental results on two open, large-scale, high-dimensional datasets show that the proposed method is more robust, scalable, and practical than the state of the art.
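The abstract does not spell out how the hash functions obtain a consistent output distribution. One way to get a data-independent distribution, assumed here purely for illustration and not necessarily the authors' construction, is to L2-normalize every vector: if a ~ N(0, I) and ||v|| = 1, then a · v ~ N(0, 1) for every v, so passing the projection through the standard normal CDF yields a position in (0, 1) that is uniform over the random choice of a, with no global parameter estimated from the data. Because the CDF is monotone, vectors with close projections also land close together. The helper names `unit`, `ring_position`, and `quarter_fractions` are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def unit(v):
    """L2-normalize a vector (assumption: the dataset is normalized this way)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def ring_position(a, v):
    """Map the projection a . v of a unit vector v into (0, 1) with the
    standard normal CDF; monotone, so nearby projections stay nearby."""
    proj = sum(ai * vi for ai, vi in zip(a, v))
    return _STD_NORMAL.cdf(proj)


def quarter_fractions(v, n_funcs=5000, seed=7):
    """Draw n_funcs random Gaussian projections a and report which fraction of
    the resulting positions of v falls into each quarter of (0, 1)."""
    rng = random.Random(seed)
    positions = [
        ring_position([rng.gauss(0, 1) for _ in v], v) for _ in range(n_funcs)
    ]
    return [
        sum(q / 4 <= p < (q + 1) / 4 for p in positions) / n_funcs
        for q in range(4)
    ]


dim = 32
sparse = unit([1.0] + [0.0] * (dim - 1))
dense = unit([float(i) for i in range(1, dim + 1)])
for v in (sparse, dense):
    print([round(f, 3) for f in quarter_fractions(v)])
```

The two test vectors point in very different directions, yet for both the four quarter fractions come out close to 0.25. The guarantee is over the random choice of hash function; for one fixed function, how the dataset spreads across peers still depends on the data.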
ISBN: (print) 9781612848334
In this paper, we consider load balancing and maintenance of a distributed similarity search system using locality sensitive hashing (LSH) in DHT-based structured P2P networks. LSH has been proven efficient for K-Nearest Neighbor (KNN) search in high dimensions. Recently, a number of schemes have been proposed to implement LSH over DHT-based peer-to-peer systems to process distributed similarity searches. We provide an efficient structure that uses virtual nodes to manage the multi-dimensional LSH bucket space in DHT peers, together with a maintenance algorithm, which improves load balancing compared with state-of-the-art techniques. The effectiveness of the proposed method is demonstrated by experiments.