Reverse nearest neighbor (RNN) queries are the complimentary problem and particular interest in the past few years, such as location-based services, profile-based marketing, resource allocation, and traffic monitoring...
详细信息
Reverse nearest neighbor (RNN) queries are the complimentary problem and particular interest in the past few years, such as location-based services, profile-based marketing, resource allocation, and traffic monitoring system. the one major drawback for the existing RNN is that it has inherent sequential nature and uses in-memory algorithm, which limits its applicability to large-scale spatial data queries. this paper proposes scalable algorithms for RNN queries in a distributed environment. Firstly, we investigate the Basic-scalable reverse nearest neighbor (SRNN) initialization query method based on the inverted grid index. Secondly, two optimization methods Lazy-SRNN and Eager-SRNN are proposed to effectively process scalable multi-dimensional RNN queries. Among them, Lazy-SRNN prunes the search space when all RNN objects are discovered in one pass;Eager-SRNN attempts to prune spatial objects incrementally as soon as they are visited. In addition, the SRNN algorithm is proved to be the first attempt for the exact scalable RNN algorithms in a distributed environment on multi-dimensional data sets. We show in an extensive experimental evaluation on real-world and synthetic data the scalability and the performance of our novel approach. Copyright (c) 2015 John Wiley & Sons, Ltd.
In this article we consider parallel numerical algorithms to solve the 3D mathematical model, that describes a wave propagation in rectangular waveguide. the main goal is to formulate and analyze a minimal algorithmic...
详细信息
ISBN:
(数字)9783642551956
ISBN:
(纸本)9783642551956
In this article we consider parallel numerical algorithms to solve the 3D mathematical model, that describes a wave propagation in rectangular waveguide. the main goal is to formulate and analyze a minimal algorithmic template to solve this problem by using the CUDA platform. this template is based on explicit finite difference schemes obtained after approximation of systems of differential equations on the staggered grid. the parallelization of the discrete algorithm is based on the domain decomposition method. the theoretical complexity model is derived and the scalability of the parallel algorithm is investigated. Results of numerical simulations are presented.
Selection algorithms find the kth smallest element from a set of elements. Although there are optimal parallel selection algorithms available for theoretical machines, these algorithms are not only difficult to implem...
详细信息
ISBN:
(纸本)9783642552243
Selection algorithms find the kth smallest element from a set of elements. Although there are optimal parallel selection algorithms available for theoretical machines, these algorithms are not only difficult to implement but also inefficient in practice. Consequently, scalable applications can only use few special cases such as minimum and maximum, where efficient implementations exist. To overcome such limitations, we propose a general parallel selection algorithm that scales even on today's largest supercomputers. Our approach is based on an efficient, unbiased median approximation method, recently introduced as median-of-3 reduction, and Hoare's sequential QuickSelect idea from 1961. the resulting algorithm scales with a time complexity of O(log(2) n) for n distributed elements while needing only O(1) space. Furthermore, we prove it to be a practical solution by explaining implementation details and showing performance results for up to 458, 752 processor cores.
the main contribution of this paper is to present an implementation that performs the exhaustive search to verify the Collatz conjecture using a GPU. Consider the following operation on an arbitrary positive number: i...
详细信息
ISBN:
(纸本)9783319111971;9783319111964
the main contribution of this paper is to present an implementation that performs the exhaustive search to verify the Collatz conjecture using a GPU. Consider the following operation on an arbitrary positive number: if the number is even, divide it by two, and if the number is odd, triple it and add one. the Collatz conjecture asserts that, starting from any positive number m, repeated iteration of the operations eventually produces the value 1. We have implemented it on NVIDIA GeForce GTX TITAN and evaluated the performance. the experimental results show that, our GPU implementation can verify 5.01x10(11) 64-bit numbers per second, while the CPU implementation on Intel Xeon X7460 can verify 1.80 x 10(9) 64-bit numbers per second. thus, our implementation on the GPU attains a speed-up factor of 278 over the single CPU implementation.
this paper proposes a new algorithmic paradigm - k-level asynchronous (KLA) - that bridges level-synchronous and asynchronous paradigms for processing graphs. the KLA paradigm enables the level of asynchrony in parall...
详细信息
ISBN:
(纸本)9781450328098
this paper proposes a new algorithmic paradigm - k-level asynchronous (KLA) - that bridges level-synchronous and asynchronous paradigms for processing graphs. the KLA paradigm enables the level of asynchrony in parallel graph algorithms to be parametrically varied from none (levelsynchronous) to full (asynchronous). the motivation is to improve execution times through an appropriate trade-off between the use of fewer, but more expensive global synchronizations, as in level-synchronous algorithms, and more, but less expensive local synchronizations (and perhaps also redundant work), as in asynchronous algorithms. We show how common patterns in graph algorithms can be expressed in the KLA pardigm and provide techniques for determining k, the number of asynchronous steps allowed between global synchronizations. Results of an implementation of KLA in the stapl Graph Library show excellent scalability on up to 96K cores and improvements of 10x or more over levelsynchronous and asynchronous versions for graph algorithms such as breadth-first search, PageRank, k-core decomposition and others on certain classes of real-world graphs.
Evaluation of the randomness quality of a random number generator requires an efficient suite of statistical tests which takes advantage of the processing power of today's multi-core processing power in order to c...
详细信息
ISBN:
(纸本)9781479965694
Evaluation of the randomness quality of a random number generator requires an efficient suite of statistical tests which takes advantage of the processing power of today's multi-core processing power in order to cope withthe large amount of data to be processed. While, in theory, most complex processingalgorithms can be tuned for concurrent execution, the solution will eventually reach a state in which a compromise needs to be made between the overall performance and the configurability and usability of the application. Our solution is based on completely re-designing the TestU01 architecture to include the notion of parallel computing as part of the general requirements, and not as a tool used for increasing performance. Implementation of this design is done using concepts from the object-oriented paradigm, and uses the. NET Task parallel Library. Experimental results show that the parallel OOP based implementation of the TestU01 library not only obtains similar results as the previous parallel version, but in some cases a better speedup is obtained.
In this work we present a new simple but efficient scheme - Subsquares approach - for development of algorithms for enclosing the solution set of overdetermined interval linear systems. We are going to show two algori...
详细信息
ISBN:
(数字)9783642551956
ISBN:
(纸本)9783642551956
In this work we present a new simple but efficient scheme - Subsquares approach - for development of algorithms for enclosing the solution set of overdetermined interval linear systems. We are going to show two algorithms based on this scheme and discuss their features. We start with a simple algorithm as a motivation, then we continue with an improved algorithm. Bothalgorithms can be easily parallelized. the features of bothalgorithms will be discussed and numerically tested.
Future converged fixed-mobile networks need high-speed radio links in deployment scenarios where fibre is not available or too expensive. In this paper, we present a field-programmable gate array (FPGA)-based real-tim...
详细信息
Future converged fixed-mobile networks need high-speed radio links in deployment scenarios where fibre is not available or too expensive. In this paper, we present a field-programmable gate array (FPGA)-based real-time transmission system using standard 10G Ethernet interfaces. the system comprises two parallel complex-valued data channels in each direction. Standard FPGAs and low-cost multi-channel analogue-to-digital converters (ADCs) and digital-to-analogue converters (DACs) have been used. For enhanced robustness and optimal usage of the power amplifier, π/4-shift differential quaternary phase-shift keying (DQPSK) modulation is used. All digital signal processing routines for synchronization, equalization, forward error correction etc. have been fully implemented and tested. Using a protocol analyzer, error-free bidirectional transmission of Ethernet frames at 5 Gbit/s is verified. Error-vector magnitude (EVM) values below -30 dB indicate that even higher speeds could be realized.
Kirchhoff pre-stack depth migration (KPSDM) algorithm, as one of the most widely used migration algorithms, plays an important part in getting the real image of the earth. However, this program takes considerable time...
详细信息
ISBN:
(数字)9783319111940
ISBN:
(纸本)9783319111940;9783319111933
Kirchhoff pre-stack depth migration (KPSDM) algorithm, as one of the most widely used migration algorithms, plays an important part in getting the real image of the earth. However, this program takes considerable time due to its high computational cost;hence the working efficiency of the oil industry is affected. the general purpose Graphic processing Unit (GPU) and the Compute Unified Device Architecture (CUDA) developed by NVIDIA have provided a new solution to this problem. In this study, we have proposed a parallel algorithm of the Kirchhoff pre-stack depth migration and an optimization strategy based on the CUDA technology. Our experiments indicate that for large data computations, the accelerated algorithm achieves a speedup of 8 similar to 15 times compared with NVIDIA GPU.
Adaptive indexing is a concept that considers index creation in databases as a by-product of query processing;as opposed to traditional full index creation where the indexing effort is performed up front before answer...
详细信息
ISBN:
(纸本)9781450329712
Adaptive indexing is a concept that considers index creation in databases as a by-product of query processing;as opposed to traditional full index creation where the indexing effort is performed up front before answering any queries. Adaptive indexing has received a considerable amount of attention, and several algorithms have been proposed over the past few years;including a recent experimental study comparing a large number of existing methods. Until now, however, most adaptive indexing algorithms have been designed single- threaded, yet with multi-core systems already well established, the idea of designing parallelalgorithms for adaptive indexing is very natural. In this regard, and to the best of our knowledge, only one parallel algorithm for adaptive indexing has recently appeared in the literature: the parallel version of standard cracking. In this paper we describe three alternative parallelalgorithms for adaptive indexing, including a second variant of a parallel standard cracking algorithm. Additionally, we describe a hybrid parallel sorting algorithm, and a NUMA- Aware method based on sorting. We then thoroughly compare all these algorithms experimentally. parallel sorting algorithms serve as a realistic baseline for multithreaded adaptive indexing techniques. In total we experimentally compare seven parallelalgorithms. the initial set of experiments considered in this paper indicates that our parallelalgorithms significantly improve over previously known ones. Our results also suggest that, although adaptive indexing algorithms are a good design choice in single- threaded environments, the rules change considerably in the parallel case. that is, in future highly-parallel environments, sorting algorithms could be serious alternatives to adaptive indexing. Copyright 2014 ACM.
暂无评论