Geographic Information Systems (GIS) have long played a prominent role in modeling the world. With the growth of data and the increasing sophistication of analysis and processing techniques, the traditional sequ...
Multi-Swarm PSO (MPSO) is an extension of the PSO algorithm that incorporates multiple, collaborating swarms. Although embarrassingly parallel in appearance, MPSO is memory bound, introducing challenges for GPU-based ...
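The truncated abstract does not say how the swarms collaborate. As a minimal CPU sketch, assuming a ring migration scheme in which each swarm periodically copies its neighbor's best solution over its own worst particle, MPSO could look like the following. The parameters `w`, `c1`, `c2`, `migrate_every`, the `sphere` test function and all names are illustrative assumptions, not the paper's settings.

```python
import random


def sphere(x):
    # Illustrative objective: minimum 0 at the origin.
    return sum(v * v for v in x)


def mpso(f, dim, n_swarms=4, swarm_size=10, iters=200, migrate_every=20,
         w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Multi-swarm PSO with an assumed ring migration every `migrate_every` iterations."""
    rng = random.Random(seed)
    swarms = []
    for _ in range(n_swarms):
        particles = []
        for _ in range(swarm_size):
            x = [rng.uniform(-bound, bound) for _ in range(dim)]
            particles.append({"x": x, "v": [0.0] * dim, "best": x[:], "best_f": f(x)})
        swarms.append(particles)
    initial_best = min(p["best_f"] for s in swarms for p in s)

    for t in range(1, iters + 1):
        # Each swarm evolves independently between migrations.
        for particles in swarms:
            g = min(particles, key=lambda p: p["best_f"])["best"][:]  # swarm-local best
            for p in particles:
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    p["v"][d] = (w * p["v"][d]
                                 + c1 * r1 * (p["best"][d] - p["x"][d])
                                 + c2 * r2 * (g[d] - p["x"][d]))
                    # Clamp positions to the search box.
                    p["x"][d] = min(max(p["x"][d] + p["v"][d], -bound), bound)
                fx = f(p["x"])
                if fx < p["best_f"]:
                    p["best"], p["best_f"] = p["x"][:], fx
        if t % migrate_every == 0:
            # Snapshot every swarm's best before overwriting anything.
            donors = []
            for s in swarms:
                b = min(s, key=lambda p: p["best_f"])
                donors.append((b["best"][:], b["best_f"]))
            for i, s in enumerate(swarms):
                best_x, best_f = donors[(i - 1) % n_swarms]  # neighbor in the ring
                worst = max(s, key=lambda p: p["best_f"])
                worst["x"], worst["best"] = best_x[:], best_x[:]
                worst["best_f"], worst["v"] = best_f, [0.0] * dim

    best = min((p for s in swarms for p in s), key=lambda p: p["best_f"])
    return best["best"], best["best_f"], initial_best
```

Calling `mpso(sphere, dim=3)` returns the best position found, its fitness, and the best initial fitness for comparison. The per-particle update inside each swarm is the part that looks embarrassingly parallel.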
ISBN (print): 9781479901036
Histogramming is a tool commonly used in data analysis. Although its serial version is simple to implement, providing an efficient and scalable way to parallelize it can be challenging. This especially holds for platforms that contain one or several massively parallel devices, such as CUDA-capable GPUs, due to issues with domain decomposition, the use of global memory, and similar concerns. In this paper we compare two approaches for implementing general-purpose histogramming on GPUs. The first algorithm is based on private copies of bin counters stored in shared memory for each block of threads. The second one uses the Thrust library to sort the input elements and then to search for upper bounds according to bin widths. For both algorithms we analyze how the speedup over the sequential version depends on the size of the input collection, the number of bins, and the type and distribution of input elements. We also implement overlapping of data transfers between the host CPU and the CUDA device with kernel execution. For both algorithms we analyze the pros and cons in detail. For example, the privatization strategy can be up to 2x faster than sort-search with realistic inputs, but can only support a limited number of bins. On the other hand, the sort-search strategy achieves about 50% higher speedup than privatization when characters are used as input, and can support an unlimited number of bins. Finally, we perform an exploration to determine the optimal algorithm depending on the characteristics and values of the input parameters.
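The two strategies compared above can be sketched sequentially in Python. This is a minimal sketch under stated assumptions, not the paper's CUDA code: each loop iteration of `hist_privatized` stands in for one thread block whose private counters would live in shared memory (which is why the bin count is capped on real hardware), and `hist_sort_search` mirrors a sort followed by a vectorized upper-bound search (as `thrust::sort` plus `thrust::upper_bound` would do) using Python's `bisect`. Uniform bins over `[lo, hi]`, the `n_blocks` parameter and all function names are assumptions.

```python
import bisect


def bin_of(v, n_bins, lo, hi):
    # Uniform bins over [lo, hi); values equal to hi are clamped into the last bin.
    # Assumes lo <= v <= hi.
    width = (hi - lo) / n_bins
    return min(int((v - lo) / width), n_bins - 1)


def hist_privatized(data, n_bins, lo, hi, n_blocks=4):
    """Privatization: each 'block' counts its own chunk, then the copies are merged."""
    chunk = (len(data) + n_blocks - 1) // n_blocks
    private = []
    for b in range(n_blocks):  # one iteration models one thread block
        local = [0] * n_bins  # stands in for a per-block copy in shared memory
        for v in data[b * chunk:(b + 1) * chunk]:
            local[bin_of(v, n_bins, lo, hi)] += 1
        private.append(local)
    # Merge step: on a GPU each block would atomically add its counts to global memory.
    return [sum(counts) for counts in zip(*private)]


def hist_sort_search(data, n_bins, lo, hi):
    """Sort-search: sort bin keys, find each bin's upper bound, take differences."""
    keys = sorted(bin_of(v, n_bins, lo, hi) for v in data)
    # Upper bound of bin index i = number of elements in bins 0..i (cumulative count).
    cumulative = [bisect.bisect_right(keys, i) for i in range(n_bins)]
    # Adjacent difference turns cumulative counts into per-bin counts.
    return [c - (cumulative[i - 1] if i else 0) for i, c in enumerate(cumulative)]
```

For example, `hist_sort_search([5, 15, 15, 95, 100], 10, 0, 100)` yields `[1, 2, 0, 0, 0, 0, 0, 0, 0, 2]`, and `hist_privatized` gives the same counts. Note that the sort-search variant never allocates per-block counters, which reflects why it is not limited in the number of bins.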
ISBN (print): 9781849197267
Visual tracking is an important problem in computer vision. TLD is an online visual tracking algorithm with good robustness and high accuracy. However, TLD's real-time performance is poor for large-size video sequences. In this paper, we study the most time-consuming stages of TLD and then propose a parallel algorithm based on CUDA. The experimental results show that our algorithm achieves a speedup of up to 2.59 over TLD while maintaining the same detection accuracy.
ISBN (print): 9781479927012
Over the last decades, graphics processing units have evolved from special-purpose graphics accelerators into general-purpose massively parallel co-processors. In recent years they have gained increasing traction in high performance computing, as they provide superior performance in terms of runtime and energy consumption for a wide range of problems. In this survey, we review their use in distributed computing across a broad range of application scenarios. We describe common characteristics and a classification of the most relevant use cases. Furthermore, we discuss possible future developments in the use of general-purpose graphics processing units in the area of service-oriented architecture. The aim of this work is to inspire future research in this field and to give guidelines on when and how to incorporate this new hardware technology.
ISBN (print): 9783642400476
Molecular dynamics simulations allow us to study the behavior of complex biomolecular systems. These simulations suffer from high computational complexity, which leads to simulation times of several weeks to recreate just a few microseconds of a molecule's motion, even on high-performance computing platforms. In recent years, state-of-the-art molecular dynamics algorithms have benefited from the parallel computing capabilities of multicore systems, as well as GPUs used as co-processors. In this paper we present a parallel molecular dynamics algorithm for on-board multi-GPU architectures. We parallelize a state-of-the-art molecular dynamics algorithm at two levels. We employ a spatial partitioning approach to simulate the dynamics of one portion of a molecular system on each GPU, and we take advantage of direct communication between GPUs to transfer data among portions. We also parallelize the simulation algorithm to exploit the multiprocessor computing model of GPUs. Most importantly, we present novel parallel algorithms to update the spatial partitioning and set up transfer data packages on each GPU. We demonstrate the feasibility and scalability of our proposal through a comparative study with NAMD, a well-known parallel molecular dynamics implementation.
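To make the idea of spatial partitioning with transfer packages concrete, here is a minimal sketch. It assumes a simple 1D slab decomposition along x in a non-periodic box, with `cutoff` smaller than the slab width; each partition would run on one GPU, and atoms within `cutoff` of a slab boundary are collected into a package for the neighboring partition (the "halo" it needs to compute forces). This is a generic textbook decomposition, not the paper's algorithm, and all names are hypothetical; the direct GPU-to-GPU transfer itself is reduced to building Python lists.

```python
def partition_index(x, box_length, n_parts):
    # Slab that owns coordinate x; x == box_length is clamped into the last slab.
    idx = int(x / box_length * n_parts)
    return min(max(idx, 0), n_parts - 1)


def build_transfer_packages(positions, box_length, n_parts, cutoff):
    """Assign atoms to slabs and collect halo atoms each slab must send to its neighbors.

    Returns (owned, packages) where owned[p] lists atom indices owned by slab p and
    packages[(src, dst)] lists atoms that slab src sends to slab dst.
    Assumes a non-periodic box and cutoff < box_length / n_parts.
    """
    width = box_length / n_parts
    owned = [[] for _ in range(n_parts)]
    for atom, (x, _y, _z) in enumerate(positions):
        owned[partition_index(x, box_length, n_parts)].append(atom)

    packages = {}
    for p in range(n_parts):
        lo, hi = p * width, (p + 1) * width
        for atom in owned[p]:
            x = positions[atom][0]
            # Near the lower boundary: the left neighbor needs this atom as a ghost.
            if p > 0 and x - lo < cutoff:
                packages.setdefault((p, p - 1), []).append(atom)
            # Near the upper boundary: the right neighbor needs it.
            if p < n_parts - 1 and hi - x < cutoff:
                packages.setdefault((p, p + 1), []).append(atom)
    return owned, packages
```

With a box of length 10, two slabs, `cutoff=1.0` and atoms at x = 0.5, 4.5, 5.5 and 9.0, slab 0 owns atoms 0 and 1, slab 1 owns atoms 2 and 3, and the packages are `{(0, 1): [1], (1, 0): [2]}`. After atoms move, re-running the assignment is the naive form of "updating the spatial partitioning".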
ISBN (print): 9781450324168
The proceedings contain 4 papers. The topics discussed include: processing online aggregation on skewed data in MapReduce; SO-1SR: towards a self-optimizing one-copy serializability protocol for data management in the cloud; analysis of partitioning strategies for graph processing in bulk synchronous parallel models; and a SLA graph model for data services.
For solving large instances of the Travelling Salesman Problem (TSP), the use of a candidate set (or candidate list) is essential to limit the search space and reduce the overall execution time when using heuristic se...
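The truncated abstract does not say how the candidate set is built. As a minimal sketch, assuming the common choice of each city's `k` nearest neighbors, the following shows how a candidate list restricts a 2-opt local search: instead of trying all O(n²) edge pairs, it only tries new edges `(a, c)` with `c` in `a`'s list, and stops scanning once `(a, c)` is no shorter than the edge being removed (standard neighbor-list pruning). Other constructions (for example quadrant neighbors or alpha-nearness) exist; all names here are hypothetical.

```python
import math


def candidate_lists(cities, k):
    """For each city, the indices of its k nearest other cities, sorted by distance."""
    n = len(cities)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(cities[i], cities[j]))[:k]
        for i in range(n)
    ]


def tour_length(cities, tour):
    n = len(tour)
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % n]]) for i in range(n))


def two_opt_with_candidates(cities, tour, cand):
    """First-improvement 2-opt that only tries new edges (a, c) with c in cand[a]."""
    def d(u, v):
        return math.dist(cities[u], cities[v])

    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        pos = {city: idx for idx, city in enumerate(tour)}
        for i in range(n):
            a, b = tour[i], tour[(i + 1) % n]
            for c in cand[a]:
                # Candidates are sorted by distance: stop once (a, c) is not shorter than (a, b).
                if d(a, c) >= d(a, b):
                    break
                j = pos[c]
                e = tour[(j + 1) % n]  # successor of c in the tour
                if c == b or e == a:
                    continue  # degenerate move on adjacent edges
                # Replace edges (a, b) and (c, e) with (a, c) and (b, e).
                delta = d(a, c) + d(b, e) - d(a, b) - d(c, e)
                if delta < -1e-9:
                    lo, hi = (i, j) if i < j else (j, i)
                    tour[lo + 1:hi + 1] = reversed(tour[lo + 1:hi + 1])
                    improved = True
                    break
            if improved:
                break  # positions are stale after a move; rebuild them
    return tour
```

Each pass inspects at most `n * k` candidate moves, so with a small `k` the neighborhood shrinks from quadratic to linear in the number of cities.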
The k-nearest neighbor (kNN) computation is an important task in fields such as LBSN and databases. Recently, several methods have been proposed to accelerate kNN search algorithms for static points with GP...
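The truncated abstract does not name the methods it discusses. For reference, the baseline that GPU approaches typically parallelize is brute-force kNN: every query is independent and needs only distance computations plus a top-k selection. The sketch below assumes Euclidean distance and uses `heapq.nsmallest` for the selection; `knn_brute_force` is a hypothetical name.

```python
import heapq
import math


def knn_brute_force(points, queries, k):
    """Return, for each query, the indices of its k nearest points (nearest first)."""
    results = []
    for q in queries:  # queries are independent: a natural unit of GPU parallelism
        # Top-k selection over all point distances for this query.
        nearest = heapq.nsmallest(k, range(len(points)),
                                  key=lambda i: math.dist(q, points[i]))
        results.append(nearest)
    return results
```

For points `[(0, 0), (1, 0), (5, 5), (2, 0)]` and the query `(0.1, 0)` with `k=2`, the result is `[[0, 1]]`. The per-query cost is linear in the number of points, which is what spatial indexes or GPU parallelism aim to reduce or hide.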
ISBN (print): 9780946881819
Experiments with particle accelerators are the result of a highly complex interplay between various machines. The associated production chains for a beam, from ion source to target(s), are both a coordination and a scheduling problem, with many interdependencies and multiple paths to consider. This ranges from system initialisation and the synchronisation of numerous machines to interlock handling and appropriate contingency measures such as beam dump scenarios. The Facility for Antiproton and Ion Research (FAIR) needs a matching control system that can handle more than 2000 front-end devices, fully parallel code execution and deterministic command delivery. This paper analyses the underlying requirements of the FAIR site, presents models of the accelerators, and discusses possible architectures for a timing master unit that uses these models.
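One simple way to view a production chain as a scheduling problem with interdependencies is a dependency graph of steps with durations, from which deterministic earliest start times follow by a longest-path pass in topological order. The sketch below is an illustration under that assumption only, not the paper's model: the step names and durations in `EXAMPLE_STEPS` are hypothetical, and it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` if the dependencies are circular.

```python
from graphlib import TopologicalSorter

# Hypothetical chain: step name -> (duration in ms, predecessor steps).
EXAMPLE_STEPS = {
    "init_source": (10, []),
    "sync_linac": (5, ["init_source"]),
    "check_interlocks": (8, ["init_source"]),
    "inject_ring": (20, ["sync_linac"]),
    "extract_to_target": (15, ["inject_ring", "check_interlocks"]),
}


def schedule(steps):
    """Earliest start time of each step: the latest finish time among its predecessors."""
    graph = {name: preds for name, (_, preds) in steps.items()}
    start = {}
    # static_order() yields every step after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        preds = steps[name][1]
        start[name] = max((start[p] + steps[p][0] for p in preds), default=0)
    return start
```

For `EXAMPLE_STEPS`, `extract_to_target` starts at 35 ms because it must wait for `inject_ring` (15 + 20) rather than `check_interlocks` (10 + 8), and the whole chain finishes at 50 ms. Because the computed times depend only on the graph and durations, the same command timeline is produced on every run.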