ISBN: 9781479915484 (print)
Cloud monitoring and analysis are challenging tasks that have recently been addressed by Complex Event Processing (CEP) techniques. CEP systems can process many incoming event streams and execute continuously running queries to analyze the behavior of a Cloud. Based on a Cloud performance monitoring and analysis use case, this paper experimentally evaluates different CEP architectures in terms of precision, recall and other performance indicators. The results of the experimental comparison are used to propose a novel dynamic CEP architecture for Cloud monitoring and analysis. This architecture is designed to switch dynamically between different centralized and distributed CEP architectures depending on the current machine load and network traffic conditions in the observed Cloud environment.
ISBN: 9781450320177 (print)
Dynamic memory allocation in massively parallel systems often suffers from drastic performance decreases due to the required global synchronization. This is especially true when many allocation or deallocation requests occur in parallel. We propose a method to alleviate this problem by making use of the SIMD parallelism found in most current massively parallel hardware. More specifically, we propose a hybrid dynamic memory allocator operating at the SIMD parallel warp level. Using additional constraints that can be fulfilled for a large class of practically relevant algorithms and hardware systems, we are able to significantly speed up dynamic allocation. We present and evaluate a prototypical implementation for modern CUDA-enabled graphics cards, achieving an overall speedup of up to several orders of magnitude. Copyright 2013 ACM.
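The abstract does not spell out the allocator's design, so the following is only a minimal sketch of one well-known way to cut global synchronization at warp granularity: warp-aggregated allocation, where the requests of all lanes in a warp are combined by a prefix sum and served with a single atomic reservation. It is simulated sequentially in Python rather than written in CUDA, and the names `BumpHeap`, `atomic_add`, and `warp_alloc` are hypothetical and do not come from the paper.

```python
WARP_SIZE = 32  # lanes per warp on current CUDA hardware


class BumpHeap:
    """Toy heap (assumption): one offset, advanced by simulated atomic adds."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.top = 0
        self.atomic_ops = 0  # number of simulated global atomics issued

    def atomic_add(self, nbytes):
        """Reserve nbytes and return the old top, like atomicAdd on a pointer."""
        old = self.top
        if old + nbytes > self.capacity:
            raise MemoryError("heap exhausted")
        self.top += nbytes
        self.atomic_ops += 1
        return old


def warp_alloc(heap, sizes):
    """Serve every lane of one warp with a single heap reservation.

    sizes[i] is the byte count requested by lane i (0 means no request).
    Returns one offset per lane, or None for lanes that requested nothing.
    """
    if len(sizes) > WARP_SIZE:
        raise ValueError("a warp has at most WARP_SIZE lanes")
    # Exclusive prefix sum: each lane's offset inside the warp's chunk.
    lane_offsets, total = [], 0
    for size in sizes:
        lane_offsets.append(total)
        total += size
    if total == 0:
        return [None] * len(sizes)
    base = heap.atomic_add(total)  # the only global synchronization point
    return [base + off if size > 0 else None
            for off, size in zip(lane_offsets, sizes)]


heap = BumpHeap(capacity=1024)
first = warp_alloc(heap, [16, 0, 8, 32])
second = warp_alloc(heap, [4, 4])
```

In this sketch a warp issues one atomic operation regardless of how many of its lanes allocate, which is the kind of saving a warp-level scheme aims for; deallocation and fragmentation handling are deliberately left out.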
Quantum arithmetic circuits have practical applications in various quantum algorithms. In this paper, we address quantum addition on 2-dimensional nearest-neighbor architectures based on the work presented by Choi and...
Geographic Information Systems (GIS) has been prominently working for the designed to sculpt the world. With the growth and data and increasing sophistication of analysis and processing techniques the traditional sequ...
Multi-Swarm PSO (MPSO) is an extension of the PSO algorithm that incorporates multiple, collaborating swarms. Although embarrassingly parallel in appearance, MPSO is memory bound, introducing challenges for GPU-based ...
ISBN: 9781849197267 (print)
Visual tracking is an important problem in computer vision. TLD is an online visual tracking algorithm with good robustness and high accuracy. However, the real-time performance of TLD is low for large video sequences. In this paper, we study the most time-consuming stages of TLD and then propose a parallel algorithm based on CUDA. The experimental results show that our algorithm achieves a speedup of up to 2.59 compared to TLD while maintaining the same detection accuracy.
ISBN: 9781479901036 (print)
Histogramming is a tool commonly used in data analysis. Although its serial version is simple to implement, providing an efficient and scalable way to parallelize it can be challenging. This holds especially for platforms that contain one or several massively parallel devices such as CUDA-capable GPUs, because of issues with domain decomposition, global memory use, and the like. In this paper we compare two approaches for implementing general-purpose histogramming on GPUs. The first algorithm is based on private copies of bin counters stored in shared memory for each block of threads. The second one uses the Thrust library to sort the input elements and then to search for upper bounds according to bin widths. For both algorithms we analyze how the speedup over the sequential version depends on the size of the input collection, the number of bins, and the type and distribution of input elements. We also implement overlapping of data transfers between host CPU and CUDA device with kernel execution. For both algorithms we analyze the pros and cons in detail. For example, the privatization strategy can be up to 2x faster than sort-search with realistic inputs, but can only support a limited number of bins. On the other hand, the sort-search strategy has about 50% higher speedup than privatization when we use characters as input and can support an unlimited number of bins. Finally, we perform an exploration to determine the optimal algorithm depending on the characteristics and values of input parameters.
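The two strategies compared in this abstract can be illustrated with a small sequential sketch in Python. It is not the authors' GPU code: the function names, the `block_size` parameter, and the assumption of equal-width bins over a closed range `[lo, hi]` are choices made here for illustration, and the standard-library `bisect` module stands in for the sorted-search step that Thrust would perform on the device.

```python
from bisect import bisect_left


def histogram_privatized(data, lo, hi, num_bins, block_size=4):
    """Privatization: each block fills its own counters, then copies are merged."""
    width = (hi - lo) / num_bins
    total = [0] * num_bins
    for start in range(0, len(data), block_size):
        private = [0] * num_bins  # stands in for per-block shared memory
        for x in data[start:start + block_size]:
            # Clamp so that x == hi falls into the last bin.
            b = min(int((x - lo) / width), num_bins - 1)
            private[b] += 1
        for b, count in enumerate(private):  # merge step (atomic adds on a GPU)
            total[b] += count
    return total


def histogram_sort_search(data, lo, hi, num_bins):
    """Sort-search: sort the input, then binary-search each bin's upper edge."""
    width = (hi - lo) / num_bins
    ordered = sorted(data)
    # cumulative[b] = number of elements strictly below the upper edge of bin b.
    cumulative = [bisect_left(ordered, lo + (b + 1) * width)
                  for b in range(num_bins - 1)]
    cumulative.append(len(ordered))  # the last bin is closed on the right
    previous = [0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


sample = [0, 1, 1, 3, 4, 4, 4, 7, 8, 2]
by_privatization = histogram_privatized(sample, lo=0, hi=8, num_bins=4)
by_sort_search = histogram_sort_search(sample, lo=0, hi=8, num_bins=4)
```

The sketch also hints at the trade-off the abstract reports: the private counter array grows with the number of bins, which on a GPU is bounded by shared memory, whereas the sort-search variant needs only one search per bin edge and its working storage does not depend on the bin count.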
ISBN: 9781479927012 (print)
Over the last decades, graphics processing units have developed from special-purpose graphics accelerators to general-purpose massively parallel co-processors. In recent years they have gained increased traction in high performance computing, as they provide superior computational performance in terms of runtime and energy consumption for a wide range of problems. In this survey, we review their employment in distributed computing for a broad range of application scenarios. Common characteristics and a classification of the most relevant use cases are described. Furthermore, we discuss possible future developments of the use of general purpose graphics processing units in the area of service-oriented architecture. The aim of this work is to inspire future research in this field and to give guidelines on when and how to incorporate this new hardware technology.
ISBN: 9783642400476 (print)
Molecular dynamics simulations allow us to study the behavior of complex biomolecular systems. These simulations suffer from high computational complexity, which leads to simulation times of several weeks to recreate just a few microseconds of a molecule's motion, even on high-performance computing platforms. In recent years, state-of-the-art molecular dynamics algorithms have benefited from the parallel computing capabilities of multicore systems, as well as GPUs used as co-processors. In this paper we present a parallel molecular dynamics algorithm for on-board multi-GPU architectures. We parallelize a state-of-the-art molecular dynamics algorithm at two levels. We employ a spatial partitioning approach to simulate the dynamics of one portion of a molecular system on each GPU, and we take advantage of direct communication between GPUs to transfer data among portions. We also parallelize the simulation algorithm to exploit the multi-processor computing model of GPUs. Most importantly, we present novel parallel algorithms to update the spatial partitioning and set up transfer data packages on each GPU. We demonstrate the feasibility and scalability of our proposal through a comparative study with NAMD, a well-known parallel molecular dynamics implementation.
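As a rough illustration of spatial partitioning with per-GPU transfer packages, the sketch below assigns particles to equal-width slabs along one axis and lists the particles that lie within a cutoff distance of a neighboring slab. The abstract does not describe the actual partitioning scheme, so the one-dimensional slabs, the non-periodic box, and the names `partition_slabs`, `owned`, and `send` are all assumptions made for this example.

```python
def partition_slabs(xs, box_length, num_gpus, cutoff):
    """Split particles into slabs along x and collect boundary particles.

    xs holds the x coordinates of the particles, all within [0, box_length].
    Returns (owned, send): owned[g] lists the particle indices simulated on
    GPU g, and send[(src, dst)] lists the indices GPU src would transfer to
    its neighbor dst because they sit within `cutoff` of the shared boundary.
    """
    width = box_length / num_gpus
    owned = [[] for _ in range(num_gpus)]
    send = {}
    for i, x in enumerate(xs):
        # Clamp so that x == box_length lands in the last slab.
        g = min(int(x / width), num_gpus - 1)
        owned[g].append(i)
        if g > 0 and x - g * width < cutoff:
            send.setdefault((g, g - 1), []).append(i)  # near the left face
        if g < num_gpus - 1 and (g + 1) * width - x < cutoff:
            send.setdefault((g, g + 1), []).append(i)  # near the right face
    return owned, send


owned, send = partition_slabs([0.5, 3.9, 4.2, 7.5], box_length=8.0,
                              num_gpus=2, cutoff=0.5)
```

Here particles 1 and 2 straddle the boundary at x = 4.0, so each GPU would package one particle for its neighbor, while particles 0 and 3 stay local. A production scheme would also rebuild the partition as particles move, which is the update step the abstract highlights.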
ISBN: 9781450324168 (print)
The proceedings contain 4 papers. The topics discussed include: processing online aggregation on skewed data in MapReduce; SO-1SR: towards a self-optimizing one-copy serializability protocol for data management in the cloud; analysis of partitioning strategies for graph processing in bulk synchronous parallel models; and a SLA graph model for data services.