ISBN: 9781467376846 (print)
Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. Among database operations, group by/aggregate is an important and potentially costly one. Sort-based and hash-based algorithms are the most common ways of processing group by/aggregate queries. While sort-based algorithms are used in traditional database management systems (DBMS), hash-based algorithms can be applied for faster query processing in new columnar databases. In addition, graphics processing units (GPUs) can be utilized as fast, high-bandwidth co-processors to improve the query processing performance of columnar databases. The focus of this article is on the prototype for group by/aggregate operations that we created to exploit GPUs. We show different hash-based algorithms to improve the performance of group by/aggregate operations on the GPU. Two of the parameters that affect the performance of the group by/aggregate algorithm are the number of groups and the hashing algorithm. We show that we can get up to a 7.6x improvement in kernel performance compared to a multi-core CPU implementation when we use a partitioned multilevel hash algorithm that uses GPU shared and global memories.
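The general idea of a partitioned, hash-based group by/aggregate can be illustrated on the CPU. The sketch below is not the authors' GPU kernel: the function name `partitioned_group_sum`, the `num_partitions` parameter, and the choice of SUM as the aggregate are assumptions made for illustration. Rows are first partitioned by a hash of the group key, and each partition is then aggregated with its own small hash table, which loosely mirrors keeping per-partition tables in fast memory.

```python
from collections import defaultdict


def partitioned_group_sum(keys, values, num_partitions=4):
    """Hypothetical helper: GROUP BY key, SUM(value) in two passes."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")

    # Pass 1: scatter rows into partitions by hash of the group key.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in zip(keys, values):
        partitions[hash(key) % num_partitions].append((key, value))

    # Pass 2: aggregate each partition with its own small hash table.
    result = {}
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        # Equal keys always hash to the same partition, so the local
        # tables hold disjoint key sets and can be merged without conflicts.
        result.update(local)
    return result
```

Because every group lands in exactly one partition, the partitions could be processed independently; the number of groups then determines how large each local table grows.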
ISBN: 9781479961436 (print)
The initialization of distributed heterogeneous simulation systems presents challenges regarding the parallelization of object construction and setup. This paper presents a method for parallel initialization of distributed simulation systems that consists of a two-phase setup. Object instantiation and setup are split into Config and Post Bind phases to permit fast creation times, allowing distribution of initialization tasks among different nodes and removing the ordering requirement between the initialization of interdependent objects. A framework of references is presented to facilitate the use of remote objects in an MPI environment, using proxies to access local and remote variables, served by a reference name server built into the simulation engine.
ISBN: 9781479959556 (print)
We consider closed pattern mining from distributed multi-relational databases, focusing especially on its efficient implementation. Given a set of local databases (horizontal partitions), we first compute their sets of closed patterns (concepts) using a closed pattern mining algorithm tailored to multi-relational data mining (MRDM). We then generate the set of closed patterns in the global database by utilizing the merge (or subposition) operator, studied in the field of Formal Concept Analysis. Since the computational complexity of MRDM is higher than that of conventional itemset mining, we propose some methods for improving the overall computations. We also present some experimental results using a distributed computation environment based on the MapReduce framework, which show the effectiveness of the proposed methods.
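The merge step can be illustrated on plain itemsets, which is a simplification of the multi-relational setting treated in the paper. The sketch assumes no minimum-support threshold, and the function names `closed_itemsets` and `merge_closed` are hypothetical. It relies on the fact that the closed itemsets of a transaction database are exactly the intersections of its non-empty sets of transactions, so the closed itemsets of two horizontal partitions combine through pairwise intersection.

```python
def closed_itemsets(transactions):
    """Hypothetical helper: all closed itemsets of a small database.

    Builds the family of intersections of non-empty transaction subsets
    incrementally, one transaction at a time.
    """
    closed = set()
    for transaction in transactions:
        t = frozenset(transaction)
        closed |= {t & c for c in closed} | {t}
    return closed


def merge_closed(closed_a, closed_b):
    """Closed itemsets of the union of two horizontal partitions."""
    pairwise = {a & b for a in closed_a for b in closed_b}
    return closed_a | closed_b | pairwise
```

With this formulation each partition can be mined independently and only the local closed sets need to be exchanged, not the raw transactions.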
ISBN: 0818675829 (print)
Dedicated Cluster Parallel Computers (DCPCs) are emerging as low-cost, high-performance environments for many important applications in science and engineering. A significant class of applications that perform well on a DCPC are coarse-grain applications that involve large amounts of file I/O. Current research in parallel file systems for distributed systems is providing a mechanism for adapting these applications to the DCPC environment. We present the Parallel Virtual File System (PVFS), a system that provides disk striping across multiple nodes in a distributed parallel computer and file partitioning among tasks in a parallel program. PVFS is unique among similar systems in that it uses a streams-based approach that represents each file access with a single set of request parameters and decouples the number of network messages from the details of the file's striping and partitioning. PVFS also provides support for efficient collective file accesses and allows overlapping file partitions. We present results of early performance experiments that show PVFS achieves excellent speedups in accessing moderately sized file segments.
ISBN: 0818677937 (print)
We introduce the all-software, standard C++-based Aurora distributed shared data system. As with related systems, it provides a shared data abstraction on distributed memory hardware. An innovation in Aurora is the use of scoped behaviour for per-context data sharing optimizations, where a context is a portion of source code such as a loop or phase. With scoped behaviour, a new language scope (e.g., nested braces) can be used to optimize the data sharing behaviour of the selected source code. Different scopes and different shared data can be optimized in different ways. Thus, scoped behaviour provides a novel level of flexibility to incrementally tune the parallel performance of an application.
ISBN: 9781538639146 (print)
Elastic distributed storage systems have been increasingly studied in recent years because power consumption has become a major problem in data centers. Much progress has been made in improving the agility of resizing small- and large-scale distributed storage systems. However, most of these studies focus on metadata-based distributed storage systems. In contrast, emerging distributed storage systems based on consistent hashing are considered to allow better scalability and are highly attractive. We identify challenges in achieving elasticity in distributed storage based on consistent hashing. These challenges cannot be easily solved by the techniques used in current studies. In this paper, we propose an elastic distributed storage design based on consistent hashing to solve two problems. First, to allow a distributed storage system to resize quickly, we modify the data placement algorithm using a primary server design and achieve an equal-work data layout. Second, we propose a selective data re-integration technique to reduce the performance impact of resizing a cluster. Our experimental and trace analysis results confirm that our proposed elastic consistent hashing works effectively and allows significantly better elasticity.
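For readers unfamiliar with the baseline, the following is a minimal sketch of plain consistent hashing, not the elastic variant with primary servers and selective re-integration that the paper proposes. The class name `HashRing`, the `vnodes` parameter, and the use of MD5 for ring positions are assumptions made for illustration.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, servers, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server owns several points to even out the key distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_position(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_position(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a server is removed, only the keys it held are reassigned and every other key keeps its placement. Powering a server down therefore still forces its data to be re-created elsewhere, which is one reason resizing such a cluster quickly is not straightforward.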
ISBN: 9781424418893 (print)
Association rule mining is one of the most important techniques in data mining. It extracts significant patterns from transaction databases and generates rules used in many decision support applications. Many organizations, such as industrial, commercial, or even scientific sites, may produce large numbers of transactions and attributes. Mining effective rules from such large volumes of data requires much time and computing resources. In this paper, we propose a parallel FI-growth association rule mining algorithm for rapid extraction of frequent itemsets from large dense databases. We also show that this algorithm can be parallelized efficiently in a cluster computing environment. The preliminary experiments provide quite promising results, with nearly ideal scaling on small clusters and about half of ideal (a 15-fold speedup) on a thirty-two-processor cluster.
ISBN: 9780769546766 (print)
Modern computer systems are becoming increasingly heterogeneous, comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (distributed OpenCL), a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL extends the OpenCL standard such that arbitrary computing devices installed on any node of a distributed system can be used together within a single application. dOpenCL allows moving data and program code to these devices in a transparent, portable manner. Since dOpenCL is designed as a fully-fledged implementation of the OpenCL API, it allows running existing OpenCL applications in a heterogeneous distributed environment without any modifications. We describe in detail the mechanisms that are required to implement OpenCL for distributed systems, including a device management mechanism for running multiple applications concurrently. Using three application studies, we compare the performance of dOpenCL with MPI+OpenCL and a standard OpenCL implementation.
ISBN: 9780769549712 (print)
Reductions matter and they are here to stay. Wide adoption of parallel processing hardware in a broad range of computer applications has encouraged recent research efforts on the efficient parallelization of reductions. Furthermore, trends towards high-productivity languages in mainstream computing increase the demand for efficient programming support. In this paper we present a new approach to parallel reductions for distributed memory systems that provides both scalability and programmability. Using OmpSs, a task-based parallel programming model, the developer can express scalable reductions through a single pragma annotation. This pragma annotation is applicable to tasks as well as to work-sharing constructs (with implicit tasking) and instructs the compiler to generate the required runtime calls. The supporting runtime handles data and task distribution, parallel execution, and data reduction. Scalability is achieved through a software cache that maximizes local and temporal data reuse and allows overlapped computation and communication. Results confirm scalability for up to 32 12-core cluster nodes.
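The structure of a parallel reduction can be sketched independently of OmpSs. The example below is not the pragma-based mechanism described in the paper, and it runs serially: it only shows the two stages that a runtime would distribute, namely private partial results per worker followed by a pairwise (tree-shaped) combine. The function name `chunked_reduce` and the `num_workers` parameter are hypothetical, and the operator is assumed to be associative with the given identity element.

```python
from functools import reduce


def chunked_reduce(data, op, identity, num_workers=4):
    """Hypothetical helper: reduce `data` in two stages."""
    # Stage 1: each "worker" reduces its own contiguous chunk privately.
    chunk = max(1, -(-len(data) // num_workers))  # ceiling division
    partials = [
        reduce(op, data[i:i + chunk], identity)
        for i in range(0, len(data), chunk)
    ]

    # Stage 2: combine partial results pairwise, halving the list each round.
    while len(partials) > 1:
        partials = [
            op(partials[i], partials[i + 1]) if i + 1 < len(partials) else partials[i]
            for i in range(0, len(partials), 2)
        ]
    return partials[0] if partials else identity
```

Associativity is what makes the regrouping legal: the chunks in stage 1 and the pairs in each round of stage 2 have no dependencies on one another, so they could run concurrently.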
ISBN: 9781479906581 (print)
An increasing number of real-time embedded applications have high computation requirements that must be met within strict time constraints. Simultaneously, architectures are becoming more and more heterogeneous, programming models are having difficulty scaling or stepping outside of a particular domain, and programming such solutions requires detailed knowledge of the system and the skills of an experienced programmer. In this context, this paper advocates the transparent integration of a parallel and distributed execution framework that is capable of meeting real-time constraints, is based on the OpenMP programming model, and uses MPI as the distribution mechanism. The paper also introduces our modified implementation of the GCC compiler, extended to support such parallel and distributed computations, which is evaluated through a real implementation. This evaluation gives important hints towards the development of a parallel/distributed fork-join framework for supporting real-time embedded applications.