the proceedings contains 32 papers. Topics discussed include algorithms for parallelization, distributed computer systems and networking, software tools and environments, parallel finite and boundary elements, applica...
详细信息
the proceedings contains 32 papers. Topics discussed include algorithms for parallelization, distributed computer systems and networking, software tools and environments, parallel finite and boundary elements, applications in fluid flour and applications in applied science.
We propose a new algorithm for multiplying dense polynomials with integer coefficients in a parallel fashion, targeting multi-core processor architectures. Complexity estimates and experimental comparisons demonstrate...
详细信息
ISBN:
(纸本)9781509057078
We propose a new algorithm for multiplying dense polynomials with integer coefficients in a parallel fashion, targeting multi-core processor architectures. Complexity estimates and experimental comparisons demonstrate the advantage of this new approach.
In a commercial Relational Database Management System (RDBMS), sort and join are the most demanding operations, and it is quite beneficial to improve the performance of external sort and external join algorithmsthat ...
详细信息
ISBN:
(纸本)9783642246494
In a commercial Relational Database Management System (RDBMS), sort and join are the most demanding operations, and it is quite beneficial to improve the performance of external sort and external join algorithmsthat handle large input data sizes. this paper proposes parallel implementations of multithreaded external sort and external hash join algorithms to accelerate IBM DB2, one of leading RDBMSs, using an IBM Power Edge of Network (IBM PowerEN (TM)) Peripheral Component Interconnect Express (PCIe) card as an accelerator. the preliminary results show that the proposed parallel implementation of the algorithms on PowerEN (TM) PCIe card can speed up the DB2 sort and join performance about two times.
the parallel Random Access Machines (PRAM) abstraction is the simplest and most elegant algorithmic model for the design and analysis of parallelalgorithms. It consists of different models categorized based on the un...
详细信息
ISBN:
(纸本)9781450384414
the parallel Random Access Machines (PRAM) abstraction is the simplest and most elegant algorithmic model for the design and analysis of parallelalgorithms. It consists of different models categorized based on the underlying memory access mode used, the most powerful of which is the Concurrent Read Concurrent Write (CRCW) model. A PRAM algorithm describes a series of rounds, each of which consists of a collection of operations that can be executed concurrently within the same time step. However, the lack of support for concurrent memory accesses and the prevalence of asynchronous programming models led to the belief that implementing CRCW PRAM algorithms is unattainable and prompted many to avoid this model except for theoretical studies of optimal performance. In this work, we study the arbitrary and common concurrent writes in the CRCW PRAM model and explore implementation challenges on general-purpose systems. Moreover, we examine current practices for implementing common/arbitrary concurrent writes and propose a new efficient lightweight and thread-safe method to implement concurrent writes through leveraging atomic instructions. To demonstrate the efficacy of our method, we developed OpenMP kernels for classical CRCW PRAM algorithms and provide experimental results and comparisons based on run time performance measured over the x86 multicore architecture. Our results show a performance speedup compared to current practices up to 4.5x across all our benchmarks.
the appearance of Multicore processors brings high performance computing to the desktop and opens the doors of mainstream computing for parallel computing. this paradigm shift leads the integration of paxallel program...
详细信息
ISBN:
(纸本)9783540695004
the appearance of Multicore processors brings high performance computing to the desktop and opens the doors of mainstream computing for parallel computing. this paradigm shift leads the integration of paxallel programming standards for high-end shard-memory machine architectures into desktop programming environments. In this paper we present a performance study of these new systems. We evaluate the performance of an OpenMP shared-memory programming model that is integrated into Microsoft Visual Studio C++ 2005 and Intel C++ compilers on a multicore processor. We benchmarked using the NAS OpenMP high-level applications benchmarks and the EPCC OpenMP low-level benchmarks. We report the basic timings, scalability, and run-time profiles of each benchmark and analyze the running results.
Most GPU-based graph systems cannot handle large-scale graphs that do not fit in the GPU memory. the ever-increasing graph size demands a scale-up graph system, which can run on a single GPU with optimized memory acce...
详细信息
ISBN:
(纸本)9781509067640
Most GPU-based graph systems cannot handle large-scale graphs that do not fit in the GPU memory. the ever-increasing graph size demands a scale-up graph system, which can run on a single GPU with optimized memory access efficiency and well-controlled data transfer overhead. However, existing systems either incur redundant data transfers or fail to use shared memory. In this paper we present Graphie, a system to efficiently traverse large-scale graphs on a single GPU. Graphie stores the vertex attribute data in the GPU memory and streams edge data asynchronously to the GPU for processing. Graphie's high performance relies on two renaming algorithms. the first algorithm renames the vertices so that the source vertices can be easily loaded to the shared memory to reduce global memory accesses. the second algorithm inserts virtual vertices into the vertex set to rename real vertices, which enables the use of a small boolean array to track active partitions. the boolean array also resides in shared memory and can be updated in constant time. the renaming algorithms do not introduce any extra overhead in the GPU memory or graph storage on disk. Graphie's runtime overlaps data transfer with kernel execution and reuses transferred data in the GPU memory. the evaluation of Graphie on 7 real-world graphs with up to 1.8 billion edges demonstrates substantial speedups over X-Stream, a state-of-the-art edge-centric graph processing framework on the CPU, and GraphReduce, an out-of-memory graph processing systems on GPUs.
It is shown first by Adleman that deoxyribonucleic acid (DNA) strand could be employed towards calculating solution to an instance of the NP-complete Hamiltonian Path Problem (HPP). Lipton also demonstrated that Adlem...
详细信息
ISBN:
(纸本)9783642030949
It is shown first by Adleman that deoxyribonucleic acid (DNA) strand could be employed towards calculating solution to an instance of the NP-complete Hamiltonian Path Problem (HPP). Lipton also demonstrated that Adleman's techniques Could be used to solve the satisfiability (SAT) problem. In this paper, it is demonstrated how the DNA operations presented by Adleman and Lipton can be used to develop the DNA-based algorithm for solving the 0-1 Knapsack Problem.
In several digital signal processingalgorithms, computational nodes are organized in consecutive stages and data is reordered between these stages. parallel computation of such algorithms with reduced number of proce...
详细信息
ISBN:
(纸本)0769522262
In several digital signal processingalgorithms, computational nodes are organized in consecutive stages and data is reordered between these stages. parallel computation of such algorithms with reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, a systematic design methodology for stride permutation networks is derived. these permutations are represented with Boolean matrices, which are decomposed and mapped directly onto register-based networks. the resulting networks are regular and scalable and they support any stride of power-of-two. In addition, the networks reach the lower bound in the number of registers indicating area-efficiency. Since the proposed methodology is systematic, it can be exploited in automated design generation.
In order to facilitate efficient query processing, the information contained in data warehouses is typically stored as a set of materialized views. Deciding which views to materialize represent a challenge in order to...
详细信息
ISBN:
(纸本)9783540695004
In order to facilitate efficient query processing, the information contained in data warehouses is typically stored as a set of materialized views. Deciding which views to materialize represent a challenge in order to minimize view maintenance and query processing costs. Some existing approaches are applicable only for small problems, which are far from reality. In this paper we introduce a new approach for materialized view selection using parallel Simulated Annealing (PSA) that selects views from an input Multiple View processing Plan (MVPP). With PSA, we are able to perform view selection on MVPPs having hundreds of queries and thousands of views. Also, in our experimental study we show that our method provides a significant improvement in the quality of the obtained set of materialized views over existing heuristic and sequential simulated annealing algorithms.
作者:
Lenke, MLRR-TUM
Lehrstuhl für Rechnertechnik und Rechnerorganisation Institut für Informatik Technische Universit?t München 80290 München Germany
Typical applications of the so-called Grand Challenges need massively parallel computer system architectures. Tools like parallel debuggers, performance analysers and visualizers help the code designer to develop effi...
详细信息
Typical applications of the so-called Grand Challenges need massively parallel computer system architectures. Tools like parallel debuggers, performance analysers and visualizers help the code designer to develop efficient parallelalgorithms. Such tools merely support the development cycle. But technical and scientific engineers who make use of parallel high-performance computing applications, e.g. numerical simulation algorithms in computational fluid dynamics (CFD), must be supported in their engineering work by another kind of tool. A tool for the application cycle is required because old, conventional suggestions regarding the arrangement for the application cycle rely on strictly sequential procedures. they are due to the heritage of traditional work on former vector computers. that formative influence is still felt in today's arrangements for the application cycle, prevents a more efficient engineering work and, therefore, must be overcome. New tool conceptions have to be introduced to enable on-line interaction between the technical and scientific engineers and their running parallel simulation. VIPER stands for VIsualization of parallel numerical simulation algorithms for Extended Research and offers physical parameters of the mathematical model and parameters of the numerical method as objects of a graphical user tool interface for online observation and online modification. A special client-server-client process architecture implementation enables technical and scientific engineers who are sitting at their graphic workstation to interact withtheir parallel simulation algorithms running on a remote parallel computer system. the VIPER prototype is applied on ParNsflex which is a parallel Navier-Stokes solver for real world aero-dynamic problems. A Paragon XP/S was selected as test parallel computer system. A first evaluation indicates the superiority of the VIPER conception against conventional procedures. Copyright (C) 1996 Published by Elsevier Science L
暂无评论