As the core of the FM-index and the compressed suffix array, the Burrows-Wheeler Transform (BWT) plays a key role in indexing genomic sequence data for pattern search. It can run in O(n) bits, typically in total space less t...
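A naive sketch may help readers unfamiliar with the transform. It assumes a unique end marker `$` that sorts before every character of the text. It builds the BWT as the last column of the sorted rotation matrix and inverts it with the LF-mapping that FM-index searches rely on. The quadratic rotation sort is for illustration only and is not a space-efficient construction of the kind this abstract concerns. The function names are hypothetical.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """BWT by sorting all cyclic rotations (naive, for illustration).
    Assumes the sentinel is unique and sorts before every text character."""
    if any(ch <= sentinel for ch in text):
        raise ValueError("sentinel must be unique and sort before every character")
    s = text + sentinel
    # Sort every cyclic rotation of text + sentinel.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Recover the original text from its BWT with the LF-mapping."""
    # C[ch]: number of characters in `last` that are strictly smaller than ch.
    C, total = {}, 0
    for ch in sorted(set(last)):
        C[ch] = total
        total += last.count(ch)
    # lf[i]: row of the sorted matrix whose rotation starts with last[i],
    # i.e. C[last[i]] plus the number of earlier occurrences of last[i].
    seen, lf = {}, []
    for ch in last:
        lf.append(C[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    # Row 0 is the rotation starting with the sentinel; walking LF from it
    # yields the text back to front.
    out, row = [], 0
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))
```

For example, `bwt("banana")` gives `"annb$aa"`, and `inverse_bwt` turns it back into `"banana"`. A real FM-index answers the same LF steps with the `C` table and rank queries over `last` instead of an explicit `lf` array.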
In this paper, results obtained from the parallelisation of existing 3D electromagnetic finite element codes within the ESPRIT HPCN project PARTEL are presented. The parallelisation procedure, based on the Bulk Synchronous Parallel (BSP) approach, is outlined, and the encouraging speed-up results obtained on several industrially significant test cases are described and discussed.
We present a parallel bisection mesh refinement algorithm based on ALBERT (Adaptive multi-Level finite element toolbox using Bisection refinement and Error control by Residual Techniques). The goal is to develop a parallel adaptive finite element code suitable for distributed-memory parallel computers or PC clusters. An overview of the basic strategy for parallelizing ALBERT is given, and issues in parallel mesh refinement are addressed. A modified mesh refinement algorithm that can be implemented efficiently on distributed-memory parallel computers is proposed, and its properties are discussed. Numerical experiments with the parallel bisection mesh refinement algorithm are presented.
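To make "bisection refinement" concrete, here is a sequential sketch of recursive conforming bisection for 2D triangles. It rests on assumptions the abstract does not state: each triangle is stored as `(v0, v1, v2)` with `v0`-`v1` as its refinement edge (newest-vertex convention), and the initial labeling is one for which the recursion terminates. The class `BisectionMesh` is hypothetical. It is not ALBERT's data structure and does not model the distributed-memory algorithm the paper proposes.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Point = Tuple[float, float]
Tri = Tuple[int, int, int]  # (v0, v1, v2): v0-v1 is the refinement edge, v2 the newest vertex


class BisectionMesh:
    """Hypothetical 2D mesh supporting conforming newest-vertex bisection."""

    def __init__(self, points: List[Point], triangles: List[Tri]) -> None:
        self.points: List[Point] = list(points)
        self.tris: Set[Tri] = set(triangles)
        # Midpoint index per bisected edge, so neighbors share the new vertex.
        self.midpoints: Dict[FrozenSet[int], int] = {}

    def _midpoint(self, a: int, b: int) -> int:
        key = frozenset((a, b))
        if key not in self.midpoints:
            (xa, ya), (xb, yb) = self.points[a], self.points[b]
            self.points.append(((xa + xb) / 2, (ya + yb) / 2))
            self.midpoints[key] = len(self.points) - 1
        return self.midpoints[key]

    def _neighbor(self, t: Tri) -> Optional[Tri]:
        """Triangle sharing t's refinement edge, or None on the boundary."""
        a, b = t[0], t[1]
        return next((s for s in self.tris if s != t and a in s and b in s), None)

    def _bisect(self, t: Tri) -> None:
        v0, v1, v2 = t
        m = self._midpoint(v0, v1)
        self.tris.remove(t)
        # Each child's refinement edge is one of the parent's other two edges.
        self.tris.update({(v0, v2, m), (v2, v1, m)})

    def refine(self, t: Tri) -> None:
        """Bisect t, first refining neighbors as needed to avoid hanging nodes."""
        if t not in self.tris:
            return
        edge = frozenset(t[:2])
        nb = self._neighbor(t)
        # Refine the neighbor until the shared edge is its refinement edge too.
        while nb is not None and frozenset(nb[:2]) != edge:
            self.refine(nb)
            nb = self._neighbor(t)
        self._bisect(t)
        if nb is not None:
            self._bisect(nb)  # reuses the same midpoint, so the mesh stays conforming
```

Starting from the unit square split along its diagonal, `[(0, 2, 1), (2, 0, 3)]`, refining `(0, 2, 1)` bisects both triangles at the shared midpoint. A later refinement whose neighbor is not compatible first triggers a refinement of that neighbor.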
This paper describes efficient deterministic techniques for breaking symmetry in parallel. These techniques work well on rooted trees and graphs of constant degree or genus. The primary technique allows us to 3-color a rooted tree in O(lg∗
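The abstract is cut off before the complexity bound, so the paper's exact procedure is not visible here. As an assumed illustration, the sketch below uses the classic Cole-Vishkin deterministic coin-tossing step, which shrinks a proper coloring of a rooted tree to at most six colors. A shift-down step then removes colors 5, 4 and 3. Each loop over `v` stands for one synchronous parallel round, simulated sequentially. `three_color_rooted_tree` and the `parent` array format (`-1` marks the root) are hypothetical.

```python
from typing import List


def three_color_rooted_tree(parent: List[int]) -> List[int]:
    """parent[v] is v's parent, or -1 for the root. Returns colors in {0, 1, 2}."""
    n = len(parent)
    if n == 0:
        return []
    children: List[List[int]] = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)
    color = list(range(n))  # unique IDs form a proper n-coloring
    # Phase 1: deterministic coin tossing until at most six colors remain.
    while max(color) >= 6:
        new = [0] * n
        for v in range(n):  # one parallel round
            if parent[v] < 0:
                i = 0  # the root behaves as if it differed from a parent at bit 0
            else:
                x = color[v] ^ color[parent[v]]
                i = (x & -x).bit_length() - 1  # lowest bit where v and parent differ
            new[v] = 2 * i + ((color[v] >> i) & 1)
        color = new
    # Phase 2: for each color k in 5, 4, 3, shift down, then recolor class k.
    for k in (5, 4, 3):
        # Shift down: take the parent's color; the root picks a fresh one in {0, 1, 2}.
        # Afterwards all children of a node share one color.
        color = [color[p] if p >= 0 else min({0, 1, 2} - {color[v]})
                 for v, p in enumerate(parent)]
        new = list(color)
        for v in range(n):  # class k is independent, so these updates do not conflict
            if color[v] == k:
                used = {color[c] for c in children[v]}
                if parent[v] >= 0:
                    used.add(color[parent[v]])
                new[v] = min({0, 1, 2} - used)  # at most two colors are forbidden
        color = new
    return color
```

Each coin-tossing round maps colors below 2^L to colors below 2L, so the palette collapses after very few rounds. The shift-down is what makes the final recoloring possible, because it leaves every node with at most two distinct neighbor colors (its parent's and its children's).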
We propose a parallel algorithm for dynamic slicing of distributed Java programs in non-Distributed Shared Memory (DSM) systems. Given a distributed Java program, we first construct an intermediate representation in t...
ISBN (print): 9781450384414
The Parallel Random Access Machine (PRAM) abstraction is the simplest and most elegant algorithmic model for the design and analysis of parallel algorithms. It comprises several variants, categorized by their underlying memory access mode; the most powerful is the Concurrent Read Concurrent Write (CRCW) model. A PRAM algorithm describes a series of rounds, each consisting of a collection of operations that can be executed concurrently within the same time step. However, the lack of support for concurrent memory accesses and the prevalence of asynchronous programming models have led to the belief that implementing CRCW PRAM algorithms is unattainable, prompting many to avoid this model except in theoretical studies of optimal performance. In this work, we study arbitrary and common concurrent writes in the CRCW PRAM model and explore the challenges of implementing them on general-purpose systems. We also examine current practices for implementing common and arbitrary concurrent writes, and we propose a new efficient, lightweight, thread-safe method that implements concurrent writes by leveraging atomic instructions. To demonstrate the efficacy of our method, we developed OpenMP kernels for classical CRCW PRAM algorithms and present experimental results and comparisons based on run-time performance measured on an x86 multicore architecture. Our results show speedups of up to 4.5x over current practices across our benchmarks.
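The two write semantics studied here can be illustrated with a sequential Python simulation of a textbook CRCW algorithm: finding the maximum in a constant number of rounds with n² virtual processors. Round 2 relies on a Common write, since every processor that writes `is_max[i]` writes `False`. Round 3 relies on an Arbitrary write, since several tied maxima race to store their index and only one wins. The helper `resolve_writes` is hypothetical and only mimics the outcome of a compare-and-swap in which exactly one writer succeeds. It is neither the authors' atomic-instruction method nor an OpenMP kernel.

```python
from typing import Dict, List, Sequence


def resolve_writes(memory: List, pending: Dict[int, List], mode: str) -> None:
    """Apply one PRAM step's writes; pending maps a cell to the values
    written to it by different processors during that step."""
    for cell, values in pending.items():
        if mode == "common" and len(set(values)) != 1:
            raise RuntimeError(f"Common-CRCW violation at cell {cell}")
        # Common: all values agree, so any one is correct. Arbitrary: any single
        # writer may win; keeping the first mimics a compare-and-swap on an
        # untouched cell where exactly one thread succeeds.
        memory[cell] = values[0]


def crcw_argmax(a: Sequence[float]) -> int:
    """Simulate the constant-round CRCW PRAM maximum algorithm (n*n processors)."""
    n = len(a)
    if n == 0:
        raise ValueError("empty input")
    is_max = [True] * n  # round 1: processor i marks a[i] as a candidate
    pending: Dict[int, List] = {}
    for i in range(n):  # round 2: processor (i, j) for every pair
        for j in range(n):
            if a[i] < a[j]:
                pending.setdefault(i, []).append(False)
    resolve_writes(is_max, pending, "common")
    result: List = [None]
    pending = {}
    for i in range(n):  # round 3: surviving candidates race to write their index
        if is_max[i]:
            pending.setdefault(0, []).append(i)
    resolve_writes(result, pending, "arbitrary")
    return result[0]
```

For `[3, 9, 2, 9, 5]`, both 9s survive round 2, and the arbitrary write in round 3 keeps one of their indices (here `1`). On real hardware the inner loops become threads, and the resolution step is what atomic instructions must provide.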
This paper presents a method for scheduling and programming operations based on manufacturing cells and the use of parallel processing. The factory is organized into cells with the aim of decomposing the global scheduling problem into subproblems of reduced dimension. This decomposition simplifies the tasks related to the control and supervision of the factory. In addition, parallel processing enables faster computation and allows the program to be used in real time.
This paper presents an efficient parallel algorithm for the shortest path problem in planar layered digraphs that runs in O(log³ n) time with n processors. The algorithm uses a divide-and-conquer approach and is based on the idea of a one-way separator, which has the property that any directed path crosses it at most once.
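The abstract does not give enough detail to reproduce the one-way-separator algorithm. A simpler and clearly different illustration of divide and conquer on layered digraphs is to obtain distances between the first and last layers by multiplying the per-layer weight matrices in the (min, +) semiring. Because this product is associative, the halves can be combined in a balanced tree whose independent subproblems could run in parallel. The input format and function names are assumptions. This sketch makes no use of planarity and does not attain the paper's bound.

```python
from typing import List

INF = float("inf")
Matrix = List[List[float]]


def min_plus(a: Matrix, b: Matrix) -> Matrix:
    """(a ⊗ b)[i][j] = min over k of a[i][k] + b[k][j]; layers assumed non-empty."""
    return [[min(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def layered_distances(weights: List[Matrix]) -> Matrix:
    """weights[t][u][v]: weight of the edge from vertex u of layer t to vertex v
    of layer t + 1 (INF if absent). Returns first-to-last-layer distances."""
    if len(weights) == 1:
        return weights[0]
    mid = len(weights) // 2
    # The two halves are independent subproblems; a parallel implementation
    # could evaluate them concurrently. Here they simply run one after another.
    left = layered_distances(weights[:mid])
    right = layered_distances(weights[mid:])
    return min_plus(left, right)
```

With layers `{s}`, `{a, b}`, `{c, d}`, `{t}` and weights `[[1, 4]]`, `[[2, INF], [1, 1]]`, `[[5], [1]]`, the result is `[[6]]`, which corresponds to the path s → b → d → t.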
In this paper, a new framework for parallel parsing is proposed. The parsing problem for context-free languages is converted into a system of linear recurrence equations. This offers a new approach to parallel parsing because VLSI automatic synthesis procedures developed for numerical computation can then be applied. Two well-known context-free parsing algorithms, the Cocke-Younger-Kasami (CYK) algorithm and the Earley algorithm, are rewritten as systems of linear recurrence equations. The proposed framework can be used as an automatic generation procedure for parallel parsers, similar to sequential parser generator tools such as YACC.
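For reference, the CYK recurrence that such a framework starts from can be stated for a grammar in Chomsky normal form. V(i, j) is the set of nonterminals that derive `word[i:j]`, and it collects every A with a rule A → B C such that B ∈ V(i, k) and C ∈ V(k, j) for some i < k < j. The sketch evaluates this sequentially, one span length (anti-diagonal) at a time. It is not the VLSI synthesis procedure the paper describes, and the dictionary encoding of the grammar is an assumption.

```python
from itertools import product
from typing import Dict, Set, Tuple


def cyk(word: str,
        terminal_rules: Dict[str, Set[str]],
        binary_rules: Dict[Tuple[str, str], Set[str]],
        start: str = "S") -> bool:
    """CYK recognition for a grammar in Chomsky normal form.
    terminal_rules[a] = {A | A -> a}; binary_rules[(B, C)] = {A | A -> B C}."""
    n = len(word)
    if n == 0:
        return False  # CNF without an S -> epsilon rule derives no empty word
    # V[(i, j)]: nonterminals that derive word[i:j].
    V: Dict[Tuple[int, int], Set[str]] = {}
    for i, ch in enumerate(word):
        V[(i, i + 1)] = set(terminal_rules.get(ch, set()))
    for length in range(2, n + 1):
        # Spans of equal length depend only on shorter spans, so each
        # anti-diagonal could be evaluated in one parallel wavefront.
        for i in range(n - length + 1):
            j = i + length
            cell: Set[str] = set()
            for k in range(i + 1, j):
                for b, c in product(V[(i, k)], V[(k, j)]):
                    cell |= binary_rules.get((b, c), set())
            V[(i, j)] = cell
    return start in V[(0, n)]
```

With the hypothetical grammar S → A B | A T, T → S B, A → a, B → b (the language aⁿbⁿ, n ≥ 1), `cyk("aabb", ...)` accepts and `cyk("abab", ...)` rejects.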
This paper presents a parallel adaptive version of the block-based Gauss-Jordan algorithm used in numerical analysis to invert matrices. This version includes a characterization of the processors' workload and a mechanism for its adaptive folding/unfolding. The application is implemented with MARS and evaluated in dedicated and non-dedicated environments. The results show that an absolute efficiency of 92% is possible on a cluster of DEC/ALPHA processors interconnected by a Gigaswitch network, and that an absolute efficiency of 67% can be obtained on an Ethernet network of SUN-Sparc4 workstations. Moreover, the adaptability of the algorithm is evaluated on a non-dedicated meta-system comprising both pools of machines.
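As a baseline for the block-based algorithm, here is plain scalar Gauss-Jordan inversion with partial pivoting: augment A with the identity, normalize each pivot row, and eliminate the pivot column from every other row. Roughly speaking, a block version turns entries into submatrices and replaces division by the pivot with multiplication by the inverse of the pivot block. The independent row updates in each step are then the work a parallel scheduler can distribute. The function name is hypothetical, and the sketch models neither MARS nor the adaptive folding/unfolding mechanism.

```python
from typing import List

Matrix = List[List[float]]


def gauss_jordan_inverse(a: Matrix, eps: float = 1e-12) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [A | I].
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest |entry| in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < eps:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Jordan step: clear this column in every other row, above and below.
        # These row updates are independent of one another.
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right half of the augmented matrix now holds A^-1.
    return [row[n:] for row in m]
```

For `[[4, 7], [2, 6]]` the result is approximately `[[0.6, -0.7], [-0.2, 0.4]]`.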