We have developed a programming template to implement divide-and-conquer algorithms on MIMD computers. The template is based on the parallel Divide-and-Conquer function of Mou and Hudak. We explore the programmability and performance of this approach by solving some well-known numerical problems on a shared-memory multiprocessor and a multi-computer. A by-product of this work is a new parallel algorithm for solving tridiagonal systems of equations.
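The template idea can be illustrated with a minimal Python sketch. This is not the Mou and Hudak function or the authors' template; the names `divide_and_conquer`, `is_base`, `solve_base`, `divide`, `combine`, and `mapper` are hypothetical, and merge sort is used only as a familiar instance. The `mapper` argument is an assumed hook: the subproblems are independent, so a parallel map could be substituted for the built-in `map`.

```python
def divide_and_conquer(problem, is_base, solve_base, divide, combine, mapper=map):
    """Generic divide-and-conquer skeleton (illustrative, hypothetical names)."""
    if is_base(problem):
        return solve_base(problem)
    subproblems = divide(problem)

    def solve(sub):
        # Each subproblem is solved independently of its siblings.
        return divide_and_conquer(sub, is_base, solve_base, divide, combine, mapper)

    # `mapper` defaults to the sequential built-in map; a parallel map
    # with the same calling convention could be plugged in here.
    return combine(list(mapper(solve, subproblems)))


def merge(parts):
    """Combine step for merge sort: merge two sorted lists."""
    left, right = parts
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def merge_sort(xs):
    """Merge sort expressed as an instance of the skeleton."""
    return divide_and_conquer(
        list(xs),
        is_base=lambda p: len(p) <= 1,
        solve_base=lambda p: p,
        divide=lambda p: [p[: len(p) // 2], p[len(p) // 2 :]],
        combine=merge,
    )
```

Keeping the four problem-specific functions separate from the recursion is what lets one skeleton serve several algorithms; only `divide` and `combine` change between them.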
ISBN: 9780897916585 (print)
Single-processor machines are commonly used to implement data structures and algorithms. The availability of parallel machines makes it possible to implement data structures and algorithms efficiently and effectively, and thereby to improve performance and reduce time complexity. Mapping data structures to multiprocessor machines has received limited attention. In this work, we explore techniques for mapping and implementing data structures and algorithms on multiprocessor architectures. Specifically, we map 2-4 finger search trees to the iPSC/2 Hypercube machine. We present a mapping scheme to distribute the nodes of a 2-4 search tree among the Hypercube processors, and we highlight an implementation scheme for two-finger trees. Performance and complexity analysis is also highlighted.
An approach to declarative construction of parallel implementations (dynamical parallelizers) for a general class of sequential imperative programs by means of the algebraic programming system APS is considered. It gi...
ISBN: 0819419230 (print)
The research presented here focuses on the general problem of finding tools and methods to compare and evaluate parallel architectures in one particular field: computer vision. As several different parallel architectures have been proposed for machine vision, some means of comparing them is necessary in order to employ the most suitable architecture for a given application. 'Benchmarks' are the most popular tools for machine speed comparison, but they give no information on the most convenient hardware structures for implementing a given vision problem. This paper tries to overcome this weakness by proposing a definition of the concept of a tool for the evaluation of parallel architectures (more general than a benchmark), and provides a characterization of the chosen algorithms. Taking into account the different ways to process data, it is necessary to consider two different classes of machines, MISD and (MIMD, SPMD, SIMD), which offer different programming models and thus lead to two classes of algorithms. Consequently, two algorithms, one for each class, are proposed: 1) the extraction of connected components, and 2) a parallel region growing algorithm with data reorganization. The second algorithm tests the capabilities of the architecture to support the following: i) pyramidal data structures (initial region step), ii) a merge procedure between global and global information (adjacent regions to the growing region), and iii) a parallel merge procedure between local and global information (adjacent points to the growing region).
The cost of interprocessor communication has a substantial impact on execution time when implementing parallel algorithms on physical parallel computers. We study these implementation costs, examining the number of interprocessor messages, the cost of routing these messages on various architectures, and the number of communication phases. We provide an improved direct routing algorithm for realizing h-relations on crossbar networks. We also introduce a round-robin message-delivery algorithm which reduces the number of times a communication link is established between a pair of processors (by delivering all messages of that phase for the pair in order without interruption). We summarize criteria sufficient for a parallel algorithm to be implemented optimally on several common networks. We also describe a log n-phase optimal parallel list-ranking algorithm.
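For readers unfamiliar with list ranking, the problem can be illustrated with the classic pointer-jumping scheme, simulated sequentially in Python. This is a swapped-in textbook technique, not the optimal algorithm described in the abstract: it also finishes in about log n synchronous rounds, but it performs more total work. The function name `list_rank` and the convention that the tail points to itself are assumptions made for this sketch.

```python
def list_rank(succ):
    """Rank every node of a linked list by its distance to the tail.

    succ[i] is the successor of node i; the tail points to itself
    (an assumed convention for this sketch). Returns (ranks, rounds).
    """
    n = len(succ)
    nxt = list(succ)
    # Initially each node knows only the distance to its direct successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    # Continue until every pointer has reached the tail.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # One synchronous round: all nodes read the old arrays, then
        # all nodes write, as processors would in a parallel step.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds
```

Each round doubles the distance a pointer spans, which is why the number of rounds grows logarithmically with the list length.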
This paper compares some parallel computation schemes from the viewpoint of ease of use, and proposes ADEPS as the most highly recommended. As shown, it yields the simple programming language ADETRAN and also sophisticated mach...
With nearest-neighbor load balancing algorithms, a processor makes balancing decisions based on its local information and manages workload migrations within its neighborhood. This paper compares two fairly well-known nearest-neighbor algorithms, the dimension exchange and diffusion methods, and their variants, in terms of their performance in both one-port and all-port communication architectures. It turns out that the dimension exchange method outperforms the diffusion method in the one-port communication model, and that the strength of the diffusion method is in asynchronous implementations in the all-port communication model. The underlying communication networks considered assume the most popular topologies, the mesh and the torus and their special cases: the hypercube and the k-ary n-cube.
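The difference between the two methods can be sketched on a hypercube with real-valued loads. This is a minimal textbook-style illustration, not the variants analysed in the paper; the function names and the diffusion parameter `alpha` are assumptions introduced here. In dimension exchange a processor balances with one neighbor at a time (one dimension per step), whereas in diffusion it exchanges load with all neighbors at once.

```python
def dimension_exchange(load, dims):
    """One sweep over all hypercube dimensions.

    In step d, processor i pairs with i ^ (1 << d) and the pair
    averages its load (only one port is used per step).
    """
    load = list(load)
    for d in range(dims):
        for i in range(len(load)):
            j = i ^ (1 << d)
            if i < j:  # handle each pair once
                load[i] = load[j] = (load[i] + load[j]) / 2
    return load


def diffusion_step(load, dims, alpha):
    """One synchronous diffusion step on the same hypercube.

    Every processor moves a fraction alpha of each load difference
    to or from all of its neighbors simultaneously (all ports used).
    """
    n = len(load)
    return [
        load[i] + alpha * sum(load[i ^ (1 << d)] - load[i] for d in range(dims))
        for i in range(n)
    ]
```

With divisible load, a single dimension-exchange sweep balances a hypercube exactly, while diffusion only approaches the balanced state over repeated steps; both conserve the total load.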
We develop and experiment with a new parallel algorithm to approximate the maximum weight cut in a weighted undirected graph. Our implementation is based on the recent algorithm of Goemans and Williamson for this problem. However, our work aims for an efficient, practical formulation of the algorithm with close to optimal parallelization. Our theoretical analysis and an implementation on the Connection Machine CM5 show that our parallelization achieves linear speedup. We test our implementation on several large graphs (more than 9000 vertices), particularly on large instances of the Ising model.
Author:
A. Bode, Institut für Informatik
Lehrstuhl für Rechnertechnik und Rechnerorganisation, Technische Universität München, München, Germany
This article covers research at Technische Universität München on distributed and parallel architectures and applications. First, an overview of the parallel processing research organization is given. The second main topic covers TOPSYS, an integrated hierarchical programming environment for parallel and distributed systems developed as part of the research grant.
The main features of an integrated system to support the technology of application problem parallelization, development (assembly) of parallel programs, and tuning to available resources of specific multiprocessor sys...