Arrays of processors with pipelined optical buses are introduced for the efficient implementation of computationally intensive applications. Techniques for transmitting messages concurrently over the optical bus without collisions are shown. Convenient parallel data movement operations are derived for this architecture and are then used to design parallel algorithms for some important numerical problems: solving systems of linear equations and finding the roots of nonlinear equations. Although this array of processors can operate in MIMD mode, it is better suited to SIMD mode because it can be easily synchronised and scaled to a massive number of processors; the parallel algorithms were therefore designed with the SIMD mode in mind. Their time complexities are analysed and shown to compare favourably with those of implementations on processors connected by electronic buses or by point-to-point links such as the hypercube. Moreover, whereas each processing element of a hypercube of size N has log N ports, each processing element of an array with optical buses has a constant number of ports. An array of processors with optical buses therefore appears to be a promising, and possibly better, alternative for future supercomputing systems.
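To make the SIMD style concrete for root finding, here is a sketch of parallel p-section (multisection), a standard technique chosen for illustration; the abstract does not say which root-finding method the paper uses. On an array of p processors, each processor would evaluate f at one grid point in the same lock-step cycle, so the bracketing interval shrinks by a factor of p per step. The function name `simd_multisection` and the sequential loop standing in for the lock-step evaluation are assumptions of this sketch.

```python
def simd_multisection(f, a, b, p=8, tol=1e-12, max_iter=200):
    """Approximate a root of f in [a, b] by p-section.

    Each step splits the bracket into p equal subintervals; on a SIMD array,
    processor i would evaluate f at grid point i in the same cycle.
    Here that step is simulated sequentially.
    """
    fa, fb = f(a), f(b)
    if (fa > 0 and fb > 0) or (fa < 0 and fb < 0):
        raise ValueError("f(a) and f(b) must not have the same sign")
    for _ in range(max_iter):
        if b - a <= tol:
            break
        h = (b - a) / p
        xs = [a + i * h for i in range(p)] + [b]
        fx = [f(x) for x in xs]  # one "SIMD" step: p + 1 evaluations
        for i in range(p):
            if fx[i] == 0:
                return xs[i]  # landed exactly on a root
            if (fx[i] < 0) != (fx[i + 1] < 0):
                a, b = xs[i], xs[i + 1]  # keep the first bracketing subinterval
                break
        else:
            return b  # only reachable when f(b) == 0
    return 0.5 * (a + b)


print(simd_multisection(lambda x: x * x - 2, 0.0, 2.0))
```

With p processors the loop needs about log_p((b - a)/tol) steps, compared with log_2 for ordinary bisection on a single processor.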
A variety of parallel algorithms, running under a contention scheduler and a master/slave scheduler and implemented on the CYBA-M multiprocessor, are described, and a new predistributed quicksort is reported. Results show a speed-up factor of 7.88 with 13 processors, and processor utilisations greater than 75% are predicted for 10 or more processors when sorting large lists with long keys.
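Speed-up and efficiency are related by E_p = S_p / p, where S_p = T_1 / T_p. Applying that conventional definition to the reported figure gives a quick sanity check. The abstract's "processor utilisation" may be measured differently (for example, as busy time), and the 75% prediction concerns large lists with long keys, so the two numbers need not match. `parallel_efficiency` is a hypothetical helper.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Conventional efficiency E_p = S_p / p, with speed-up S_p = T_1 / T_p."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figure reported in the abstract: speed-up of 7.88 with 13 processors.
eff = parallel_efficiency(7.88, 13)
print(f"efficiency with 13 processors: {eff:.1%}")  # prints 60.6%
```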
The paper presents an approach to performance analysis of heterogeneous parallel algorithms. As a typical heterogeneous parallel algorithm is just a modification of some homogeneous one, the idea is to compare the heterogeneous algorithm with its homogeneous prototype, and to assess the heterogeneous modification rather than analyse the algorithm as an isolated entity. A criterion of optimality of heterogeneous parallel algorithms is suggested. A parallel algorithm of matrix multiplication on heterogeneous clusters is used to illustrate the proposed approach. (C) 2004 Elsevier B.V. All rights reserved.
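The paper's optimality criterion is not reproduced here. As a hedged illustration of comparing a heterogeneous modification with its homogeneous prototype, the sketch assumes a one-dimensional row distribution for C = A × B with n × n matrices, where each processor's time is its number of rows of C times n² multiply-adds divided by its relative speed. The homogeneous prototype gives every processor the same number of rows; the heterogeneous modification gives rows in proportion to speed. The function names and the cost model are assumptions.

```python
def proportional_rows(n_rows, speeds):
    """Heterogeneous split: rows proportional to relative speed, rounded with
    the largest-remainder method so that the counts sum to n_rows."""
    total = sum(speeds)
    exact = [n_rows * s / total for s in speeds]
    counts = [int(x) for x in exact]
    leftover = n_rows - sum(counts)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def equal_rows(n_rows, p):
    """Homogeneous prototype: the same number of rows (within one) per processor."""
    return [n_rows // p + (1 if i < n_rows % p else 0) for i in range(p)]


def makespan(counts, speeds, n):
    """Finish time of the slowest processor, assuming each row of C = A @ B
    costs n * n multiply-adds at that processor's speed."""
    return max(c * n * n / s for c, s in zip(counts, speeds))


n, speeds = 1000, [1.0, 1.0, 2.0, 4.0]
het = makespan(proportional_rows(n, speeds), speeds, n)
hom = makespan(equal_rows(n, len(speeds)), speeds, n)
print(f"homogeneous / heterogeneous time: {hom / het:.2f}")  # prints 2.00
```

For these speeds the proportional split gives every processor the same finish time, so the equal split takes twice as long; with identical speeds the two distributions coincide.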
ISBN: 1595934804 (print)
In this paper, we try to speed up geometric constraint solving with parallel techniques. We propose parallel algorithms for building rule bases, determining whether a problem is under- or over-constrained, and finding construction sequences for geometric constraint problems. Experimental results show that the parallel algorithms can improve the efficiency of geometric constraint solving. Copyright 2007 ACM.
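Deciding whether a problem is under- or over-constrained often starts with a degree-of-freedom count. The sketch below assumes 2-D point-distance problems and uses that count (a necessary condition only) as a stand-in for the paper's rule-based judgment, then classifies independent subproblems concurrently. The function names are hypothetical, and `ThreadPoolExecutor` only illustrates the fan-out: under CPython's global interpreter lock, CPU-bound work would normally go to `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_by_dof(n_points, n_distances):
    """Degree-of-freedom count for 2-D points linked by distance constraints.

    Each point has 2 DOF, each distance removes 1, and 3 DOF (two
    translations, one rotation) remain as rigid motion. This is only the
    global count; Laman's condition also requires that no subset of k >= 2
    points carries more than 2k - 3 constraints.
    """
    excess = n_distances - (2 * n_points - 3)
    if excess < 0:
        return "under-constrained"
    if excess > 0:
        return "over-constrained"
    return "well-constrained"


def classify_in_parallel(subproblems, workers=4):
    """Classify independent (n_points, n_distances) pairs concurrently;
    results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda sp: classify_by_dof(*sp), subproblems))


# A triangle, a 4-cycle, and K4 (all six distances among 4 points).
print(classify_in_parallel([(3, 3), (4, 4), (4, 6)]))
```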
Efficient parallel algorithms for several problems on proper circular-arc graphs are presented in this paper. These problems include finding a maximum matching; partitioning the graph into a minimum number of induced subgraphs, each of which has a Hamiltonian cycle (path); partitioning it into induced subgraphs, each of which has a Hamiltonian cycle (path) with at least k vertices, for a given k; and adding a minimum number of edges so that the graph contains a Hamiltonian cycle (path). We show that all of these problems can be solved in logarithmic time with a linear number of EREW PRAM processors, or in constant time with a linear number of BSR processors. Perhaps the more important part of this work is the extension of basic BSR to allow multiple simultaneous BROADCAST instructions.
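BSR (Broadcasting with Selective Reduction) extends the PRAM with a BROADCAST instruction. As commonly defined (our summary, not taken from this paper), every processor broadcasts a tag and a datum; each memory location holds a limit, keeps the data whose tag satisfies a selector (<, ≤, =, ≥, >, ≠) against that limit, and combines them with a reduction operator such as sum or max. The sequential simulation below sketches that single-broadcast semantics only; the multiple-BROADCAST extension is not modeled, and `bsr_broadcast` is a hypothetical name.

```python
import operator
from functools import reduce

# Selectors compare a broadcast tag with a memory location's limit.
SELECTORS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
             ">=": operator.ge, ">": operator.gt, "!=": operator.ne}


def bsr_broadcast(tags, data, limits, selector, reducer, identity):
    """Sequential simulation of one BSR BROADCAST step.

    Processor i broadcasts (tags[i], data[i]); memory location j keeps every
    datum whose tag satisfies selector(tag, limits[j]) and combines the kept
    data with `reducer` (identity if nothing is selected). The model does
    this in one step; this loop takes O(N * M) time.
    """
    select = SELECTORS[selector]
    return [reduce(reducer, (d for t, d in zip(tags, data) if select(t, lim)), identity)
            for lim in limits]


values = [3, 1, 4, 1, 5]
idx = list(range(len(values)))
# Prefix sums: location j sums data[i] over all i <= j.
print(bsr_broadcast(idx, values, idx, "<=", operator.add, 0))  # [3, 4, 8, 9, 14]
```

Changing only the selector and reducer yields other constant-step primitives, for example ranking each value by counting how many broadcast values are strictly smaller.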
A cross-bridge reconfigurable array of processors is a parallel processing system that can dynamically change its interconnection scheme during the execution of an algorithm. Based on this architecture, several O(1)-time basic operations are first proposed: transpose, untranspose, shift, unshift, and the prefix sum of a binary sequence. These basic operations are then used to find the kth smallest of N m-bit unsigned integers in O(m) time using N processors, and to sort N data items in O(1) time using O(N^{5/3}) processors instead of the O(N^2) processors used by other researchers [2], [4], [8], [12], [17].
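The O(m) selection bound fits a bit-serial strategy in which each round looks at one bit of every remaining candidate and only needs to count a binary sequence. The abstract does not spell out the algorithm, so the following is a sequential sketch of that general idea (binary radix selection), assumed rather than taken from the paper. `kth_smallest_bitwise` is a hypothetical name, and the list comprehensions stand in for the N processors.

```python
def kth_smallest_bitwise(values, k, m):
    """Return the k-th smallest (1-based) of m-bit unsigned integers.

    One round per bit, most significant first. Each round needs only the
    count of a binary sequence (candidates whose current bit is 0), the kind
    of prefix-sum primitive the array performs in O(1) time; m rounds give O(m).
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    candidate = [True] * len(values)
    result = 0
    for bit in range(m - 1, -1, -1):
        zero_flags = [c and not ((v >> bit) & 1) for v, c in zip(values, candidate)]
        zeros = sum(zero_flags)  # count of the binary sequence
        if k <= zeros:
            candidate = zero_flags  # the answer has a 0 in this bit
        else:
            k -= zeros  # skip the smaller candidates with a 0 in this bit
            candidate = [c and bool((v >> bit) & 1) for v, c in zip(values, candidate)]
            result |= 1 << bit  # the answer has a 1 in this bit
    return result


print(kth_smallest_bitwise([5, 3, 7, 1, 6], 3, 3))  # 5
```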
The study of many-particle systems has grown significantly over the past decade because of the increasing number of useful applications it supports. Numerical experiments have shown that the force calculation accounts for 90% of the total simulation time. It is an O(N^2) algorithm, mainly because of the pairwise interactions, where N is the number of particles in the system. The interaction decomposition technique proposed by Taylor et al. uses a special mapping scheme and optimal communication to reduce the overall computation time. In this paper, we propose two algorithms based on the force decomposition approach. The first, which we call the Force-Row Interleaving (FRI) method, treats rows one at a time; the second, called Force-Stripped Row (FSR), computes a priori the block of rows to be sent to each processor so that the workload is balanced. Both algorithms were implemented on a distributed-memory, 16-processor iPSC/860 and tested on a system of 32,000 atoms of liquid argon. FRI and FSR were both comparable to existing parallel techniques, with efficiencies of 98.63% and 98.88%, respectively. (C) 1999 Elsevier Science B.V. All rights reserved.
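The abstract names the two decompositions without giving their exact rules. The sketch below assumes (1) a triangular force matrix in which row i carries N - 1 - i pair interactions, as when Newton's third law halves the work; (2) FRI deals rows to processors cyclically; and (3) FSR precomputes contiguous row blocks whose cumulative work is about 1/P of the total. All function names are hypothetical, and the code compares only per-processor computational load, not communication.

```python
def row_work(n):
    """Assumed cost of row i of the force matrix when Newton's third law is
    exploited: only pairs j > i are computed, so row i has n - 1 - i entries."""
    return [n - 1 - i for i in range(n)]


def interleaved_loads(n, p):
    """Row-interleaved assignment: row i goes to processor i mod p."""
    loads = [0] * p
    for i, w in enumerate(row_work(n)):
        loads[i % p] += w
    return loads


def striped_loads(n, p):
    """Contiguous row blocks chosen a priori: a row goes to the processor whose
    share [k * T, (k + 1) * T) of the cumulative work contains the row's
    midpoint, where T = total work / p."""
    if n < 2 or p < 1:
        raise ValueError("need n >= 2 and p >= 1")
    work = row_work(n)
    target = sum(work) / p
    loads, done = [0] * p, 0
    for w in work:
        k = min(p - 1, int((done + w / 2) / target))
        loads[k] += w
        done += w
    return loads


for name, loads in [("interleaved", interleaved_loads(1000, 16)),
                    ("striped", striped_loads(1000, 16))]:
    print(f"{name:12s} max/min load = {max(loads) / min(loads):.3f}")
```

Both assignments keep the load ratio close to 1 under this cost model; what distinguishes them in practice is which rows (and therefore which particle data) each processor needs, which this sketch does not model.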
The authors provide optimal parallel solutions to several fundamental link distance problems set in trapezoided rectilinear polygons. All parallel algorithms are deterministic, run in logarithmic time, have an optimal...
Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a simple, deterministic simulation technique that transforms certain Bulk Synchronous Parallel (BSP) algorithms into efficient parallel EM algorithms. It optimizes blockwise data access and parallel disk I/O and, at the same time, utilizes multiple processors connected via a communication network or shared memory. We obtain new, improved parallel EM algorithms for a large number of problems, including sorting, permutation, matrix transpose, several geometric and GIS problems such as three-dimensional convex hulls (two-dimensional Voronoi diagrams), and various graph problems. We show that certain parallel algorithms known for the BSP model can be used to obtain EM algorithms that meet well-known I/O complexity lower bounds for various problems, including sorting.
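The I/O lower bounds mentioned here are usually stated in the parallel disk model with N items, an internal memory of M items, blocks of B items, and D disks: scanning needs about N/(DB) parallel I/Os and sorting needs Θ((N/(DB)) log_{M/B}(N/B)). The small calculator below evaluates those expressions with constant factors dropped; it is an order-of-magnitude estimate, not the paper's analysis, and the function names are hypothetical.

```python
import math


def sort_io_bound(n, m, b, d=1):
    """Parallel-disk sorting bound with constants dropped:
    (N / (D * B)) * log_{M/B}(N / B) parallel I/O operations, for N items,
    internal memory M items, block size B items, and D disks."""
    if not (0 < b < m < n) or d < 1:
        raise ValueError("need 0 < B < M < N and D >= 1")
    return (n / (d * b)) * math.log(n / b, m / b)


def scan_io_bound(n, b, d=1):
    """Reading N items with blockwise, fully parallel access: ceil(N / (D * B))."""
    return math.ceil(n / (d * b))


N, M, B, D = 2**30, 2**20, 2**10, 4
print(f"scan: {scan_io_bound(N, B, D)} I/Os, sort: ~{sort_io_bound(N, M, B, D):.0f} I/Os")
```

Because log_{M/B}(N/B) is only 2 for these parameters, sorting costs about two scans' worth of parallel I/Os, far below the 2^30 I/Os of accessing one item at a time.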
Suppose that G = (S, T, E) is a bipartite graph. An ordering of S (T) has the adjacency property if, for each vertex in T (S), its adjacent vertices in S (T) are consecutive in the ordering. If there exist orderings of S and T that both have the adjacency property, G is called a doubly convex-bipartite graph. In this paper, a parallel algorithm is proposed to recognize a doubly convex-bipartite graph. The algorithm runs in O(log n) time using O(n^3/log n) processors on the CRCW PRAM, or in O(log^2 n) time using O(n^3/log^2 n) processors on the CREW PRAM.
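Recognizing a doubly convex-bipartite graph means finding orderings of S and T with the adjacency property, which is what the paper's parallel algorithm does. A much simpler, sequential companion step is verifying a proposed pair of orderings, and that is all this sketch does. The function names are hypothetical, and edges are assumed to be given as (s, t) pairs.

```python
def has_adjacency_property(order, neighborhoods):
    """True if every neighborhood occupies consecutive positions in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    for nbrs in neighborhoods:
        if not nbrs:
            continue
        idx = sorted(pos[v] for v in nbrs)
        if idx[-1] - idx[0] + 1 != len(idx):  # a gap inside the neighborhood
            return False
    return True


def is_doubly_convex_certificate(order_s, order_t, edges):
    """Check that the given orderings of S and T both have the adjacency property."""
    nbrs_of_t, nbrs_of_s = {}, {}
    for s, t in edges:
        nbrs_of_t.setdefault(t, set()).add(s)  # neighbours in S of each t
        nbrs_of_s.setdefault(s, set()).add(t)  # neighbours in T of each s
    return (has_adjacency_property(order_s, nbrs_of_t.values())
            and has_adjacency_property(order_t, nbrs_of_s.values()))


edges = [(1, "a"), (2, "a"), (2, "b"), (3, "b")]
print(is_doubly_convex_certificate([1, 2, 3], ["a", "b"], edges))  # True
print(is_doubly_convex_certificate([1, 3, 2], ["a", "b"], edges))  # False
```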