We describe the design of a compilation system that automatically translates Fortran programs into explicitly parallel programs for a massively parallel architecture. Such a compiler must automatically generate data...
This paper considers the current state of software engineering for parallel systems. A review of existing approaches and techniques identifies inadequacies. Recent work on design, verification and automated support is...
This paper describes a parallel polygon rendering method on the graphics computer VC-1. The architecture of the VC-1 is a loosely-coupled array of general-purpose processors, each of which is equipped with a local fra...
An approach to declarative construction of parallel implementations (dynamical parallelizers) for a general class of sequential imperative programs by means of the algebraic programming system APS is considered. It gi...
Inter-module bandwidth is one of the major constraints on the performance of current and future parallel systems. In this paper, we propose and evaluate several high-performance bus-based parallel architectures, including bus-based cyclic networks (BCNs) and quotient cyclic networks (BQCNs), which are particularly efficient in view of their respective inter-module communication patterns. The inter-cluster connection in a BCN is defined on a set of nodes whose addresses are cyclic shifts of one another. The node degree of a basic BCN is 3, while those of BQCNs and enhanced BCNs can vary from a small constant (e.g., 2) to as large as required, thus providing flexibility and an effective tradeoff between cost and performance. A variety of algorithms can be performed efficiently on these networks, which demonstrates the versatility of BCNs and BQCNs.
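The notion of addresses that are "cyclic shifts of one another" can be illustrated with a small sketch. It only enumerates the set of rotations of a node address; the digit-string address format and the function name `cyclic_shifts` are assumptions made for illustration, and the code is not the BCN construction from the paper.

```python
def cyclic_shifts(address):
    """Return the distinct left rotations of a digit-string address, in rotation order.

    The address format (a plain string of digits) is a hypothetical choice.
    """
    shifts = []
    for i in range(len(address)):
        rotated = address[i:] + address[:i]
        if rotated not in shifts:  # periodic addresses yield fewer distinct shifts
            shifts.append(rotated)
    return shifts


if __name__ == "__main__":
    print(cyclic_shifts("0112"))
    print(cyclic_shifts("0101"))
```

An address with an internal period, such as "0101", has fewer distinct shifts than its length, so the groups formed this way need not all have the same size.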
In this paper, a model is presented for representing and comparing workloads, based on the way they would exercise parallel machines. This workload characterization is derived from the parallel instruction centroid and parallel workload similarity. The centroid is a simple measure that aggregates average parallelism, instruction mix, and critical path length. When captured together with abstracted information about communication requirements, the result is a powerful tool for understanding the requirements of workloads and their potential performance on target machines. The workload similarity is based on measuring the normalized Euclidean distance (ned) between workload centroids. It will be shown that this workload representation method outperforms comparable ones in accuracy, as well as in time and space requirements. An analysis of the NAS Parallel Benchmarks and their performance is presented to demonstrate some of the applications of this model, such as performance prediction with good accuracy, and the insight it provides.
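As a rough illustration of comparing workloads by a normalized Euclidean distance between centroid vectors, the sketch below uses one possible normalization. The abstract does not give the exact formula, so the per-component scaling, the division by the vector length, and the example centroid values are all assumptions.

```python
import math


def normalized_euclidean_distance(a, b):
    """Distance between two centroid vectors under an ASSUMED normalization.

    Each component difference is divided by the larger magnitude of the two
    components, and the squared sum is averaged over the dimension, so the
    result lies in [0, 1] for non-negative vectors.
    """
    if not a or len(a) != len(b):
        raise ValueError("centroids must be non-empty and of equal length")
    total = 0.0
    for x, y in zip(a, b):
        scale = max(abs(x), abs(y))
        if scale > 0:  # both components zero: contributes no distance
            total += ((x - y) / scale) ** 2
    return math.sqrt(total / len(a))


if __name__ == "__main__":
    # Hypothetical centroids: (average parallelism, critical path length)
    workload_a = [4.0, 2.0]
    workload_b = [2.0, 2.0]
    print(normalized_euclidean_distance(workload_a, workload_b))
```

A value near 0 would mean the two workloads exercise a machine in a similar way under this particular normalization; other normalizations would give different numbers.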
The Hierarchical PRAM (H-PRAM) (T. Heywood, S. Ranka, 1992) model is a dynamically partitionable PRAM, which charges for communication and synchronization, and allows parallel algorithms to abstractly represent genera...
ISBN: (print) 0818678704
In this paper, a parallel and fault-tolerant LAN (P_FTLAN) with dual communication subnetworks is presented to improve the reliability of LANs. Its function modes, technical characteristics, hardware and software architectures, and some key implementation techniques, such as logical addresses and parallel mechanisms, are described in detail. Our prototype system and analysis results suggest that the scheme presented in the paper not only provides an effective approach to highly reliable LANs, but can also improve their performance greatly.
Mesh is one of the most widely used interconnection networks for multiprocessor systems. In this paper, we propose an approach to partition a given mesh into m submeshes which can be allocated to m tasks with grid structures. We adapt two-dimensional packing to solve the submesh allocation problem. Due to the intractability of the two-dimensional packing problem, finding an optimal solution is computationally infeasible. We develop an efficient heuristic packing algorithm called TP-heuristic. Allocating a submesh to each task is achieved using the results of packing. We propose two different methods called uniform scaling and non-uniform scaling. Experiments were carried out to test the accuracy of solutions provided by our allocation algorithm.
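To make the idea of allocating submeshes by two-dimensional packing concrete, here is a minimal sketch of a generic shelf-based first-fit heuristic. It is not the TP-heuristic from the paper, whose details the abstract does not give, and it does not attempt the uniform or non-uniform scaling step. The function name `shelf_pack` and the representation of tasks as (width, height) pairs are assumptions.

```python
def shelf_pack(mesh_w, mesh_h, tasks):
    """Place each (width, height) task as a submesh using a simple shelf heuristic.

    Returns a list of (x, y) origins in the input order, or None if this
    heuristic fails to fit every task (an optimal packing might still exist).
    """
    # Consider taller tasks first so each shelf height is set by its first task.
    order = sorted(range(len(tasks)), key=lambda i: -tasks[i][1])
    placements = [None] * len(tasks)
    shelves = []  # each shelf is [y_origin, shelf_height, next_free_x]
    next_y = 0
    for i in order:
        w, h = tasks[i]
        if w > mesh_w:
            return None
        for shelf in shelves:
            if shelf[2] + w <= mesh_w and h <= shelf[1]:
                placements[i] = (shelf[2], shelf[0])
                shelf[2] += w
                break
        else:
            # No existing shelf has room: open a new shelf above the last one.
            if next_y + h > mesh_h:
                return None
            shelves.append([next_y, h, w])
            placements[i] = (0, next_y)
            next_y += h
    return placements


if __name__ == "__main__":
    # Three tasks on a 4x4 mesh: two 2x2 grids and one 4x2 grid.
    print(shelf_pack(4, 4, [(2, 2), (2, 2), (4, 2)]))
```

Because the heuristic is greedy, a return value of None only means that this particular strategy failed, which mirrors the abstract's point that optimal packing is intractable and heuristics trade solution quality for speed.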
A parallel algorithm is presented for finding k spanning trees, 2 ≤ k ≤ n^(n-2), of a connected, weighted, undirected graph G(V, E, W) in order of increasing weight. It runs in O(T(n) + k log n) time with O(n^2/log n) processors on a CREW PRAM, where n = |V|, m = |E|, and T(n), with O(log n) ≤ T(n) ≤ O(log^2 n), is the running time of the fastest parallel algorithm for finding a minimum spanning tree of G on a CREW PRAM with no more than O(n^2/log n) processors. Since T(n) = O(log^2 n) at present, this result shows that the k minimum spanning trees can be found within the same time bound as a single one when k ≤ O(log n) on a CREW PRAM.