Recently, several strategies have been studied to recover faulty processing elements on mesh arrays. S. Y. Kung et al. propose 1 1/2 track model for reconfiguration of mesh arrays and the reconfiguration algorithm bas...
详细信息
ISBN:
(纸本)0818674601
Recently, several strategies have been studied to recover faulty processing elements on mesh arrays. S. Y. Kung et al. propose 1 1/2 track model for reconfiguration of mesh arrays and the reconfiguration algorithm based on graph theory by finding compensation paths. However, compensation path scheme does not perform reconfiguration efficiently since the strategy only focuses on faulty processing elements and require global information. In this paper, we propose a new reconfiguration strategy named recursive shift to recover faulty processing elements on mesh arrays. It is confirmed that out strategy obtain much higher array yield than the previous works without global information on identical redundant processing element.
This paper makes an efficient improvement of processor complexity while solving some connectivity problems in processor arrays with reconfigurable bus systems. We first derive two constant time algorithms in the propo...
详细信息
This paper makes an efficient improvement of processor complexity while solving some connectivity problems in processor arrays with reconfigurable bus systems. We first derive two constant time algorithms in the proposed parallel processing system for computing the dominators and the dominator tree of an undirected graph either using a 3-D n×n×n processors or a 2-D n2×n2 processors, where n is the number of vertices of the graph. Then based on the dominator tree algorithm, we also solve many other graph connectivity problems in constant time. They are the articulation points, bridges, biconnected components, and bridge-connected components problems in undirected graphs.
In this paper, we survey algorithms that allocate a parallel program represented by an edge-weighted directed acyclic graph (DAG), also called a task graph or macro-dataflow graph, to a set of homogeneous processors, ...
详细信息
In this paper, we survey algorithms that allocate a parallel program represented by an edge-weighted directed acyclic graph (DAG), also called a task graph or macro-dataflow graph, to a set of homogeneous processors, with the objective of minimizing the completion time. We analyze 21 such algorithms and classify them into four groups. The first group includes algorithms that schedule the DAG to a bounded number of processors directly. These algorithms are called the bounded number of processors (BNP) scheduling algorithms. The algorithms in the second group schedule the DAG to an unbounded number of clusters and are called the unbounded number of clusters (UNC) scheduling algorithms. The algorithms in the third group schedule the DAG using task duplication and are called the task duplication based (TDB) scheduling algorithms. The algorithms in the fourth group perform allocation and mapping on arbitrary processor network topologies. These algorithms are called the arbitrary processor network (APN) scheduling algorithms. The design philosophies and principles behind these algorithms are discussed, and the performance of all of the algorithms is evaluated and compared against each other on a unified basis by using various scheduling parameters.
The increasing complexity of parallel computing systems has brought about a crisis in parallel performance evaluation and tuning. Tools for performance measurement and visualization become necessary parts of programmi...
详细信息
The increasing complexity of parallel computing systems has brought about a crisis in parallel performance evaluation and tuning. Tools for performance measurement and visualization become necessary parts of programming environments for parallel computers. However, today's performance analysis systems offer little more than basic measurement and analysis facilities for the sources of poor performance, such as load imbalance, communication overhead, and synchronization loss. Our experience in parallelprogramming shows that a system which can provide higher level performance measurement and analysis is more helpful in the performance tuning of parallel program. For example, whether the programmer adopts a proper program strategy or algorithm is one of the most important factors which affect the performance of parallel programs. Therefore, we argue that a helpful performance tuning tool should be able to assist programmers to optimise the strategy or algorithm in their parallel programs. In this paper we introduce an intelligent performance tuning tool which detects and analyses the strategy and algorithm concepts in parallel programs, helps users rapidly identify the location and cause of the performance problems, and provides suggestions to improve the performance of their parallel programs.
The distance transform (DT) and the medial axis transform (MAT) are two important image operations. They are both used to extract of the information about the shape and the position of the foreground pixels relative t...
详细信息
The distance transform (DT) and the medial axis transform (MAT) are two important image operations. They are both used to extract of the information about the shape and the position of the foreground pixels relative to each other. Many applications of these transforms are applied in the fields of image processing and computer vision, such as expanding shrinking, thinning and computing shape factor, etc. Each of these two transforms is essentially a global operation. Unless the digital image is very small, all global operations are prohibitively costly. In order to provide the efficient transform computations, it is considerably desired to develop parallel algorithms for these two operations. In this paper, we provide the fastest parallel algorithms to compute the chessboard distance transform (CDT) which is a DT based on the chessboard metrics, and the medial axis transform (MAT). Each of the transforms of a 2-D binary image array of size N×N can be computed in O(1) time on the 2-D 2N×2N RAP.
In this paper, we present a new language called ParCeL-1, dedicated to connectionist and explicitely parallel AI programming. ParCeL-1 is a language based on agents, not unlike actor languages. Its agents are autonomo...
详细信息
ISBN:
(纸本)0780333675
In this paper, we present a new language called ParCeL-1, dedicated to connectionist and explicitely parallel AI programming. ParCeL-1 is a language based on agents, not unlike actor languages. Its agents are autonomous and follow a computational model in which the communications are non-blocking and the communication scheme is explicit. ParCeL-1 has a parallel implementation and runs on several multi-processor architectures. We give an example of connectionist programming (the Kohonen map) and show several performance results on a Transputer based multi-processor architecture and on the Gray T3D parallel computer.
In this paper, a parallel architecture is proposed to support the operations described in the ITU-T Recommendation I.432 (B-ISDN user-network interface - Physical layer specification). It is rather difficult to perfor...
详细信息
In this paper, a parallel architecture is proposed to support the operations described in the ITU-T Recommendation I.432 (B-ISDN user-network interface - Physical layer specification). It is rather difficult to perform these operations on a bit serial architecture at a high rate. This paper demonstrates how these tasks can be achieved by means of parallelism. First, we describe the user-network interfaces in general and their physical layer properties. Then a parallel architecture is proposed with a general translation method which converts the serial operation into the parallel one. The application of the parallel architecture on each function is also depicted and the system has been realized in hardware using CMOS technology.
In this paper we present our experience in implementing several irregular problems using a high-level actor language. The problems studied require dynamic computation of object placement and may result in load imbalan...
详细信息
ISBN:
(纸本)0818672552
In this paper we present our experience in implementing several irregular problems using a high-level actor language. The problems studied require dynamic computation of object placement and may result in load imbalance as the computation proceeds, thereby requiring dynamic load balancing. The algorithms are expressed as fine-grained computations providing maximal flexibility in adapting the computation load to arbitrary parallelarchitectures. Such an algorithm may be composed with different partitioning and distribution strategies (PDS's) to result in different performance characteristics. The PDS's are implemented for specific data structures or algorithms and are reusable for different parallel algorithms. We demonstrate how our methodology provides portability of algorithm specification, reusability and ease of expressibility.
This paper presents the new scheme of interconnecting levels in a bitonic sorting network with simpler inter-level wiring. A parity technique which leads to the algorithm Construct_BSNF is introduced. Wiring simplific...
详细信息
This paper presents the new scheme of interconnecting levels in a bitonic sorting network with simpler inter-level wiring. A parity technique which leads to the algorithm Construct_BSNF is introduced. Wiring simplification through the network is achieved by wiring the N/2 even-parity keys straight through the network. N/2 odd-parity keys use flip interconnections. As a result, our interconnection scheme simplifies the inter-level wiring through the network and outperforms the perfect-shuffle interconnection scheme both in terms of cost and delay.
This paper addresses optimal mapping of parallel programs composed of a chain of data parallel tasks onto the processors of a parallel system. The input to this class of programs is a stream of data sets, each of whic...
详细信息
ISBN:
(纸本)9780897918091
This paper addresses optimal mapping of parallel programs composed of a chain of data parallel tasks onto the processors of a parallel system. The input to this class of programs is a stream of data sets, each of which is processed in order by the chain of tasks. This computation structure, also referred to as a data parallel pipeline, is common in several application domains including digital signal processing, image processing, and computer vision. The parameters of the performance of stream processing are latency (the time to process an individual data set) and throughput (the aggregate rate at which the data sets are processed). These two criterion are distinct since multiple data sets can be pipelined or processed in parallel. We present a new algorithm to determine a processor mapping of a chain of tasks that optimizes the latency in the presence of throughput constraints, and discuss optimization of the throughput with latency constraints. The problem formulation uses a general and realistic model of inter-task communication, and addresses the entire problem of mapping, which includes clustering tasks into modules, assignment of processors to modules, and possible replication of modules. The main algorithms are based on dynamic programming and their execution time complexity is polynomial in the number of processors and tasks. The entire framework is implemented as an automatic mapping tool in the Fx parallelizing compiler for a dialect of High Performance Fortran.
暂无评论