Embedded multiprocessor architectures impose different constraints, and therefore different challenges, on the partitioning and mapping of parallel programs. Such mappings must typically optimize throughput and/or latency while satisfying placement, memory, and processor throughput constraints. This paper describes the algorithms, organization, and application of Genie, a set of tools for partitioning and mapping parallel programs onto embedded multiprocessor architectures under such constraints. At one end, Genie is tightly coupled to a commercial software development environment, Teamwork SA/RT. At the other, it presents an interface to simulation and modeling tools. An example is presented of the application of this environment to an existing real-time embedded application, an autonomous underwater vehicle (AUV).
Lee's (1961) maze-routing algorithm has been a popular method for routing wires in VLSI circuits. It can also be applied to a variety of other problems, such as robot path planning. Although the algorithm is simple and easy to implement, its computation time can be quite high, which makes it an attractive candidate for implementation on parallel systems. The major issue in parallelizing the algorithm is mapping the grid space of the problem to the processor space, because the mapping strategy strongly affects both communication cost and processor utilization. Won and Sahni (1987) studied a class of mapping strategies for Lee's algorithm and analyzed their performance. The authors propose two new mapping strategies. First, they modify Won and Sahni's mapping algorithm by using the concept of mirror images to allow higher processor utilization while reducing the number of boundary cells. The new algorithm is shown to be better than the original one in an obstacle-free grid space. Second, they propose a dynamic mapping algorithm, which is shown to give an optimal mapping in an obstacle-free grid space. They also perform simulations to study the relative performance of these mapping algorithms for grid spaces with obstacles. The results show that the new algorithms are substantially faster than the earlier ones.
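For readers unfamiliar with Lee's algorithm, the following is a minimal sequential sketch in Python: a breadth-first wave expansion from the source followed by a backtrace from the target. It is not the parallel mapping studied in the abstract above. The grid encoding (0 for a free cell, 1 for an obstacle), the 4-neighbor connectivity, and the name `lee_route` are assumptions made for illustration.

```python
from collections import deque

# Offsets of the four orthogonal neighbors (assumed connectivity).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def lee_route(grid, source, target):
    """Return a shortest path of cells from source to target, or None.

    grid is a list of rows; 0 marks a free cell and 1 an obstacle.
    source and target are (row, col) tuples on free cells.
    """
    rows, cols = len(grid), len(grid[0])

    # Wave expansion: label every reachable cell with its distance
    # from the source, one wavefront at a time.
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == target:
            break
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))

    if target not in dist:
        return None  # the target is walled off by obstacles

    # Backtrace: walk from the target to any neighbor whose label is
    # exactly one smaller, until the source is reached.
    path = [target]
    while path[-1] != source:
        r, c = path[-1]
        for dr, dc in NEIGHBORS:
            prev = (r + dr, c + dc)
            if dist.get(prev) == dist[(r, c)] - 1:
                path.append(prev)
                break
    path.reverse()
    return path
```

In a parallel version, the cells of one wavefront can be expanded concurrently, which is why the assignment of grid cells to processors determines how many boundary cells must be exchanged at each step.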
Relaxed memory consistency models tolerate increased memory access latency in both hardware and software distributed shared memory systems. In recoverable systems, relaxing consistency has the added benefit of reducing the number of checkpoints needed to avoid rollback propagation. The authors introduce new checkpointing algorithms that take advantage of relaxed consistency to reduce the performance overhead of checkpointing. They also introduce a scheme based on lazy relaxed consistency that reduces both checkpointing overhead and the overhead of avoiding error propagation in systems with error latency. They use multiprocessor address traces to evaluate the relaxed consistency approach to checkpointing with distributed shared memory.
ISBN (print): 0818626720
This paper explores the use of Proteus, an architecture-independent language suitable for prototyping parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences, with a single construct for the parallel composition of processes communicating through shared memory. Several different parallel algorithms for N-body simulation are presented in Proteus, illustrating how Proteus provides a common foundation for expressing the various parallel programming models. This common foundation allows prototype parallel programs to be tested and evolved without the use of machine-specific languages. Program refinement techniques are used to transform prototypes into implementations on specific architectures. Refinement strategies are illustrated that target broad-spectrum parallel intermediate languages, and their viability is demonstrated by refining an N-body algorithm to data-parallel CVL code.
The article develops a framework for message-passing architectures consisting of a machine model called the communicating random access machine (CRAM) and a programming paradigm. The CRAM model serves as a vehicle for the design and analysis of message-passing algorithms. The message-passing paradigm makes the mapping of algorithms that fit this paradigm onto message-passing architectures more natural.
By exploiting the structure of a directed toroidal graph, the authors have developed a parallel solution to the shortest-path problem. A parallel dynamic programming solution to finding the minimum-cost path is presented. The authors first map the toroidal graph to a planar graph, whose structure is exploited to form a parallel algorithm suitable for a message-passing parallel architecture. The problem has applications in surface reconstruction, where contours of a surface are represented as graphs. Finding the shortest path in these graphs corresponds to finding a best-fit surface over the contours. By parallelizing the solution, the authors have obtained a significant speedup on a computationally intensive problem. Since generic message passing is used for interprocessor communication, the proposed algorithm can be implemented in any distributed or parallel environment. In a heterogeneous environment, relative processor speed and memory would have to be considered for load balancing.
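The dynamic programming step can be illustrated with a small sequential sketch in Python. It is not the authors' parallel algorithm: it assumes the toroidal graph has already been unrolled into a planar grid whose edges point only right or down, and it finds the minimum-cost path from the top-left to the bottom-right vertex. The function name `min_cost_path` and the two edge-cost tables are hypothetical, and the grid is assumed to have at least two rows and two columns.

```python
def min_cost_path(right_cost, down_cost):
    """Minimum cost from vertex (0, 0) to the bottom-right vertex.

    right_cost[i][j] is the cost of edge (i, j) -> (i, j + 1),
    with shape rows x (cols - 1).
    down_cost[i][j] is the cost of edge (i, j) -> (i + 1, j),
    with shape (rows - 1) x cols.
    """
    rows = len(right_cost)
    cols = len(down_cost[0])
    inf = float("inf")

    best = [[inf] * cols for _ in range(rows)]
    best[0][0] = 0

    # Sweep anti-diagonals d = i + j. Every vertex on one anti-diagonal
    # depends only on the previous one, so the inner loop iterations
    # are independent of each other and could run concurrently.
    for d in range(1, rows + cols - 1):
        for i in range(max(0, d - cols + 1), min(rows, d + 1)):
            j = d - i
            from_left = best[i][j - 1] + right_cost[i][j - 1] if j > 0 else inf
            from_up = best[i - 1][j] + down_cost[i - 1][j] if i > 0 else inf
            best[i][j] = min(from_left, from_up)

    return best[rows - 1][cols - 1]
```

The anti-diagonal ordering is one common way to expose parallelism in this kind of recurrence; on a message-passing machine, each processor would own a block of the grid and exchange only the values on block boundaries.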
We present several algorithms for sorting efficiently with parallel two-level and multilevel memories. Our main result is an elegant, easy-to-implement, optimal, deterministic algorithm for external sorting with P dis...
The split and merge model is a reasonable method for architecture-independent programming of global image processing operations on parallel architectures. The authors consider image connected components from the point of view of this programming model, and develop split and merge algorithms that implement various connected components algorithms that have appeared in the literature. The algorithms are implemented in two architecture-independent languages that the authors have developed, namely Apply and Adapt. The performance of the algorithms on the Sun, the Carnegie Mellon Warp, and the Carnegie Mellon Nectar architectures is compared.
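The split and merge idea for connected components can be sketched in Python as follows. This is an illustration only, not code in Apply or Adapt and not one of the specific algorithms compared in the paper. It assumes a binary image stored as a list of rows, 4-connectivity, and a split into horizontal strips; the names `DisjointSet`, `label_strip`, `merge_strips`, and `count_components` are hypothetical.

```python
class DisjointSet:
    """Union-find over pixel coordinates, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def label_strip(image, r0, r1):
    """Split phase: label rows r0..r1-1 without looking outside the strip."""
    ds = DisjointSet()
    for r in range(r0, r1):
        for c in range(len(image[0])):
            if image[r][c]:
                ds.find((r, c))  # register the pixel
                if c > 0 and image[r][c - 1]:
                    ds.union((r, c), (r, c - 1))
                if r > r0 and image[r - 1][c]:
                    ds.union((r, c), (r - 1, c))
    return ds


def merge_strips(image, upper, lower, boundary_row):
    """Merge phase: join two strip results along their shared boundary.

    boundary_row is the first row of the lower strip.
    """
    merged = DisjointSet()
    merged.parent = {**upper.parent, **lower.parent}
    for c in range(len(image[0])):
        if image[boundary_row][c] and image[boundary_row - 1][c]:
            merged.union((boundary_row, c), (boundary_row - 1, c))
    return merged


def count_components(image, strip_height):
    """Count foreground components by splitting into strips and merging."""
    rows = len(image)
    starts = list(range(0, rows, strip_height))
    result = label_strip(image, starts[0], min(starts[0] + strip_height, rows))
    for r0 in starts[1:]:
        part = label_strip(image, r0, min(r0 + strip_height, rows))
        result = merge_strips(image, result, part, r0)
    return len({result.find(p) for p in list(result.parent)})
```

Each call to `label_strip` touches only its own rows, so the strips could be labeled on separate processors; only the merge phase needs data from two neighbors, and it inspects a single boundary row.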