ISBN: 0818656026 (print)
We introduce the HYperC language, a data-parallel extension of C intended for portability over a wide range of architectures. We present the main features of the language: explicit parallelism through the data; synchronous semantics together with a parallel flow control that allows asynchronous execution; new function qualifiers that emphasize the locality properties of code; and finally, new communication techniques that allow communications to overlap with computations, even for irregular computations. All these features are discussed with respect to portability and code reusability.
In our earlier papers, the parallelization and implementation of Gauss-Seidel (G-S) power flow analysis were investigated on both shared-memory (SM) and distributed-memory (DM) machines, and the properties needed to maximize speedup, such as minimum communication overhead and balanced computational load, were described. In this paper, we investigate a two-stage parallelization scheme that achieves these properties on DM machines. In the first stage, we introduce a new, efficient heuristic clustering algorithm that reduces communication time and balances the computational load. In the second stage, we devise a coloring algorithm that is designed to minimize synchronization overhead and that coordinates the information exchange among processors. We show that the scheme effectively increases both the speedup of the G-S algorithm and its associated upper bound on the nCUBE2 machine.
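Why coloring reduces synchronization can be shown with a minimal sketch. It assumes a generic greedy graph coloring (not the paper's algorithm) and uses a small diagonally dominant linear system Ax = b as a stand-in for the nonlinear power flow equations. When the coloring is built from the nonzero pattern of A, unknowns that share a color are not directly coupled, so all updates inside one color class are independent and could run on different processors, with a synchronization point needed only between classes. The names `greedy_coloring` and `multicolor_gauss_seidel` are hypothetical.

```python
from typing import Dict, List


def greedy_coloring(adj: Dict[int, List[int]]) -> Dict[int, int]:
    """Give each node the smallest color not used by an already-colored neighbor."""
    color: Dict[int, int] = {}
    for node in sorted(adj):
        used = {color[nb] for nb in adj[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color


def multicolor_gauss_seidel(A: List[List[float]], b: List[float], x: List[float],
                            color: Dict[int, int], sweeps: int = 50) -> List[float]:
    """Gauss-Seidel sweeps that visit the unknowns one color class at a time.

    If the coloring comes from A's nonzero pattern, same-colored unknowns are
    not coupled, so the updates inside one class could run in parallel; they
    run sequentially here for clarity.
    """
    n = len(b)
    for _ in range(sweeps):
        for c in sorted(set(color.values())):
            for i in (k for k in range(n) if color[k] == c):
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x[i] = (b[i] - s) / A[i][i]
    return x


# Tiny tridiagonal system whose exact solution is x = [1, 1, 1, 1].
A = [[4.0, -1.0, 0.0, 0.0], [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0], [0.0, 0.0, -1.0, 4.0]]
b = [3.0, 2.0, 2.0, 3.0]
adj = {i: [j for j in range(4) if j != i and A[i][j] != 0.0] for i in range(4)}
color = greedy_coloring(adj)  # {0: 0, 1: 1, 2: 0, 3: 1}, a red-black ordering
x = multicolor_gauss_seidel(A, b, [0.0] * 4, color)
```

Greedy coloring depends on the visiting order and does not guarantee the fewest colors; since each color class costs one synchronization point per sweep, fewer colors mean fewer synchronizations.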
Solving search problems requires large amounts of computational resources in terms of both execution time and memory usage. This report presents experimental results of Parallel Bidirectional Heuristic Search (PBiHS) on the 80-processor EM-4 multithreaded data-flow multiprocessor. PBiHS searches from both directions in parallel, and the search in each direction is itself performed in parallel. Important data structures are distributed across all processors, which helps reduce the execution time for realistic problem sizes to a few seconds or less. We implement two search problems, the Eight Puzzle and the Tower of Hanoi, and execute them on the target multiprocessor. The results demonstrate that PBiHS (1) can solve the Eight Puzzle at tree depths of 20-40 and the Tower of Hanoi with 3-9 disks in an optimal or near-optimal number of iterations in less than two seconds, (2) is highly scalable, giving over 40-fold speedup on 80 processors, and (3) yields on average a 10-fold improvement over unidirectional search for the Eight Puzzle while generating far fewer nodes.
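The bidirectional idea can be sketched sequentially. This minimal version assumes plain breadth-first search instead of heuristic search and runs on a single processor, so it shows only how two frontiers, one grown from the start state and one from the goal, meet in the middle of the Eight Puzzle state space; PBiHS additionally uses heuristics and distributes both searches over processors. All names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[int, ...]  # nine tiles in row-major order; 0 marks the blank
GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def neighbors(s: State) -> List[State]:
    """States reachable by sliding one tile into the blank."""
    i = s.index(0)
    r, c = divmod(i, 3)
    out: List[State] = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            t = list(s)
            t[i], t[j] = t[j], t[i]
            out.append(tuple(t))
    return out


def bidirectional_bfs(start: State, goal: State = GOAL) -> Optional[int]:
    """Length of a shortest move sequence from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    dist_f: Dict[State, int] = {start: 0}
    dist_b: Dict[State, int] = {goal: 0}
    q_f, q_b = deque([start]), deque([goal])
    while q_f and q_b:
        # Grow the smaller frontier by one complete BFS layer.
        if len(q_f) <= len(q_b):
            q, dist, other = q_f, dist_f, dist_b
        else:
            q, dist, other = q_b, dist_b, dist_f
        best: Optional[int] = None
        for _ in range(len(q)):
            s = q.popleft()
            for t in neighbors(s):
                if t in other:  # the two searches meet at t
                    length = dist[s] + 1 + other[t]
                    best = length if best is None else min(best, length)
                if t not in dist:
                    dist[t] = dist[s] + 1
                    q.append(t)
        # Finishing the whole layer before returning keeps the answer optimal.
        if best is not None:
            return best
    return None


moves = bidirectional_bfs((0, 1, 3, 4, 2, 5, 7, 8, 6))  # 4
```

Expanding the smaller frontier first is a common choice because it keeps the two searches roughly balanced in size, which is where the savings over unidirectional search come from.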
In a heterogeneous computing environment, computers must use a suitable transfer syntax to communicate with each other because their internal data representations differ. Transfer syntax conversions take over 90% of the total processing power needed in OSI protocol processing, and application-specific architectures in a heterogeneous system may not perform this protocol processing efficiently. This paper considers a protocol processing system that uses multiple processors in a shared-memory architecture to provide the needed processing power. The results indicate that by prefetching packets into the local memories of the processors, a processing throughput of 560 Mbit/s is possible.
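As an illustration of what a single transfer syntax conversion involves, the sketch below converts between native integers and the ASN.1 Basic Encoding Rules (BER) encoding of an INTEGER, a transfer syntax commonly used in OSI presentation-layer processing. It handles only the short definite length form; real converters cover many more types and length forms, and the abstract does not say which encodings the paper's system processed. The function names are hypothetical.

```python
def ber_encode_integer(value: int) -> bytes:
    """Encode an integer as an ASN.1 BER INTEGER: tag 0x02, length, contents.

    Contents are the minimal big-endian two's-complement octets. Only the short
    definite length form (at most 127 content octets) is handled in this sketch.
    """
    # For negative values, ~value has the same significant bits as the
    # two's-complement form, so its bit length sizes the contents correctly.
    probe = value if value >= 0 else ~value
    n = probe.bit_length() // 8 + 1  # +1 leaves room for the sign bit
    if n > 127:
        raise ValueError("long length form not supported in this sketch")
    return bytes([0x02, n]) + value.to_bytes(n, "big", signed=True)


def ber_decode_integer(data: bytes) -> int:
    """Inverse of ber_encode_integer for the same restricted subset."""
    if len(data) < 3 or data[0] != 0x02:
        raise ValueError("not a BER INTEGER")
    n = data[1]
    if n & 0x80 or len(data) != 2 + n:
        raise ValueError("unsupported length form or wrong length")
    return int.from_bytes(data[2:], "big", signed=True)


# 128 needs a leading 0x00 octet so that it is not read back as negative.
encoded = ber_encode_integer(128)  # b'\x02\x02\x00\x80'
```

Even this one type requires per-value branching and byte manipulation, which gives a sense of why conversions can dominate protocol processing time.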
The THOREAU simulation of vehicular traffic on city streets and freeways, developed by the MITRE Corporation, has been adapted to run in parallel on a network of Unix workstations connected by Ethernet. Speedups of tenfold and more were observed when running as many as 40 parallel threads on 34 processors. The performance curves show little sign of leveling off at higher degrees of parallelism, which suggests that further gains may be obtainable with more processors.
Previous work on parallel database systems has paid little attention to the interaction between asynchronous disk prefetching and processor parallelism. This paper investigates this issue for scan operations on shared-memory multiprocessors. Two heuristic methods are developed for allocating processors and memory so as to optimize either the speedup or the benefit/cost ratio of database scan operations. The speedup optimization balances the data production rate of the disks against the data consumption rate of the processors, aiming at optimal speedup while ensuring that resources are not allocated unnecessarily.
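The balancing idea can be illustrated with a deliberately simple throughput model, which is an assumption and not the paper's heuristic: if `disks` disks each deliver `disk_rate` pages per second and each processor consumes `cpu_rate` pages per second, a scan runs at the slower of the two aggregate rates, so processors beyond the balance point add cost without adding speedup. Memory allocation and prefetch depth, which the paper also considers, are ignored, and the function names are hypothetical.

```python
import math


def balanced_processors(disks: int, disk_rate: float, cpu_rate: float, max_procs: int) -> int:
    """Fewest processors whose total consumption rate keeps up with the disks."""
    needed = math.ceil(disks * disk_rate / cpu_rate)
    return max(1, min(needed, max_procs))


def scan_time(pages: int, disks: int, disk_rate: float, cpu_rate: float, procs: int) -> float:
    """Scan time when throughput is bounded by the slower of I/O and CPU."""
    return pages / min(disks * disk_rate, procs * cpu_rate)


# 4 disks x 100 pages/s = 400 pages/s; each processor consumes 150 pages/s.
p = balanced_processors(disks=4, disk_rate=100.0, cpu_rate=150.0, max_procs=16)  # 3
t_balanced = scan_time(4000, 4, 100.0, 150.0, p)  # 10.0 s, I/O-bound
t_excess = scan_time(4000, 4, 100.0, 150.0, 8)    # still 10.0 s with 8 processors
```

In this model the five extra processors in the second call improve nothing, which is the kind of unnecessary allocation the abstract's speedup heuristic aims to avoid.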
Workstation farms are primarily used for single-threaded batch jobs, but a growing community of users is interested in using farms as 'degenerate' MPP systems. The lack of stable system software has stalled the growth of the MPP market, and 'parallel' farms may suffer the same fate. We believe that High Performance Fortran (HPF) is the right answer to the needs of the parallel programming community. The paper discusses workstation farms and focuses on the HPF language and compiler.
We describe two simple optimal-work parallel algorithms for sorting a list $\mathcal{L} = (X_1, X_2, \ldots, X_m)$ of $m$ strings over an arbitrary alphabet $\Sigma$, where $\sum_{i=1}^{m} |X_i| = n$. The first is a deterministic algorithm that runs in $O(\log^2 m / \log\log m)$ time, and the second is a randomized algorithm that runs in $O(\log m)$ time. Both algorithms use $O(m \log m + n)$ operations.
We present an optimal parallel algorithm that runs in $O(\sqrt{n})$ time on a $\sqrt{n} \times \sqrt{n}$ mesh to compute the constrained Delaunay triangulation of a planar straight-line graph $G$ whose vertices lie in an $n$-element set $S$. Implications of our result also include an efficient PRAM algorithm for the same problem, a new optimal mesh algorithm to compute a planar Voronoi diagram, and a partial solution to the problem of efficiently computing, in parallel, the geodesic Voronoi diagram of a point set inside a simple polygon.
We investigate the practical integration of functional and imperative parallel programming in the context of a popular sequential object-based language. As the basis of our investigation, we develop solutions to the Salishan Problems, a set of problems intended as a standard for comparing parallel programming notations. The language we use is CC++, which extends C++ with single-assignment variables, parallel composition, and atomic functions. We demonstrate how deterministic parallel programs can be written that are identical (except for the addition of a few keywords) to sequential programs that satisfy the same specifications.
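The determinism argument rests on single-assignment variables: a read blocks until the variable has been written, and a second write is an error, so a parallel composition yields the same result under any thread scheduling. A minimal Python sketch of the two constructs (CC++ itself expresses them with the `sync` qualifier and `par` blocks); the class and function names are hypothetical, and Python threads are used only to show the synchronization, not for speedup.

```python
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class SingleAssignment(Generic[T]):
    """A write-once variable: reads block until the value has been assigned."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._value: T  # set exactly once by assign()

    def assign(self, value: T) -> None:
        with self._lock:
            if self._ready.is_set():
                raise RuntimeError("single-assignment variable written twice")
            self._value = value
            self._ready.set()

    def read(self) -> T:
        self._ready.wait()  # block until a writer has assigned the value
        return self._value


def par(*tasks: Callable[[], None]) -> None:
    """Parallel composition: run all tasks concurrently; return when all finish."""
    threads = [threading.Thread(target=t) for t in tasks]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


def sum_of_squares(xs: List[int]) -> int:
    """Parallel counterpart of the sequential sum(x * x for x in xs)."""
    parts: List[SingleAssignment[int]] = [SingleAssignment() for _ in xs]
    total: SingleAssignment[int] = SingleAssignment()

    def make_square(i: int) -> Callable[[], None]:
        return lambda: parts[i].assign(xs[i] * xs[i])

    def reduce_parts() -> None:
        # Started first on purpose: each read() waits for its producer.
        total.assign(sum(p.read() for p in parts))

    par(reduce_parts, *[make_square(i) for i in range(len(xs))])
    return total.read()
```

Because every variable is written exactly once and readers wait for the write, `sum_of_squares([1, 2, 3, 4])` returns the same value as the sequential expression regardless of which thread runs first.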