We describe the basic architecture of JUMP-1, an MPP prototype developed by collaboration between 7 universities. The proposed architecture can exploit high performance of coarse-grained RISC processor performance in ...
详细信息
We describe the basic architecture of JUMP-1, an MPP prototype developed by collaboration between 7 universities. The proposed architecture can exploit high performance of coarse-grained RISC processor performance in connection with flexible fine-grained operation such as distributed shared memory, versatile synchronization and message communications.
In parallel programs where the problem data is dynamically generated, it is very useful to be able to rely on an efficient load balancing algorithm. The token distribution problem (TDP) is a generalization of the stat...
详细信息
In parallel programs where the problem data is dynamically generated, it is very useful to be able to rely on an efficient load balancing algorithm. The token distribution problem (TDP) is a generalization of the static load balancing problem. This paper describes a novel algorithm for solving the TDP for k-ary d-cube topology networks. Compared to other algorithms, our method is more general and does not rely on every processor knowing the exact number of tokens associated to each processor. The correctness of the algorithm is proved and its complexity is informally studied.
Dynamic load balancing schemes are essentially significant for efficiently executing non-uniform problems in highly parallel multicomputer systems, its objective is to minimize the total execution time of single appli...
详细信息
Dynamic load balancing schemes are essentially significant for efficiently executing non-uniform problems in highly parallel multicomputer systems, its objective is to minimize the total execution time of single applications. This paper has proposed a ARID strategy for distributed dynamic load balancing. Its principle and control protocol are described, and the communication overhead, the effect on system stability and the performance efficiency are analyzed. Finally, simulation experiments are carried out to compare the adaptive strategy with the other dynamic load balancing scheme.
Lower bound on the finishing time of optimal schedules is used as an absolute performance measure of static scheduling heuristics. This paper presents an efficient method of computing such a bound, based on estimating...
详细信息
Lower bound on the finishing time of optimal schedules is used as an absolute performance measure of static scheduling heuristics. This paper presents an efficient method of computing such a bound, based on estimating overlaps among the execution ranges of tasks in a given task graph and analyzing the delays of tasks on the critical paths of the graph. The computation performed by this method is shown to be of higher quality than that of other known methods. The future work and directions on this topic are also indicated.
Alpha consistency, a formal definition of the shared memory model of DEC-Alpha based multiprocessors, is defined in the framework of [6]. The definition allows arbitrary out-of- order execution of instructions and is ...
详细信息
Many combinatorial problems can be efficiently solved for partial k-trees (graphs of treewidth bounded by k). The edge-coloring problem is one of the well-known combinatorial problems for which no NC algorithms have b...
详细信息
Many combinatorial problems can be efficiently solved for partial k-trees (graphs of treewidth bounded by k). The edge-coloring problem is one of the well-known combinatorial problems for which no NC algorithms have been obtained for partial k-trees. This paper gives an optimal and first NC parallel algorithm to find an edge-coloring of any given partial k-tree using a minimum number of colors if k and the maximum degree /spl Delta/ are bounded.< >
Segmentation and other image processing operations rely on convolution calculations with heavy computational and memory access demands. This paper presents an analysis of a texture segmentation application containing ...
详细信息
ISBN:
(纸本)0818656026
Segmentation and other image processing operations rely on convolution calculations with heavy computational and memory access demands. This paper presents an analysis of a texture segmentation application containing a 96x96 convolution. Sequential execution required several hours on single processors systems with over 99% of the time spent performing the large convolution. 70% to 75% of execution time is attributable to cache misses within the convolution. We implemented the same application on CM-5, iPSC/860 and PVM distributed memory multicomputers, tailoring the parallel algorithms to each machine's architectures. parallelization significantly reduced execution time, taking 49 second on a 512 node CM-5 and 6.5 minutes on a 32 node iPSC/860.
In this paper, an O(n log n) time algorithm for finding all the maximal cliques of an interval graph is proposed. This algorithm can also be implemented in parallel in O(log n) time using O(n2) processors. The maximal...
详细信息
In this paper, an O(n log n) time algorithm for finding all the maximal cliques of an interval graph is proposed. This algorithm can also be implemented in parallel in O(log n) time using O(n2) processors. The maximal cliques of an interval graph contain important structural information. Many problems on interval graphs can be solved after all the maximal cliques are known. It is shown that cut vertices, bridges, and vertex connectivities can all be determined easily after the maximal cliques are known. Finally, the all-pair shortest path problem for interval graphs is solved based on the relationship between maximal cliques. The all-pair shortest path algorithm can also be parallelized in O(log n) time using O(n2) processors.
This paper describes the design and implementation of a comprehensive operating system(COS) for highly parallel machine. This is one of the research themes in the Japanese University Highly parallel Research Project s...
详细信息
This paper describes the design and implementation of a comprehensive operating system(COS) for highly parallel machine. This is one of the research themes in the Japanese University Highly parallel Research Project supported by the Special Grant of the Research Funds of Ministry of Education. The target machine is called JUMP-1 originally developed in this project. Several design policies are discussed, and the user service classes are defined. COS architecture is based on the micro kernel, and it includes both Partition OS and Service OS. The development environment is also discussed briefly.
Trade-offs between the SIMD and MIMD models of architecture for parallelism are presented. Mixed-mode parallelism, where a machine can switch between the SIMD and MIMD modes of parallelism at instruction-level granula...
详细信息
Trade-offs between the SIMD and MIMD models of architecture for parallelism are presented. Mixed-mode parallelism, where a machine can switch between the SIMD and MIMD modes of parallelism at instruction-level granularity with generally negligible overhead, is discussed. Advantages and disadvantages of mixed-mode parallelism, and an example of a mixed-mode parallel algorithm, are given. The relationship of mixed-mode processing to high-performance heterogeneous computing is overviewed. Difficulties involved with evaluating interconnection networks for parallel machines are then considered. There are a myriad of metrics that have been used in the literature. The problems involved with choosing the most appropriate metric or weighted set of metrics, and performing 'fair' comparisons, are explored.
暂无评论