ISBN (Print): 0897917693
Speculative evaluation, including leniency and futures, is often used to produce high degrees of parallelism. Existing speculative implementations, however, may serialize computation because of their implementation of queues of suspended threads. We give a provably efficient parallel implementation of a speculative functional language on various machine models. The implementation includes proper parallelization of the necessary queuing operations on suspended threads. Our target machine models are a butterfly network, hypercube, and PRAM. To prove the efficiency of our implementation, we provide a cost model using a profiling semantics and relate the cost model to implementations on the parallel machine models.
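For readers unfamiliar with futures, the following minimal sketch illustrates the programming model the abstract refers to: a value is computed speculatively in parallel and later forced ("touched") by the consumer. It uses C++ std::async purely for illustration; it is not the paper's speculative functional language, nor does it show the queue-of-suspended-threads machinery whose parallelization the paper addresses.

```cpp
// Illustrative only: a future spawned speculatively and forced later.
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int expensive_sum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

int main() {
    std::vector<int> data(1000000, 1);
    // Spawn the computation speculatively; its result may be needed later.
    std::future<int> result =
        std::async(std::launch::async, expensive_sum, std::cref(data));
    // ... other work proceeds here in parallel with the spawned thread ...
    std::cout << result.get() << "\n";  // "touch": block until the value is ready
    return 0;
}
```

In an implementation of such a language, threads that touch an unevaluated future are suspended and queued; the paper's contribution is that these queuing operations themselves are parallelized so they do not become a serial bottleneck.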
MPI is a message-passing standard widely used for developing high-performance parallel applications. Because of restrictions in the MPI computation model, conventional implementations on shared-memory machines map each MPI node to an OS process, which suffers serious performance degradation in the presence of multiprogramming, especially when a space/time sharing policy is employed in OS job scheduling. In this paper, we study compile-time and run-time support for MPI by using threads and demonstrate our optimization techniques for executing a large class of MPI programs written in C. The compile-time transformation adopts thread-specific data structures to eliminate the use of global and static variables in C code. The run-time support includes an efficient point-to-point communication protocol based on a novel lock-free queue management scheme. Our experiments on an SGI Origin 2000 show that our MPI prototype, called TMPI, using the proposed techniques is competitive with SGI's native MPI implementation in a dedicated environment, and it has significant performance advantages, with up to a 23-fold improvement, in a multiprogrammed environment.
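As an illustration of the compile-time transformation mentioned above (a sketch, not TMPI's actual code), the snippet below shows the basic idea: once each MPI node is mapped to a thread instead of a process, a C global such as a rank variable must become per-thread state. Here C++ thread_local stands in for the paper's thread-specific data structures, and mpi_node / my_rank are hypothetical names.

```cpp
// Illustrative only: turning a process-wide global into per-thread state.
#include <cstdio>
#include <thread>
#include <vector>

// Before the transformation the program had a process-wide global:
//     int my_rank;
// With one thread per MPI node, each node needs its own private copy.
thread_local int my_rank = -1;

void mpi_node(int rank) {
    my_rank = rank;  // private to this node's thread, not shared
    std::printf("node %d running\n", my_rank);
}

int main() {
    std::vector<std::thread> nodes;
    for (int r = 0; r < 4; ++r) nodes.emplace_back(mpi_node, r);
    for (auto& t : nodes) t.join();
    return 0;
}
```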
ISBN (Print): 9781450349826
Over the past decade, many programming languages and systems for parallel computing have been developed, including Cilk, Fork/Join Java, Habanero Java, parallel Haskell, parallel ML, and X10. Although these systems raise the level of abstraction at which parallel code is written, performance continues to require the programmer to perform extensive optimizations and tuning, often by taking various architectural details into account. One such key optimization is granularity control, which requires the programmer to determine when and how parallel tasks should be created. In this paper, we briefly describe some of the challenges associated with automatic granularity control when trying to achieve portable performance for parallel programs with arbitrary nesting of parallel constructs. We consider a result from the functional-programming community, whose starting point is to consider an "oracle" that can predict the work of parallel codes and thereby control granularity. We discuss the challenges in implementing such an oracle and proving that it has the desired theoretical properties under the nested-parallel programming model.
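As a rough illustration of oracle-guided granularity control (a sketch under assumed names, not the paper's algorithm), the code below spawns a parallel task only when a work prediction exceeds a cutoff; predict_work and WORK_CUTOFF are hypothetical stand-ins for the oracle and threshold the paper analyzes.

```cpp
// Illustrative only: spawn a task only when predicted work exceeds a cutoff.
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

constexpr std::size_t WORK_CUTOFF = 4096;  // assumed tuning constant

// Toy "oracle": predicted work of summing a range is proportional to its length.
std::size_t predict_work(std::size_t n) { return n; }

long par_sum(const int* a, std::size_t n) {
    if (predict_work(n) <= WORK_CUTOFF)    // predicted work too small: stay sequential
        return std::accumulate(a, a + n, 0L);
    std::size_t half = n / 2;
    auto right = std::async(std::launch::async, par_sum, a + half, n - half);
    long left = par_sum(a, half);          // evaluate the other half on this thread
    return left + right.get();
}

int main() {
    std::vector<int> v(100000, 1);
    return par_sum(v.data(), v.size()) == 100000 ? 0 : 1;
}
```

The hard part, which the abstract alludes to, is obtaining reliable work predictions and proving that scheduling with them preserves good asymptotic bounds under arbitrarily nested parallelism.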