The increasing complexity of parallel computing systems has brought about in crisis in parallel performance evaluation and tuning. Tools for performance measurement and visualization become necessary parts of programm...
详细信息
ISBN:
(纸本)0780335295
The increasing complexity of parallel computing systems has brought about in crisis in parallel performance evaluation and tuning. Tools for performance measurement and visualization become necessary parts of programming environments for parallel computers. However, today's performance analysis systems offer little more than basic measurement and analysis facilities for the sources of poor performance, such as load imbalance, communication overhead, and synchronization loss. Our experience in parallelprogramming shows that a system which can provide higher level performance measurement and analysis is more helpful in the performance tuning of parallel program. For example, whether the programmer adopts a proper program strategy or algorithm is one of the most important factors which affect the performance of parallel programs. Therefore, we argue that a helpful performance tuning tool should be able to assist programmers to optimise the strategy or algorithm in their parallel programs. In this paper we introduce an intelligent performance tuning tool which detects and analyses the strategy and algorithm concepts in parallel programs, helps users rapidly identify the location and cause of the performance problems, and provides suggestions to improve the performance of their parallel programs.
The rise of explicit parallelprogramming involves new problems: lack of structure for parallel algorithms and the ad hoc development of parallel algorithms. We use skeletons to characterize and design parallel algori...
详细信息
ISBN:
(纸本)0818675829
The rise of explicit parallelprogramming involves new problems: lack of structure for parallel algorithms and the ad hoc development of parallel algorithms. We use skeletons to characterize and design parallel algorithms and define a process to refine the designs step by step into programs. The paper introduces a high level library on top of MPI which is derived from the skeleton concept to achieve better programmability and obtain portability. We conclude with a CFD application to demonstrate our idea.
This paper presents the new scheme of interconnecting levels in a bitonic sorting network with simpler inter-level wiring. A parity technique which leads to the algorithm Construct-BSMF is introduced. Wiring simplific...
详细信息
This paper presents the new scheme of interconnecting levels in a bitonic sorting network with simpler inter-level wiring. A parity technique which leads to the algorithm Construct-BSMF is introduced. Wiring simplification through the network is achieved wiring the N/2 even-parity keys straight through the network. N/2 odd-parity keys use flip interconnections. As a result, our interconnection scheme simplifies the inter-level wiring through the network and outperforms the perfect-shuffle interconnection scheme both in terms of cost and delay.
In this paper, we investigate the problem of computing minimal interval and circular arc overlap representations, and give several optimal algorithms. We show that, among other things, given an s/spl times/t interval ...
详细信息
ISBN:
(纸本)0818674601
In this paper, we investigate the problem of computing minimal interval and circular arc overlap representations, and give several optimal algorithms. We show that, among other things, given an s/spl times/t interval or circular arc overlap representation matrix: a minimal interval overlap representation can be obtained with O(log(st)) time with O(st/log(st)) EREW PRAM processors, or in O(log t/log log t) time with O(st log log t/log t) common CRCW PRAM processors, or in O(1) time with O(st) BSR processors; a minimal circular arc overlap representation can be obtained in O(st) time.
Based on the current fiber optic technology, a new computational model, called a linear array with a reconfigurable pipelined bus system (LARPBS), is proposed in this paper. A parallel quicksort algorithm is implement...
详细信息
Based on the current fiber optic technology, a new computational model, called a linear array with a reconfigurable pipelined bus system (LARPBS), is proposed in this paper. A parallel quicksort algorithm is implemented on the model, and its time complexity is analyzed. For a set of N numbers, the quicksort algorithm reported in this paper runs in O(log/sub 2/N) average time on a linear array with a reconfigurable pipelined bus system of size N. Besides proposing a new algorithm on the model, some basic data movement operations involved in the algorithm are discussed. We believe that these operations can be used to design other parallel algorithms on the same model. Future research in this area is also identified in this paper.
Cost sensitive applications for parallel computing require system designs using commodity hardware. Off-the-shelf processing nodes have already been implemented in parallel systems. This article proposes the use of AT...
详细信息
Cost sensitive applications for parallel computing require system designs using commodity hardware. Off-the-shelf processing nodes have already been implemented in parallel systems. This article proposes the use of ATM (Asynchronous Transfer Mode) for interconnection networks. Because ATM was not designed as communication technology for parallel systems, some adaptation has to be done in order to meet the special requirements of parallel systems. This paper discusses advantages and drawbacks of this approach and shows solutions to adapt the ATM technology for usage in this special environment while preserving some unique features of ATM.
Virtual Reality (VR) is an exciting yet challenging area. In commercial VR systems, one of the main challenges is how to maintain relatively constant performance under various loading and at low-cost. This paper prese...
详细信息
Virtual Reality (VR) is an exciting yet challenging area. In commercial VR systems, one of the main challenges is how to maintain relatively constant performance under various loading and at low-cost. This paper presents a parallel and distributed solution to the problem against the background of a commercial entertainment VR system. In the paper, the architecture of the system is introduced. The strategies of distribution and the mechanism of the parallel processing is discussed.
An active object is a function that returns a pointer to its environment when an execution thread is attached to it. This facility of BaLinda K, a parallel Lisp dialect with an imperative appearance, is shown to be us...
详细信息
ISBN:
(纸本)0780335295
An active object is a function that returns a pointer to its environment when an execution thread is attached to it. This facility of BaLinda K, a parallel Lisp dialect with an imperative appearance, is shown to be useful for constructing I/O interfaces and execution control mechanisms, and has potential as a tool for system program implementation.
This paper presents performance and feasibility analyses for important mesh-connected architectures that contain sparse broadcast buses. Two basic architectures, that implement bus intersections differently, are given...
详细信息
This paper presents performance and feasibility analyses for important mesh-connected architectures that contain sparse broadcast buses. Two basic architectures, that implement bus intersections differently, are given special attention. The first architecture simply allows row/column bus crossovers. The second architecture has separable buses and implements such intersections with switches for further flexibility. Both architectures have lower cost than the mesh with multiple broadcast, which has buses spanning each row and each column, but the former architectures maintain to a high extent the powerful properties of the latter mesh. The architecture with separable buses is shown to often perform better than the higher-cost mesh with multiple broadcast. architectures with separable buses that employ store-and-forward routing often perform better than architectures with contiguous buses that employ high-cost wormhole routing. All these architectures are evaluated in reference to cost, and efficiency in implementing several important operations and application algorithms. The results prove that these architectures are very promising alternatives to the mesh with multiple broadcast; in addition, their implementation is cost-effective and feasible.
Broadcasting and gossiping are basic tools for high performance programming on parallel and distributed systems. Unfortunately they are known to be NP-hard problems. The paper deals with approximation algorithms far b...
详细信息
Broadcasting and gossiping are basic tools for high performance programming on parallel and distributed systems. Unfortunately they are known to be NP-hard problems. The paper deals with approximation algorithms far broadcasting and gossiping. The authors consider both round-complexity and step-complexity in the telephone model. After an overview of previously derived approximation algorithms, they present new strategies for broadcasting and gossiping in any networks. Broadcasting strategies are based on the construction of edge-disjoint spanning trees. Gossiping strategies are based on on-line computation of matching a slang with the gossiping process. The approximation algorithms for broadcasting offer almost optimal complexity when the number of messages to be broadcasted is large. They show that the best approximation algorithm for gossiping performs optimally in many cases. They also show experimentally that it performs faster than the best known hand-made algorithms in some particular cases.
暂无评论