In this paper we evaluate the use of software distributed shared memory (DSM) on a message-passing machine as the target for a parallelizing compiler. We compare this approach to compiler-generated message passing, hand-coded software DSM, and hand-coded message passing. For this comparison, we use six applications: four regular and two irregular. Our results are gathered on an 8-node IBM SP/2 using the TreadMarks software DSM system. We use the APR shared-memory (SPF) compiler to generate the shared-memory programs and the APR XHPF compiler to generate the message-passing programs. The hand-coded message-passing programs run with the IBM PVMe optimized message-passing library. On the regular programs, both the compiler-generated and the hand-coded message passing outperform the SPF/TreadMarks combination: the compiler-generated message passing by 5.5% to 40%, and the hand-coded message passing by 7.5% to 49%. On the irregular programs, the SPF/TreadMarks combination outperforms the compiler-generated message passing by 38% and 89%, and only slightly underperforms the hand-coded message passing, by 4.4% and 16%. We also identify the factors that account for the performance differences, estimate their relative importance, and describe methods to improve the performance.
Virtual Reality (VR) is an exciting yet challenging area. Especially in commercial VR systems, one of the main challenges is how to maintain relatively constant performance under varying loads and at low cost. This paper presents a parallel and distributed solution to this problem in the context of a commercial entertainment VR system. The paper introduces the architecture of the system and discusses the distribution strategies and the parallel-processing mechanism.
The proceedings contain 92 papers from the 1996 International Symposium on Parallel Architectures, Algorithms and Networks. Topics discussed include: massively parallel processors; distributed-memory parallel computers; multistage interconnection networks; Banyan switching fabrics; internetworking; Transmission Control Protocol/Internet Protocol networks; train traffic and event-driven simulations; universal broadband network access devices; customer premises networks; and parallel random access machines.
ISBN: 0780335295 (print)
This paper describes the architecture and implementation of Libra, a library for implementing efficient, reliable distributed applications. Libra is designed to provide fault-tolerance transparency and a simple, easy-to-use, high-level message-passing interface, so that the development of reliable distributed applications can be significantly simplified. Fault tolerance is based on distributed consistent checkpointing and rollback recovery, integrated with a user-level network communication protocol. By employing novel mechanisms, Libra minimises the communication overhead of taking a consistent distributed checkpoint and of catching messages in transit. Using efficient implementation techniques, a prototype of Libra has been implemented on a network of Sun workstations, and it supports reliable distributed computing at low run-time cost. The simplicity and efficiency of Libra make it a promising approach to constructing reliable distributed applications.
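The abstract does not spell out Libra's own mechanisms, so the sketch below uses a different, well-known technique instead: the Chandy-Lamport marker algorithm, one standard way for a consistent distributed checkpoint to capture messages that are still in transit. It is a single-threaded Python simulation under stated assumptions: reliable FIFO channels between every pair of processes, and a hypothetical `Node` whose only state is an integer balance. The names `Node`, `start_snapshot`, `receive`, and `run_demo` are illustrative and are not part of Libra.

```python
from collections import deque


class Node:
    """A process in a Chandy-Lamport snapshot simulation (hypothetical model)."""

    def __init__(self, pid, balance, peers, net):
        self.pid = pid
        self.balance = balance        # the only local state in this model
        self.peers = peers            # every other node: one channel each way
        self.net = net                # (src, dst) -> FIFO deque of messages
        self.snapshot = None          # recorded local state, once taken
        self.recording = set()        # incoming channels still being recorded
        self.in_transit = {}          # src -> amounts caught on channel src->self

    def send(self, dst, amount):
        self.balance -= amount
        self.net[(self.pid, dst)].append(("app", amount))

    def start_snapshot(self):
        # Record local state, start recording every incoming channel,
        # and send a marker on every outgoing channel.
        self.snapshot = self.balance
        self.in_transit = {src: [] for src in self.peers}
        self.recording = set(self.peers)
        for dst in self.peers:
            self.net[(self.pid, dst)].append(("marker", 0))

    def receive(self, src):
        kind, amount = self.net[(src, self.pid)].popleft()
        if kind == "marker":
            if self.snapshot is None:
                self.start_snapshot()      # first marker seen: record now
            self.recording.discard(src)    # channel src->self is now complete
        else:
            self.balance += amount
            if src in self.recording:
                # Sent before the sender's snapshot, received after ours.
                self.in_transit[src].append(amount)


def run_demo():
    ids = [0, 1, 2]
    net = {(a, b): deque() for a in ids for b in ids if a != b}
    nodes = {i: Node(i, 100, [j for j in ids if j != i], net) for i in ids}
    nodes[0].send(1, 30)
    nodes[1].send(2, 20)
    nodes[0].start_snapshot()
    nodes[2].send(0, 10)
    # Deliver one message per non-empty channel per round until all drain.
    while any(net.values()):
        for (src, dst), channel in net.items():
            if channel:
                nodes[dst].receive(src)
    recorded = sum(node.snapshot for node in nodes.values())
    transit = sum(sum(msgs) for node in nodes.values() for msgs in node.in_transit.values())
    return recorded, transit
```

Because money is only moved, never created, the recorded balances plus the recorded in-transit amounts must equal the initial total (300 in this demo). For the fixed delivery order used here, `run_demo()` returns `(270, 30)`.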
ISBN: 0818674601 (print)
Cost-sensitive applications of parallel computing require system designs that use commodity hardware. Off-the-shelf processing nodes have already been employed in parallel systems. This article proposes the use of ATM (Asynchronous Transfer Mode) for interconnection networks. Because ATM was not designed as a communication technology for parallel systems, some adaptation is needed to meet the special requirements of parallel systems. This paper discusses the advantages and drawbacks of this approach and presents solutions for adapting ATM technology to this special environment while preserving some unique features of ATM.
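One frequently cited drawback of ATM for parallel systems is its fixed-size cell: every 53-byte cell carries a 5-byte header and 48 bytes of payload. The sketch below estimates that overhead for a single message, assuming AAL5 framing (an 8-byte trailer, with the frame padded to a multiple of 48 bytes). The abstract does not say which adaptation layer or which adaptations the paper uses, and the function names are hypothetical.

```python
CELL_SIZE = 53      # bytes per ATM cell
CELL_PAYLOAD = 48   # payload bytes per cell (53 minus the 5-byte header)
AAL5_TRAILER = 8    # AAL5 CPCS trailer: UU, CPI, length, CRC-32


def aal5_cells(message_bytes: int) -> int:
    """Cells needed to carry one message in a single AAL5 frame.

    Ignores AAL5's maximum frame size; assumes one frame per message.
    """
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    pdu = message_bytes + AAL5_TRAILER
    # Ceiling division: padding fills out the last cell's payload.
    return -(-pdu // CELL_PAYLOAD)


def wire_efficiency(message_bytes: int) -> float:
    """Fraction of transmitted bytes that are application payload."""
    return message_bytes / (aal5_cells(message_bytes) * CELL_SIZE)
```

Under these assumptions, an 8-byte control message occupies one cell (about 15% wire efficiency), while a 1024-byte message needs 22 cells (about 88%), which illustrates why the short messages common in parallel programs pay the highest relative cost.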
The main objective of the distributed scheduler agent is to provide task-placement advice either directly to parallel and distributed applications, or to a distributed scheduler that despatches ordinary applications transparently to the advised hosts. To accomplish this, the distributed scheduler agent needs to know the global load situation across all machines and be able to pick the host that best suits the specific resource requirements of each job. Issues concerning the collection and distribution of load information throughout the system are discussed. This load information is then fed to a ranking algorithm that uses a three-dimensional load space to select the most suitable host, based on weights that indicate the relative importance of each resource to a task. Performance tests are carried out to determine the response time and overhead of the distributed scheduler agent. An application, a distributed ray tracer, is also customised to use the distributed scheduler agent, and the results are presented.
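The abstract does not give the ranking formula, so the sketch below is one plausible reading under stated assumptions: each host is a point in a three-dimensional load space whose hypothetical axes are CPU, memory, and I/O load (each normalised to [0, 1]), and a task's weights scale each axis before taking the Euclidean distance from the idle point (0, 0, 0). The host closest to idle, as seen through the task's weights, ranks first. `rank_hosts` and `best_host` are illustrative names.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical axes: (cpu, memory, io), each normalised to [0, 1].
Load = Tuple[float, float, float]


def rank_hosts(loads: Dict[str, Load], weights: Load) -> List[str]:
    """Host names ordered best-first by weighted distance from the idle point."""

    def score(load: Load) -> float:
        # Weighted Euclidean distance: heavily weighted resources dominate.
        return math.sqrt(sum(w * x * x for w, x in zip(weights, load)))

    # sorted() is stable, so ties keep the dictionary's insertion order.
    return sorted(loads, key=lambda host: score(loads[host]))


def best_host(loads: Dict[str, Load], weights: Load) -> str:
    if not loads:
        raise ValueError("no hosts to choose from")
    return rank_hosts(loads, weights)[0]
```

For example, a CPU-bound task with weights `(1, 0, 0)` prefers a host with low CPU load even if its memory is busy, while a memory-bound task with weights `(0, 1, 0)` prefers the opposite.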
ISBN: 0780335295 (print)
Voting on large collections of input objects is becoming increasingly important in data fusion, signal and image processing, and distributed computing. To achieve high-speed voting, the multiple processing resources typically available in such applications should be utilized; hence the need for parallel voting algorithms. We develop efficient parallel algorithms for threshold voting that generalize and extend previous work on both sequential threshold voting and parallel majority voting. We show how a well-known $O(n)$-time sequential algorithm for $m$-out-of-$n$ voting can be parallelized through a simple divide-and-conquer strategy. When $m = \Theta(n)$, the resulting algorithm has $O(\log^2 n)$ time complexity on PRAM and hypercube computers and optimal $O(n^{1/k})$ time complexity on a $k$-dimensional mesh-connected architecture. We also analyze the time complexity of the algorithm for $m = o(n)$ and for certain weighted threshold-voting schemes.
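The divide-and-conquer idea can be sketched as follows. If a value receives at least $m$ of $n$ votes, then in at least one half of the input it receives at least the fraction $m/n$ of that half's votes, since otherwise its two half-counts would sum to less than $m$. Each half can therefore return only the values that reach the fraction $m/n$ locally, and the parent just recounts those candidates. This is a minimal sequential Python sketch of that strategy, not necessarily the paper's exact algorithm; the two recursive calls are independent, which is where a parallel version would run them concurrently. `threshold_vote` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Sequence, Set


def _candidates(votes: Sequence[Hashable], lo: int, hi: int, m: int, n: int) -> Set[Hashable]:
    """Values holding at least the fraction m/n of the votes in votes[lo:hi]."""
    length = hi - lo
    if length == 1:
        pool = {votes[lo]}
    else:
        mid = (lo + hi) // 2
        # Independent subproblems: a parallel version would run these concurrently.
        pool = _candidates(votes, lo, mid, m, n) | _candidates(votes, mid, hi, m, n)
    counts = Counter(v for v in votes[lo:hi] if v in pool)
    # Integer form of count / length >= m / n, avoiding floating point.
    return {v for v in pool if counts[v] * n >= m * length}


def threshold_vote(votes: Sequence[Hashable], m: int) -> Set[Hashable]:
    """All values that receive at least m of the n votes (m-out-of-n voting)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    n = len(votes)
    if n == 0:
        return set()
    return _candidates(votes, 0, n, m, n)
```

With $m > n/2$ the result has at most one element (majority voting); smaller thresholds can admit several winners, which is why the sketch returns a set.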
Various tridiagonal solvers have been proposed in recent years for different parallel platforms. In this paper, we study the performance of three tridiagonal solvers, namely the parallel partition LU algorithm, the parallel diagonal dominant algorithm, and the reduced diagonal dominant algorithm. These algorithms are designed for distributed-memory machines and are tested on an Intel Paragon and an IBM SP2. Measured results are reported in terms of execution time and speedup, and they closely match the analytical results. In addition to addressing implementation issues, we also discuss performance considerations such as problem size and speedup models.
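All three solvers parallelize Gaussian elimination on a tridiagonal matrix. For reference, the sequential $O(n)$ form, often called the Thomas algorithm, is sketched below; it is not one of the three parallel algorithms studied. It performs no pivoting, so it assumes a matrix for which that is safe, such as a diagonally dominant one. The argument convention (`a[0]` and `c[-1]` unused) is an assumption of this sketch.

```python
from typing import List, Sequence


def thomas_solve(a: Sequence[float], b: Sequence[float],
                 c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a tridiagonal system A x = d.

    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[-1] unused).
    No pivoting: assumes e.g. a diagonally dominant matrix.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal after elimination
    dp = [0.0] * n  # modified right-hand side after elimination
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal row by row.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution on the resulting upper-bidiagonal system.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For example, with diagonal `[4, 4, 4]`, off-diagonals of 1, and right-hand side `[6, 12, 14]`, the solution is `[1, 2, 3]`.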
ISBN: 0818676140 (print)
Distributed computing systems are widely used in mission-critical real-time applications such as missile defense systems, aircraft control, and sonar. Designing a low-cost distributed computing system that satisfies all the stringent requirements of a given application is a difficult problem. This problem can be alleviated using Computer-Aided Synthesis (CAS) tools. Because of the large number of design alternatives, CAS tools are compute-intensive and can take a considerable time even for medium-sized real-time applications. In this paper, we describe a set of parallel synthesis algorithms that dynamically adapt to the number of available processors in a parallel computer system to substantially reduce the total turnaround time of the synthesis process.
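The abstract does not describe the synthesis algorithms themselves. As a generic illustration of adapting to the number of available processors, the sketch below splits a hypothetical list of design alternatives into chunks and lets a worker pool sized by `os.cpu_count()` pull chunks as workers become free, so faster workers simply take more chunks. `parallel_search`, `cost`, and `feasible` are hypothetical, and a real CAS tool would not necessarily enumerate its design space this way.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def _best_in_chunk(chunk: List[T], cost: Callable[[T], float],
                   feasible: Callable[[T], bool]) -> Optional[T]:
    """Cheapest feasible design in one chunk, or None if none is feasible."""
    candidates = [d for d in chunk if feasible(d)]
    return min(candidates, key=cost) if candidates else None


def parallel_search(designs: Iterable[T], cost: Callable[[T], float],
                    feasible: Callable[[T], bool], workers: Optional[int] = None,
                    chunk_size: int = 64) -> Optional[T]:
    """Cheapest feasible design, evaluated by a pool sized to the available CPUs."""
    workers = workers or os.cpu_count() or 1
    designs = list(designs)
    chunks = [designs[i:i + chunk_size] for i in range(0, len(designs), chunk_size)]
    # More chunks than workers: idle workers pull the next chunk (dynamic balancing).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda ch: _best_in_chunk(ch, cost, feasible), chunks)
        best = [r for r in results if r is not None]
    return min(best, key=cost) if best else None
```

Threads keep the example self-contained, but in CPython they do not speed up pure-Python cost functions because of the global interpreter lock; real CPU parallelism would need `concurrent.futures.ProcessPoolExecutor` with top-level, picklable functions in place of the lambda.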
Networks of workstations and high-performance microcomputers have rarely been used to run high-performance applications such as multimedia, simulations, and scientific and engineering applications because, although they have significant aggregate computing power, they lack support for efficient message-passing and shared-memory communication. In this paper we present Telegraphos, a distributed system that provides efficient shared-memory support on top of a workstation cluster. We focus on the network interface of Telegraphos, which provides a variety of shared-memory operations, such as remote reads, remote writes, and remote atomic operations, all launched from user level without any intervention by the operating system. Telegraphos I, the first Telegraphos prototype, has been implemented. Emphasis was placed on rapid prototyping, so the technology used was conservative: FPGAs, SRAMs, and TTL buffers. Telegraphos II is the single-chip version of the Telegraphos architecture; its switch has been implemented and its network interface is being debugged.