Current parallelizing compilers for message-passing machines support only a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful shared-memory parallelizing compilers with software distributed-shared-memory (DSM) systems. We demonstrate such a system by combining the SUIF parallelizing compiler and the CVM software DSM. Innovations of the system include compiler-directed techniques that: 1) combine the communication of synchronization and parallelism information on parallel task invocation, 2) employ customized routines for evaluating reduction operations, and 3) select a hybrid update protocol that pre-sends data by flushing updates at barriers. For applications with sufficient granularity of parallelism, these optimizations yield very good eight-processor speedups on an IBM SP-2 and a DEC Alpha cluster, usually matching or exceeding the speedup of equivalent HPF and message-passing versions of each program. Flushing updates, in particular, eliminates almost all nonlocal memory misses and improves performance by 13% on average.
The purpose of adaptive fault-tolerance (AFT) is to meet dynamically and widely changing fault-tolerance requirements by efficiently and adaptively utilizing a limited and dynamically changing amount of available redundant processing resources. In this paper we present one concrete AFT scheme, named the adaptable distributed recovery block (ADRB) scheme, which is an extension of the distributed recovery block (DRB) scheme for reliable execution of real-time applications with tolerance of both hardware and software faults in distributed/parallel computer systems. An ADRB station dynamically switches its operating mode in response to significant changes in the resource and application modes. Different operating modes have different resource requirements and yield different fault-tolerance capabilities. A modular implementation model for the ADRB scheme is also presented. An efficient execution support mechanism for the ADRB scheme has been implemented as a part of a timeliness-guaranteed kernel developed at the University of California, Irvine.
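For readers unfamiliar with the recovery block idea that the DRB and ADRB schemes build on, the following is a minimal sketch of a classic sequential recovery block: try blocks are attempted in order and the first result that passes an acceptance test is returned. It is not the ADRB scheme itself, which distributes the try blocks across nodes and switches operating modes at run time. The names `recovery_block`, `AllAlternatesFailed`, `faulty_sort`, and `is_sorted` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class AllAlternatesFailed(Exception):
    """Raised when no try block produces a result that passes the test."""


def recovery_block(
    try_blocks: Sequence[Callable[[T], R]],
    acceptance_test: Callable[[R], bool],
    data: T,
) -> R:
    """Run try blocks in order; return the first accepted result."""
    for block in try_blocks:
        try:
            result = block(data)
        except Exception:
            # A crash in a try block is handled like a failed acceptance test.
            continue
        if acceptance_test(result):
            return result
    raise AllAlternatesFailed("no try block passed the acceptance test")


def faulty_sort(values):
    # Hypothetical primary with a software fault: returns input unsorted.
    return list(values)


def is_sorted(values):
    # Acceptance test: every element is <= its successor.
    return all(a <= b for a, b in zip(values, values[1:]))


# The primary fails the acceptance test, so the alternate's result is used.
result = recovery_block([faulty_sort, sorted], is_sorted, [3, 1, 2])
```

Here the alternate (`sorted`) masks the fault in the primary. An adaptive scheme in the spirit of the abstract would additionally vary how many try blocks run, and where, as available resources change.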
ISBN: (print) 0818675829
In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. The approach presented is reasonably general and is applicable to a wide variety of regular as well as irregular applications. We present performance results for the solution of an unstructured mesh on a cluster of heterogeneous workstations.
ISBN: (print) 0818675829
Dedicated Cluster Parallel Computers (DCPCs) are emerging as low-cost, high-performance environments for many important applications in science and engineering. A significant class of applications that perform well on a DCPC are coarse-grain applications that involve large amounts of file I/O. Current research in parallel file systems for distributed systems is providing a mechanism for adapting these applications to the DCPC environment. We present the Parallel Virtual File System (PVFS), a system that provides disk striping across multiple nodes in a distributed parallel computer and file partitioning among tasks in a parallel program. PVFS is unique among similar systems in that it uses a streams-based approach that represents each file access with a single set of request parameters and decouples the number of network messages from details of the file's striping and partitioning. PVFS also provides support for efficient collective file accesses and allows overlapping file partitions. We present results of early performance experiments that show PVFS achieves excellent speedups in accessing moderately sized file segments.
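As a rough illustration of disk striping across nodes, the sketch below maps a logical file offset to a node and an offset within that node's local portion of the file. It assumes a simple round-robin layout with a fixed stripe size; the abstract does not specify PVFS's actual layout or interface, so the function `locate` and its parameters are hypothetical.

```python
def locate(offset: int, stripe_size: int, num_nodes: int) -> tuple[int, int]:
    """Map a logical byte offset to (node, local_offset).

    Assumes round-robin striping: stripe 0 goes to node 0, stripe 1 to
    node 1, and so on, wrapping around after num_nodes stripes.
    """
    if offset < 0 or stripe_size <= 0 or num_nodes <= 0:
        raise ValueError("offset must be >= 0; sizes must be positive")
    stripe_index = offset // stripe_size   # which stripe holds the byte
    node = stripe_index % num_nodes        # node that stores this stripe
    stripes_before = stripe_index // num_nodes  # earlier stripes on that node
    local_offset = stripes_before * stripe_size + offset % stripe_size
    return node, local_offset


# With 64-byte stripes over 4 nodes, byte 300 lies in stripe 4, which
# wraps around to node 0 as that node's second stripe.
node, local_offset = locate(300, stripe_size=64, num_nodes=4)
```

Under this layout a contiguous request larger than `stripe_size * num_nodes` touches every node, which is one reason a file system might describe the whole access with a single set of request parameters rather than one message per stripe.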
ISBN: (print) 0818672552
This paper presents the 'Planned Direct Transfer' programming model, developed by Mercury Computer Systems to meet the requirements of embedded high-performance computing applications. In this model, data transfers are 'Planned' before they occur, resulting in execution with low software overhead; they are also 'Direct', in that they do not require intermediate data copying. This paper locates the Planned Direct Transfer (PDT) model in the landscape of the standard approaches of Shared Memory and Message Passing.
ISBN: (print) 0818672552
Blocking locks are commonly used in parallel programs to improve application performance and system throughput. However, most implementations of such locks suffer from two major problems - latency and scalability. In this paper, we propose an implementation of blocking locks using scheduler adaptation which exploits the interaction between thread schedulers and locks. By experimentation using well-known multiprocessor applications on a KSR2 multiprocessor, we demonstrate how such an implementation considerably reduces the latency and improves the scalability of blocking locks.
ISBN: (print) 0818675829
TOP-C is a task-oriented parallel C interface. It presents a master-slave task architecture that greatly eases the parallelization of code. It is intended for applications where a compiler would have difficulty recognizing opportunities for data-parallelism. The model has been implemented for both shared memory processors and networks of workstations. There is also a sequential version, useful during development, which runs the same application code. Ease-of-use has been a strong motivation behind its design. For this reason, TOP-C is organized in an SPMD style, with one primary subroutine call to invoke it. Its main features are: (a) task-parallelism, (b) a single shared, global data structure, and (c) restricted master-slave communication.
A model for virtual memory in a distributed memory parallel computer is proposed. It uses a novel parallel computing operating system framework and leads to the definition of two strategies for implementing parallel virtual memory. Careful analysis and simulation results indicate that dynamic page allocation performs better for applications that exhibit some locality of reference of public data and for applications whose data space does not fit in the physical memory available. Static page allocation is more efficient in cases of poor locality and small data space (no virtual memory needed).
ISBN: (print) 0818675829
The performance of I/O-intensive applications on a multiprocessor system depends mostly on the variety of disk access delays encountered in the I/O system. Over the years, disk performance has improved more slowly than processor speeds. It is therefore necessary to model I/O delays and evaluate the performance benefits of moving an application to a better multiprocessor system. In this work, we perform such an analysis by measuring I/O delays for a synthesized application that uses a parallel distributed file system. The aim of this study was to evaluate the performance benefits of better disks in a multiprocessor system that was designed a few years back. We report how I/O performance would be affected if an application were run on a system with better disks and communication links. We show a substantial improvement in the performance of the I/O system with better disks and communication links relative to the existing system.