ISBN: 9780769534718 (print)
Current virtual server-based load balancing schemes for DHTs have been shown to achieve excellent load balancing effectiveness. However, they face two important issues: they incur extremely high overheads, and they induce severe inconsistency in DHT routing state. We present two fundamental components, virtual server management and active stabilization, whose inclusion in these schemes essentially eliminates these problems. As a result, these schemes not only incur overheads comparable to those of non-virtual server-based systems, but also achieve better query performance.
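For readers unfamiliar with virtual servers, the following minimal sketch shows the basic consistent-hashing mapping in which each physical node hosts several virtual servers on the identifier ring. It is background only, not the paper's virtual server management or active stabilization components; the MD5-based ring, the `#vs` labels, and the names `VirtualServerRing` and `load_per_node` are assumptions chosen for illustration.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(label: str) -> int:
    """Map a label to a position on a 2**128 identifier ring (MD5 is an assumption)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class VirtualServerRing:
    """Hypothetical ring where each physical node owns `virtual_per_node` ring positions."""

    def __init__(self, nodes, virtual_per_node):
        entries = sorted(
            (ring_position(f"{node}#vs{i}"), node)
            for node in nodes
            for i in range(virtual_per_node)
        )
        self._positions = [pos for pos, _ in entries]
        self._owners = [node for _, node in entries]

    def owner(self, key):
        """Successor rule: a key belongs to the first virtual server at or after it."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owners[idx % len(self._owners)]  # wrap around the ring


def load_per_node(ring, keys):
    """Number of keys each physical node ends up responsible for."""
    return Counter(ring.owner(k) for k in keys)


if __name__ == "__main__":
    nodes = [f"node{n}" for n in range(8)]
    keys = [f"key{k}" for k in range(10_000)]
    for v in (1, 32):
        load = load_per_node(VirtualServerRing(nodes, v), keys)
        # Max-to-mean load ratio; more virtual servers typically flatten the load.
        print(v, max(load.values()) / (len(keys) / len(nodes)))
```

The overheads the abstract refers to grow with the number of virtual servers, since each one is a separate ring participant whose routing state must be maintained.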
We propose a family of regular Cayley network graphs of degree three, based on permutation groups, for the design of massively parallel systems. These graphs are shown to be based on shuffle-exchange operations, to have diameter logarithmic in the number of vertices, and to be maximally fault tolerant. We investigate various algebraic properties of these networks (including fault tolerance) and propose a simple routing algorithm. These graphs are shown to efficiently simulate other permutation-group-based graphs; thus they appear very attractive for VLSI implementation, for applications requiring a bounded number of I/O ports, and for running existing applications designed for other permutation-group-based architectures.
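To make "a Cayley graph of degree three on a permutation group" concrete: vertices are permutations of n symbols, and each edge applies one generator. The generator set below (a cyclic shift, its inverse, and a swap of the first two symbols) is an assumed shuffle-exchange-style example; it is not necessarily the family proposed in the paper, and we make no claim about its diameter. Breadth-first search serves only as a reference router, not the paper's routing algorithm.

```python
from collections import deque
from itertools import permutations


def shuffle(p):
    """Cyclic left shift of the symbols (assumed 'shuffle' generator)."""
    return p[1:] + p[:1]


def unshuffle(p):
    """Inverse of shuffle: cyclic right shift."""
    return p[-1:] + p[:-1]


def exchange(p):
    """Swap the first two symbols; this generator is its own inverse."""
    return (p[1], p[0]) + p[2:]


# Closed under inverses, so the Cayley graph is undirected and 3-regular for n >= 3.
GENERATORS = (shuffle, unshuffle, exchange)


def neighbors(p):
    return {g(p) for g in GENERATORS}


def route(src, dst):
    """Shortest generator sequence from src to dst via BFS (feasible only for small n)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        p = queue.popleft()
        if p == dst:
            break
        for g in GENERATORS:
            q = g(p)
            if q not in parent:
                parent[q] = (p, g.__name__)
                queue.append(q)
    path, node = [], dst
    while parent[node] is not None:  # walk back from dst to src
        prev, name = parent[node]
        path.append(name)
        node = prev
    return path[::-1]


if __name__ == "__main__":
    print(route((0, 1, 2, 3), (3, 2, 1, 0)))
```

Because the graph is vertex-transitive, routing between any two vertices reduces to routing from the identity permutation, which is one reason Cayley graphs are attractive as interconnection topologies.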
ISBN: 0769523129 (print)
The distributed time-triggered simulation (DTS) scheme is a new approach to real-time simulation based on parallel/distributed computing. It requires a global time base that provides consistent real-time information to application software running on distributed nodes. Exploiting its full potential requires advanced computing platforms, such as highly parallel ones. Recent developments in the scientific foundations of DTS, and in the execution engines and programming tools that support it, are briefly reviewed.
ISBN: 9781479941162 (print)
Reading input from primary storage (the ingest phase) and aggregating results (the merge phase) are important pre- and post-processing steps in large batch computations. Unfortunately, today's data sets are so large that the ingest and merge phases have become performance bottlenecks. In this paper, we mitigate these bottlenecks by leveraging the scale-up MapReduce model. We introduce an ingest chunk pipeline and a merge optimization that increase CPU utilization (50-100%) and deliver job-phase speedups of 1.16x-3.13x for the ingest and merge phases. Our techniques are based on well-known algorithms and scale-out MapReduce optimizations, but applying them to a scale-up computation framework to mitigate the ingest and merge bottlenecks is novel.
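The general idea of overlapping ingest with computation, followed by a merge of partial results, can be sketched for an assumed word-count job. A bounded queue lets one thread read the next chunk while workers process earlier ones, and the merge phase is a k-way merge of sorted partial outputs. The function names, chunk size, and queue depth are hypothetical; this is not the paper's ingest chunk pipeline or merge optimization, and because of CPython's GIL it illustrates the structure rather than the speedup.

```python
import heapq
import queue
import threading
from collections import Counter


def read_chunks(lines, chunk_size):
    """Stand-in for reading input from storage in fixed-size chunks."""
    for i in range(0, len(lines), chunk_size):
        yield lines[i:i + chunk_size]


def map_chunk(chunk):
    """Per-chunk map work: word counts as a sorted list of (word, count) pairs."""
    return sorted(Counter(w for line in chunk for w in line.split()).items())


def pipelined_word_count(lines, chunk_size=2, num_workers=2, depth=4):
    chunks = queue.Queue(maxsize=depth)  # bounded buffer between ingest and map
    partials = []                         # one sorted partial result per chunk
    lock = threading.Lock()
    SENTINEL = None

    def ingest():
        for chunk in read_chunks(lines, chunk_size):
            chunks.put(chunk)             # blocks while the buffer is full
        for _ in range(num_workers):
            chunks.put(SENTINEL)          # one stop signal per worker

    def worker():
        while (chunk := chunks.get()) is not SENTINEL:
            result = map_chunk(chunk)
            with lock:
                partials.append(result)

    threads = [threading.Thread(target=ingest)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge phase: k-way merge of sorted partial lists, combining equal keys.
    merged = {}
    for word, count in heapq.merge(*partials):
        merged[word] = merged.get(word, 0) + count
    return merged
```

The queue depth bounds memory used by read-ahead, while the sorted partial outputs let the merge stream through all chunks in one pass instead of re-sorting the combined data.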
We consider a parallel triangular solver on a distributed-memory MIMD computer. Three task-partitioning methods are discussed, covering both task assignment and task scheduling. Their execution times are estimated using a performance model and a methodology for parallel performance evaluation, and the optimal task granularities are derived from this performance analysis. Experimental results on a transputer-based multicomputer are reported.
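To illustrate what task partitioning and task granularity mean for a triangular solver, here is a sequential simulation of column-block forward substitution in which block rows are dealt cyclically to `num_procs` hypothetical processors. The block size plays the role of the task granularity. The cyclic assignment and the function name are assumptions; this is not one of the paper's three methods, and no communication or timing is modeled.

```python
def forward_substitution_blocked(L, b, block, num_procs):
    """Solve L x = b for lower-triangular L with a column-block sweep.

    Block rows are assigned cyclically to simulated processors; the returned
    task list records (processor, kind, block index) in execution order.
    """
    n = len(b)
    x = list(b)  # right-hand side, overwritten with the solution
    blocks = [(s, min(s + block, n)) for s in range(0, n, block)]
    owner_of = {k: k % num_procs for k in range(len(blocks))}
    tasks = []
    for k, (s, e) in enumerate(blocks):
        # Diagonal task: solve the small triangular system inside block k.
        for i in range(s, e):
            for j in range(s, i):
                x[i] -= L[i][j] * x[j]
            x[i] /= L[i][i]
        tasks.append((owner_of[k], "solve", k))
        # Update tasks: every later block removes block k's contribution.
        for m in range(k + 1, len(blocks)):
            ms, me = blocks[m]
            for i in range(ms, me):
                for j in range(s, e):
                    x[i] -= L[i][j] * x[j]
            tasks.append((owner_of[m], "update", m))
    return x, tasks
```

A smaller block exposes more update tasks that could run concurrently but multiplies the number of messages a distributed version would send; choosing the block size that balances the two is the granularity question the abstract addresses.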
This paper presents the results of a performance analysis of a seismic analysis kernel code on KSR multiprocessors. The purpose of the analysis is to understand the performance behavior of a class of applications on shared-memory parallel machines. The g5 kernel code, commonly used in seismic analysis applications, is parallelized, and its computational and I/O performance is analyzed on a 32-node KSR-1 and a 64-node KSR-2.
Performance penalties due to synchronization are a common concern in parallel programming. Traditional approaches enforce the correct ordering of write operations using locks, but this can be time-consuming and can drastically reduce the benefit of using a parallel machine. Instead, for certain classes of programs, we propose an optimistic approach in which the solution is computed without any locks. The approach detects data races by maintaining statistics on memory writes, and it corrects potentially inappropriate data values by repeating selected computations and write operations. The scheme is evaluated with a novel parallel implementation of the Møller-Plesset perturbation theory energy calculation for closed-shell molecules.
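A minimal sketch of the optimistic pattern, assuming a simple accumulation of contributions into shared slots: threads update without locks, a per-slot write counter stands in for the "statistics on memory writes", and any slot whose counter shows a lost update is recomputed. The counter scheme and the function name are assumptions, not the paper's detection method. Each slot holds a `(total, writes)` pair that is read and replaced as one object, so a lost update always loses its value and its count together, which makes the check sound. Under CPython's GIL lost updates may be rare, but the repair step makes the result correct either way.

```python
import threading


def optimistic_accumulate(contributions, num_slots, num_threads=4):
    """Lock-free accumulation of (slot, value) pairs followed by race repair.

    Returns the per-slot totals and the list of slots that had to be recomputed.
    """
    slots = [(0, 0)] * num_slots  # (total, number of writes included in total)
    expected = [0] * num_slots
    for slot, _ in contributions:
        expected[slot] += 1

    def work(part):
        for slot, value in part:
            total, writes = slots[slot]                # unsynchronized read
            slots[slot] = (total + value, writes + 1)  # unsynchronized write

    parts = [contributions[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=work, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Detection: a write count below the expected count means an update was
    # overwritten. Correction: repeat the computation for those slots only.
    repaired = []
    for s in range(num_slots):
        if slots[s][1] != expected[s]:
            total = sum(v for slot, v in contributions if slot == s)
            slots[s] = (total, expected[s])
            repaired.append(s)
    return [total for total, _ in slots], repaired
```

The approach pays off when races are rare, so that the cost of occasional recomputation stays below the cost of locking every write.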
ISBN: 0769519652 (print)
Despite the enormous amount of research and development work in parallel computing, it is commonly observed that performance and ease of use are rarely achieved together. We believe that ease of use is critical for many end users, and we therefore seek performance-enhancing techniques that can be easily retrofitted to existing parallel applications. In a previous paper we presented MPI process swapping, a simple add-on to the MPI programming environment that can improve performance in shared computing environments. MPI process swapping requires as few as three lines of source code change to an existing application. In this paper we explore a question left open in our previous work: which policies should govern process swapping to achieve the best performance? Our results show that, with adequate swapping policies, MPI process swapping can provide substantial performance benefits with very limited implementation effort.
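The policy question can be made concrete with a small decision function. The sketch below assumes each host's recent speed has already been measured (how is left open) and swaps the slowest active processes onto the fastest idle hosts only when the gain exceeds a threshold. The `Host` type, `plan_swaps`, and `min_gain` are hypothetical; they are not the MPI process swapping interface or the policies evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    speed: float  # hypothetical measured throughput; higher is better


def plan_swaps(active, spares, min_gain=1.2):
    """Greedy policy: pair the slowest active hosts with the fastest spare hosts.

    A swap is proposed only if the spare is at least `min_gain` times faster,
    which damps swaps whose benefit would not repay their migration cost.
    """
    slow_first = sorted(active, key=lambda h: h.speed)
    fast_first = sorted(spares, key=lambda h: h.speed, reverse=True)
    swaps = []
    for out_host, in_host in zip(slow_first, fast_first):
        if in_host.speed >= min_gain * out_host.speed:
            swaps.append((out_host.name, in_host.name))
        else:
            break  # later pairs have a faster active host and a slower spare
    return swaps


if __name__ == "__main__":
    active = [Host("a", 10.0), Host("b", 4.0), Host("c", 8.0)]
    spares = [Host("s1", 9.0), Host("s2", 12.0)]
    print(plan_swaps(active, spares))
```

Targeting the slowest active processes first reflects a common assumption for tightly coupled iterative MPI codes, where the slowest process often sets the pace of every iteration. The threshold trades responsiveness against thrashing between hosts.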
ISBN: 0818681187 (print)
The high complexity of building parallel applications is often cited as one of the major impediments to the mainstream adoption of parallel computing. To deal with the complexity of software development, abstractions such as macros, functions, abstract data types, and objects are commonly employed by sequential as well as parallel programming models. This paper describes the concept of a design pattern for the development of parallel applications. In our case, a design pattern describes a recurring parallel programming problem and a reusable solution to that problem, and it is implemented as a reusable code skeleton for quick and reliable development of parallel applications. We describe a parallel programming system, DPnDP (Design Patterns and Distributed Processes), that employs such design patterns. Previous parallel programming systems have allowed fast prototyping of parallel applications based on commonly occurring communication and synchronization structures. The uniqueness of our approach lies in the use of a standard structure and interface for a design pattern. This has several important implications. First, design patterns can be defined and added to the system's library incrementally, without requiring any major modification of the system (extensibility). Second, a parallel application can be customized by mixing design patterns with low-level parallel code, resulting in a flexible and efficient parallel programming tool (flexibility). Finally, a parallel design pattern can be parameterized to provide variations in structure and behavior.
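To illustrate a design pattern implemented as a code skeleton with a standard interface, here is a sketch of a manager/worker skeleton in which the user supplies only three hooks. The class and hook names, and the use of a thread pool in place of distributed processes, are assumptions; this is not DPnDP's actual interface. `num_workers` stands in for the kind of structural parameter the abstract mentions.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class ManagerWorkerPattern(ABC):
    """Reusable skeleton with a fixed interface; subclasses fill in the hooks."""

    def __init__(self, num_workers=4):
        self.num_workers = num_workers  # structural parameter of the pattern

    @abstractmethod
    def split(self, problem):
        """Divide the problem into independent tasks."""

    @abstractmethod
    def work(self, task):
        """Application-specific computation for one task."""

    @abstractmethod
    def combine(self, results):
        """Merge per-task results into the final answer."""

    def run(self, problem):
        # The skeleton owns task distribution and result collection.
        tasks = list(self.split(problem))
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            results = list(pool.map(self.work, tasks))
        return self.combine(results)


class SumOfSquares(ManagerWorkerPattern):
    """Example application: only the three hooks are application code."""

    def split(self, problem):
        step = max(1, len(problem) // self.num_workers)
        return [problem[i:i + step] for i in range(0, len(problem), step)]

    def work(self, task):
        return sum(x * x for x in task)

    def combine(self, results):
        return sum(results)


if __name__ == "__main__":
    print(SumOfSquares(num_workers=3).run(list(range(10))))
```

Because every skeleton exposes the same `run` entry point, new patterns can be added alongside this one without changing code that invokes them, which mirrors the extensibility argument in the abstract.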
ISBN: 0818675829 (print)
The performance of I/O-intensive applications on a multiprocessor system depends mostly on the various disk access delays encountered in the I/O system. Over the years, disk performance has improved more slowly than processor speeds. It is therefore necessary to model I/O delays and to evaluate the performance benefits of moving an application to a better multiprocessor system. In this work, we perform such an analysis by measuring I/O delays for a synthesized application that uses a parallel distributed file system. The aim of this study was to evaluate the performance benefits of better disks in a multiprocessor system that was designed a few years earlier. We report how I/O performance would be affected if an application were run on a system with better disks and communication links, and we show a substantial improvement in I/O system performance with better disks and communication links relative to the existing system.
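A back-of-the-envelope sketch of the kind of I/O delay model the abstract calls for: each remote read is charged a disk seek, an average rotational delay of half a revolution, media transfer time, and link latency plus link transfer time. The decomposition, the class names, and all parameter values in the usage lines are assumptions for illustration; they are not the model or the measurements reported in the paper.

```python
from dataclasses import dataclass


@dataclass
class DiskModel:
    seek_ms: float        # average seek time
    rpm: float            # spindle speed
    transfer_mb_s: float  # sustained media transfer rate


@dataclass
class LinkModel:
    latency_ms: float
    bandwidth_mb_s: float


def request_time_ms(size_kb, disk, link):
    """Service time of one remote read of `size_kb` kilobytes.

    seek + half-revolution rotational delay + media transfer
    + link latency + link transfer. Queueing and caching are ignored.
    """
    size_mb = size_kb / 1024
    rotational_ms = 0.5 * 60_000 / disk.rpm
    disk_ms = disk.seek_ms + rotational_ms + 1000 * size_mb / disk.transfer_mb_s
    link_ms = link.latency_ms + 1000 * size_mb / link.bandwidth_mb_s
    return disk_ms + link_ms


if __name__ == "__main__":
    # Hypothetical "existing" and "upgraded" configurations.
    old = request_time_ms(64, DiskModel(12.0, 5400, 5.0), LinkModel(1.0, 10.0))
    new = request_time_ms(64, DiskModel(8.0, 7200, 20.0), LinkModel(0.5, 40.0))
    print(f"estimated per-request speedup: {old / new:.2f}x")
```

With small requests the fixed seek and rotational terms dominate, so faster transfer rates alone help less than the headline bandwidth numbers suggest; larger requests shift the balance toward the transfer terms.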