Coordinated checkpointing systems are popular, general-purpose tools for implementing process migration, coarse-grained job swapping, and fault tolerance on networks of workstations. Though simple in concept, they involve several design decisions concerning the placement of checkpoint files that can affect their performance and functionality. Although several such checkpointers have been implemented for popular programming platforms like PVM and MPI, none has taken this issue into consideration. This paper addresses checkpoint placement and its impact on the performance and functionality of coordinated checkpointing systems. Several strategies, both old and new, are described and implemented on a network of SPARC-5 workstations running PVM. These strategies range from very simple to more complex ones that borrow heavily from ideas in RAID (Redundant Arrays of Inexpensive Disks) fault tolerance. The results of this paper will serve as a guide so that future implementations of coordinated checkpointing can allow their users to achieve the combination of performance and functionality that is right for their applications.
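The abstract does not spell out which RAID ideas are borrowed, so the following is only a minimal sketch of one plausible reading: single-parity (RAID-5-style) protection of per-process checkpoints. The function names and the in-memory byte strings standing in for checkpoint files are assumptions made for illustration, not the paper's implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(checkpoints):
    """Build one parity block over all checkpoints (zero-padded to equal length)."""
    size = max(len(c) for c in checkpoints)
    return xor_blocks([c.ljust(size, b"\0") for c in checkpoints])


def recover(surviving, parity):
    """Rebuild the single missing checkpoint from the survivors and the parity.

    The result is zero-padded to the parity length; a real system would also
    have to record each checkpoint's true length (hypothetical detail).
    """
    size = len(parity)
    return xor_blocks([c.ljust(size, b"\0") for c in surviving] + [parity])


# Hypothetical checkpoints of three processes, parity kept on a fourth node.
checkpoints = [b"alpha", b"bravo", b"delta"]
parity = make_parity(checkpoints)
rebuilt = recover([checkpoints[0], checkpoints[2]], parity)
```

Under this scheme any one lost checkpoint can be rebuilt without keeping a full second copy of every file, which is the kind of performance and functionality trade-off the abstract refers to.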
Poor single event upset (SEU) and single event latchup (SEL) immunity is a major concern in high-speed RF phase-locked loops (PLLs) incorporated in many current commercial satellites. As a result, greater demands are placed at the system level to compensate for this. These include reloading the programming every clock cycle, parallel interfaces, and redundancy, which result in increased size, weight, complexity, and power. We present in this paper a 1.1 GHz integer-N PLL which is inherently SEL immune, has SEU rates less than 10^-9 errors/bit-day (orders of magnitude better than currently available), excellent phase noise performance and standby current up to 100 krads(Si) total dose. This part is currently being manufactured on Peregrine Semiconductor's 0.8 μm ultra thin silicon on sapphire UTSi® process.
ISBN (digital): 9781665422963
ISBN (print): 9781665404495
Peachy Parallel Assignments are high-quality assignments for teaching parallel and distributed computing. They are selected competitively for presentation at the Edu* workshops. All of the assignments have been successfully used in class, and they are selected based on their ease of adoption by other instructors and for being cool and inspirational to students. This paper presents a paper-and-pencil assignment asking students to analyze the performance of different system configurations and an assignment in which students parallelize a simulation of the evolution of simple living organisms.
In this paper, we analyze the performance of the floating-point digital signal processor (DSP) TMS320C6711 for implementing motion estimation in video coding. Two relevant motion estimation techniques were implemented: BMA (block matching algorithm) and BMGT (block matching using geometric transforms). These have been combined with fast block matching algorithms to speed up the process. To increase the DSP performance, we have optimized several programming mechanisms: the level of code parallelism, hand-designed assembly code, and efficient usage of internal memory as cache. This implementation has shown that real-time motion estimation of the BMA type can be implemented on this DSP. However, BMGT-type motion estimation cannot be done by one DSP alone in real-time applications, due to its high computational complexity.
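For readers unfamiliar with block matching, the following minimal sketch shows an exhaustive (full-search) BMA using the sum of absolute differences (SAD) as the matching cost. The cost function, frame representation (lists of pixel rows), and all names are assumptions for illustration; the abstract does not state which cost or fast-search variant the authors used, and this is plain Python rather than optimized DSP code.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n, search):
    """Full search over displacements in [-search, search]; returns (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + n > height or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]
```

The doubly nested search over displacements, each requiring an n x n comparison, is what makes the cost grow quickly with the search range and motivates the fast block matching variants mentioned in the abstract.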
ISBN (print): 9781450310284
This tutorial provides an opportunity to experiment with a new language designed to support the safe, secure, and productive development of parallel programs. ParaSail is a new language with pervasive parallelism coupled with extensive compile-time checking of annotations in the form of assertions, preconditions, postconditions, etc. ParaSail does all checking at compile time, and eliminates race conditions, null dereferences, uninitialized data access, numeric overflow, out of bounds indexing, etc. as well as statically checking the truth of all user-written assertions. After a short introduction to the language, attendees will receive a prototype ParaSail compiler and an accompanying ParaSail Virtual Machine interpreter for writing and testing ParaSail programs. The tutorial/workshop will finish with a group discussion and feedback on the experience of using this new language.
Pervasive Grid Computing Platforms include centralized computing nodes (e.g., parallel servers) as well as decentralized and mobile devices. Pervasive Grid applications include data- and computing-intensive components which can also be mapped onto decentralized and mobile nodes. The practical success of this mapping also depends on deriving proper application configurations that account for the limited memory capabilities of those resources. In this paper we target this issue by showing how we can study and configure the memory requirements of an Emergency Management application. We present our solutions using the ASSISTANT programming model for Pervasive Grid applications.
Incremental stack-copying is a technique which has been successfully used to support efficient parallel execution of a variety of search-based AI systems, e.g., logic-based and constraint-based systems. The idea of incremental stack-copying is to copy only the difference between the data areas of two agents, instead of copying them entirely, when distributing parallel work. In order to further reduce the communication during stack-copying and make its implementation efficient on message-passing platforms, a new technique, called stack-splitting, has recently been proposed. In this paper, we describe a scheme to effectively combine stack-splitting with incremental stack-copying, to achieve superior parallel performance in a non-shared memory environment. We also describe a scheduling scheme for this incremental stack-splitting strategy. These techniques are currently being implemented in the PALS system, a parallel constraint logic programming system.
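The core idea of copying only the difference can be illustrated with a deliberately simplified sketch. Here each agent's data area is modeled as a Python list of frames, and the shared part is assumed to be the longest common prefix; real systems track shared choice points rather than comparing frames, and the names below are hypothetical.

```python
def common_prefix_len(a, b):
    """Number of leading frames that two stacks have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def incremental_copy(donor_stack, idle_stack):
    """Bring the idle agent's stack in line with the donor's.

    Only the frames beyond the common prefix (the "delta") would need to be
    sent over the network. Returns the updated stack and the delta size.
    """
    keep = common_prefix_len(donor_stack, idle_stack)
    delta = donor_stack[keep:]
    return idle_stack[:keep] + delta, len(delta)


donor = ["root", "cp1", "cp2", "cp3"]
idle = ["root", "cp1", "old"]
updated, sent = incremental_copy(donor, idle)
```

In this toy case only two of the donor's four frames are transferred, which is the saving that matters on message-passing platforms where every copied frame is communication.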
Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware and software TM implementations. However, the lack of transaction-based programs makes it difficult to understand the merits of each proposal and to tune future TM implementations to the common-case behavior of real applications. This work addresses this problem by analyzing the common-case transactional behavior of 35 multithreaded programs from a wide range of application domains. We identify transactions within the source code by mapping existing primitives for parallelism and synchronization management to transaction boundaries. The analysis covers basic characteristics such as transaction length, distribution of read-set and write-set size, and the frequency of nesting and I/O operations. The measured characteristics provide key insights into the design of efficient TM systems for both non-blocking synchronization and speculative parallelization.
ISBN (print): 9781581135626
Run-time errors in concurrent programs are generally due to the wrong usage of synchronization primitives such as monitors. Conventional validation techniques such as testing become ineffective for concurrent programs since the state space increases exponentially with the number of concurrent processes. In this paper, we propose an approach in which 1) the concurrency control component of a concurrent program is formally specified, 2) it is verified automatically using model checking, and 3) the code for the concurrency control component is automatically generated. We use monitors as the synchronization primitive to control access to a shared resource by multiple concurrent processes. Since our approach decouples the concurrency control component from the rest of the implementation, it is scalable. We demonstrate the usefulness of our approach by applying it to a case study on Airport Ground Traffic *** use the Action Language to specify the concurrency control component of a system. Action Language is a specification language for reactive software systems. It is supported by an infinite-state model checker that can verify systems with boolean, enumerated and unbounded integer variables. Our code generation tool automatically translates the verified Action Language specification into a Java monitor. Our translation algorithm employs symbolic manipulation techniques and the specific notification pattern to generate an optimized monitor class by eliminating the context switch overhead introduced as a result of unnecessary thread notification. Using counting abstraction, we show that we can automatically verify the monitor specifications for an arbitrary number of threads.
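The specific notification pattern mentioned above can be illustrated with a hand-written monitor. This is a sketch under stated assumptions: the paper generates Java monitors from Action Language specifications, whereas the example below is written by hand in Python, and the bounded buffer is a stand-in example, not the paper's case study. Each wait condition gets its own condition object, so an operation wakes only a thread whose guard it may have made true.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor with one condition object per guard (specific notification)."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        # Both conditions share the monitor lock but have separate wait queues.
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._lock:
            while len(self._items) >= self._capacity:
                self._not_full.wait()
            self._items.append(item)
            # Only a consumer can make progress now, so wake only a consumer.
            self._not_empty.notify()

    def get(self):
        with self._lock:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            # Only a producer can make progress now, so wake only a producer.
            self._not_full.notify()
            return item
```

Compared with a single condition and a broadcast to all waiters, this avoids waking threads that would immediately go back to sleep, which is the unnecessary context-switch overhead the abstract describes.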
ISBN (print): 9781450320658
The first version of the MPI standard was released in November 1993. At the time, many of the authors of this standard, myself included, viewed MPI as a temporary solution, to be used until it is replaced by a good programming language for distributed memory systems. Almost twenty years later, MPI is the main programming model for High-Performance Computing, and practically all HPC applications use MPI, which is now in its third generation; nobody expects MPI to disappear in the coming decade. The talk will discuss some plausible reasons for this situation, and the implications for research on new programming models for Extreme-Scale Computing.