Coordinated checkpointing systems are popular and general-purpose tools for implementing process migration, coarse-grained job swapping, and fault tolerance on networks of workstations. Though simple in concept, there are several design decisions concerning the placement of checkpoint files that can impact the performance and functionality of coordinated checkpointers. Although several such checkpointers have been implemented for popular programming platforms like PVM and MPI, none have taken this issue into consideration. This paper addresses the issue of checkpoint placement and its impact on the performance and functionality of coordinated checkpointing systems. Several strategies, both old and new, are described and implemented on a network of SPARC-5 workstations running PVM. These strategies range from very simple ones to more complex ones that borrow heavily from ideas in RAID (Redundant Arrays of Inexpensive Disks) fault tolerance. The results of this paper will serve as a guide so that future implementations of coordinated checkpointing can allow their users to achieve the combination of performance and functionality that is right for their applications.
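As a rough illustration of the RAID-style placement idea (not the paper's actual implementation), the sketch below shows how equal-sized checkpoint fragments written by several workers can be combined into an XOR parity block, so that any single lost fragment can be rebuilt from the survivors. The helper names and fixed-size fragments are assumptions made here for brevity.

```python
# Illustrative sketch of RAID-4/5-style parity protection for checkpoint
# fragments. Hypothetical helper names; not the checkpointer's actual API.
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(fragments: list[bytes]) -> bytes:
    """Parity block computed over the checkpoint fragments of all peers."""
    return reduce(xor_blocks, fragments)

def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing fragment from the survivors plus parity."""
    return reduce(xor_blocks, surviving, parity)

if __name__ == "__main__":
    # Three workers each produce a (padded, equal-length) checkpoint fragment.
    frags = [b"ckpt-A..", b"ckpt-B..", b"ckpt-C.."]
    parity = make_parity(frags)
    # Suppose worker 1's checkpoint is lost: recover it from the rest.
    recovered = reconstruct([frags[0], frags[2]], parity)
    assert recovered == frags[1]
```

The trade-off the paper studies follows directly from this structure: parity-style placement costs extra encoding work and network traffic at checkpoint time in exchange for tolerating the loss of a checkpoint stored on a failed workstation.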
In this paper we present a parallel band selection approach, referred to as parallel simulated annealing band selection (PSABS), for hyperspectral imagery. The approach is based on the simulated annealing band selection (SABS) scheme. The SABS algorithm was originally designed to group highly correlated hyperspectral bands into a smaller subset of band modules regardless of their original order in terms of wavelengths. SABS selects sets of non-correlated hyperspectral bands based on the simulated annealing (SA) algorithm and utilizes the inherent separability of different classes in hyperspectral images to reduce dimensionality. The proposed PSABS improves computational speed by using parallel computing techniques: it allows multiple Markov chains (MMC) to be traced simultaneously and fully exploits the parallelism inherent in SABS to create a set of PSABS modules on each parallel node, implemented with the message passing interface (MPI) cluster-based library and the open multi-processing (OpenMP) multicore-based application programming interface. The effectiveness of the proposed PSABS is evaluated on MODIS/ASTER airborne simulator (MASTER) hyperspectral images collected during the PACRIM II campaign. The experimental results demonstrate that PSABS can significantly reduce the computational load and provide a more reliable quality of solution than the original SABS method.
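To make the multiple-Markov-chain idea concrete, the toy sketch below runs several independent simulated-annealing chains over a band-subset search space and keeps the best result, using a process pool in place of MPI/OpenMP. The objective function is a stand-in, not the class-separability criterion used by SABS, and all names are illustrative.

```python
# Toy sketch of simulated-annealing band selection run as multiple
# independent Markov chains (the MMC idea). The objective is a placeholder,
# not the class-separability measure computed from real hyperspectral data.
import math
import random
from concurrent.futures import ProcessPoolExecutor

N_BANDS, SUBSET = 30, 5

def objective(subset):
    # Placeholder score: prefer band subsets that are spread far apart.
    s = sorted(subset)
    return sum(b - a for a, b in zip(s, s[1:]))

def anneal(seed, steps=2000, t0=5.0):
    rng = random.Random(seed)
    current = rng.sample(range(N_BANDS), SUBSET)
    best, best_score = list(current), objective(current)
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-6          # cooling schedule
        cand = list(current)
        cand[rng.randrange(SUBSET)] = rng.randrange(N_BANDS)
        if len(set(cand)) < SUBSET:              # skip duplicate bands
            continue
        delta = objective(cand) - objective(current)
        if delta > 0 or rng.random() < math.exp(delta / t):
            current = cand
            if objective(current) > best_score:
                best, best_score = list(current), objective(current)
    return best_score, sorted(best)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(anneal, range(4)))   # four parallel chains
    print(max(results))   # best band subset found across all chains
```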
Traditional software design methodologies have been shown to have drawbacks in designing and implementing software systems for robotics. A novel dual-hierarchical object-oriented design methodology is presented, which is well suited to problems of this type. A practical example of the application of this methodology is presented, utilizing CLOS as the implementation vehicle. The methodology developed is shown to facilitate the programming and planning of complex robot tasks, and the provision of generic recovery procedures for exception handling.
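The dual-hierarchy idea can be sketched in miniature: one class hierarchy models devices, a separate one models tasks, and the task hierarchy supplies a generic recovery hook that subclasses inherit. The sketch below is a hypothetical Python analogue (the paper itself uses CLOS), with invented class names.

```python
# Hypothetical Python analogue of the dual-hierarchy idea: one hierarchy for
# physical devices, a separate one for tasks, with generic recovery attached
# at the task level. Class names are illustrative, not from the paper.
class Device:
    def home(self):
        print("moving to safe home position")

class Gripper(Device):
    def release(self):
        print("opening gripper")

class Task:
    def __init__(self, device):
        self.device = device
    def run(self):
        try:
            self.execute()
        except RuntimeError as err:
            self.recover(err)            # generic, inherited recovery hook
    def execute(self):
        raise NotImplementedError
    def recover(self, err):
        print(f"recovering from: {err}")
        self.device.home()

class PickPart(Task):
    def execute(self):
        raise RuntimeError("part slipped")   # simulated runtime exception

PickPart(Gripper()).run()
```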
ISBN: 9781665475075 (print)
The conventional model of parallel programming today involves either copying data across cores (and then having to track its most recent value), or not copying and requiring deep software stacks to perform even the simplest operation on data that is “remote”, i.e., out of the range of loads and stores from the current core. As application requirements grow to larger data sets, with more irregular access to them, both conventional approaches start to exhibit severe scaling limitations. This paper reviews some growing evidence of the potential value of a new model of computation that skirts between the two: data does not move (i.e., is not copied), but computation instead moves to the data. Several different applications involving large sparse computations, streaming of data, and complex mixed mode operations have been coded for a novel platform where thread movement is handled invisibly by the hardware. The evidence to date indicates that parallel scaling for this paradigm can be significantly better than any mix of conventional models.
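The contrast with the conventional model can be illustrated with a toy "move the computation to the data" routine: each node owns a partition that never leaves it, and an operation is routed to the owning node and applied in place rather than copying the data back. The platform in the paper performs this thread migration in hardware; the sketch below is only a software analogue with invented names.

```python
# Toy illustration of "move computation to the data": each node owns a
# partition that is never copied; requests are routed to the owner and
# applied in place. Names and the partitioning scheme are illustrative.
from collections import defaultdict

NUM_NODES = 4
partitions = {n: {} for n in range(NUM_NODES)}   # node id -> local key/value store

def owner(key: str) -> int:
    return hash(key) % NUM_NODES

def remote_increment(key: str, amount: int) -> None:
    """The 'thread' migrates to the owning node and updates data in place."""
    node = owner(key)
    store = partitions[node]
    store[key] = store.get(key, 0) + amount      # nothing is copied back

if __name__ == "__main__":
    for word in "the quick brown fox the lazy dog the".split():
        remote_increment(word, 1)
    counts = defaultdict(int)
    for store in partitions.values():
        for k, v in store.items():
            counts[k] += v
    print(dict(counts))   # e.g. {'the': 3, 'quick': 1, ...}
```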
In this paper, we describe PEGASUS, an open source peta graph mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, and finding the connected components. As the size of graphs reaches several giga-, tera-, or peta-bytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such library, implemented on top of the HADOOP platform, the open source version of MAPREDUCE. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially a repeated matrix-vector multiplication. In this paper we describe a very important primitive for PEGASUS, called GIM-V (generalized iterated matrix-vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up on the number of available machines, (b) linear running time on the number of edges, and (c) more than 5 times faster performance than the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!, with approximately 6.7 billion edges.
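To make the GIM-V primitive concrete, the single-machine sketch below specializes it to connected components: combine2 passes the neighbor's label, combineAll takes the minimum, and assign keeps the smaller of the old and new labels. PEGASUS expresses the same iteration as HADOOP MapReduce jobs over the edge file, which this toy loop does not attempt to reproduce.

```python
# Single-machine sketch of GIM-V specialized to connected components:
# combine2(m_ij, v_j) = v_j, combineAll = min, assign(v_i, x) = min(v_i, x).
def gim_v_connected_components(num_nodes, edges):
    labels = list(range(num_nodes))              # v_i initialized to i
    changed = True
    while changed:                               # iterate until a fixed point
        changed = False
        new = list(labels)
        for i, j in edges:                       # treat edges as undirected
            for a, b in ((i, j), (j, i)):
                if labels[b] < new[a]:           # combineAll / assign via min
                    new[a] = labels[b]
                    changed = True
        labels = new
    return labels                                # equal labels = same component

if __name__ == "__main__":
    print(gim_v_connected_components(6, [(0, 1), (1, 2), (3, 4)]))
    # -> [0, 0, 0, 3, 3, 5]
```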
The problem of adjusting M of N > M parallel activities into a procedure chain is one type of project scheduling problem. For the case M = 3, a new polynomial algorithm is proposed to minimize the total tardiness criterion. To present this algorithm and search for the optimal procedure chain, we propose a Normal Chain Theory based on the relationships among the activities' time parameters, together with the property that the optimal chain contains the activities with the minimum earliest finish times. Analysis of this algorithm shows that its time complexity is O(N log N).
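The paper's polynomial algorithm and Normal Chain Theory are not reproduced here. Under one reading of the problem, in which M = 3 of the N parallel activities are chosen and sequenced back to back while the rest keep their parallel start times, the brute-force baseline below simply evaluates the total tardiness of every candidate chain; the activity fields (duration, due date) are illustrative assumptions.

```python
# Brute-force baseline (not the paper's O(N log N) algorithm): try every way
# of choosing M = 3 activities and chaining them, and report the arrangement
# with minimum total tardiness.
from itertools import permutations

def total_tardiness(chain, others):
    """Chained activities run back to back; the rest still start at time 0."""
    t, tardiness = 0, 0
    for dur, due in chain:
        t += dur
        tardiness += max(0, t - due)
    for dur, due in others:
        tardiness += max(0, dur - due)
    return tardiness

def best_chain(activities, m=3):
    best = None
    for chain in permutations(activities, m):
        rest = [a for a in activities if a not in chain]
        cost = total_tardiness(chain, rest)
        if best is None or cost < best[0]:
            best = (cost, chain)
    return best

if __name__ == "__main__":
    acts = [(3, 4), (2, 2), (4, 9), (1, 3), (5, 8)]   # (duration, due date)
    print(best_chain(acts))
```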
Incremental stack-copying is a technique which has been successfully used to support efficient parallel execution of a variety of search-based AI systems, e.g., logic-based and constraint-based systems. The idea of incremental stack-copying is to copy only the difference between the data areas of two agents, instead of copying them entirely, when distributing parallel work. In order to further reduce the communication during stack-copying and make its implementation efficient on message-passing platforms, a new technique, called stack-splitting, has recently been proposed. In this paper, we describe a scheme to effectively combine stack-splitting with incremental stack-copying, to achieve superior parallel performance in a non-shared memory environment. We also describe a scheduling scheme for this incremental stack-splitting strategy. These techniques are currently being implemented in the PALS system, a parallel constraint logic programming system.
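The splitting idea itself can be shown in a few lines: when an idle worker asks for work, the busy worker hands over roughly half of the unexplored alternatives at every choice point in its stack, rather than a single choice point. The data structures below are illustrative, not PALS internals.

```python
# Toy illustration of stack-splitting: the busy worker keeps roughly half of
# the unexplored alternatives at every choice point and gives the rest to an
# idle worker. Purely illustrative; not the PALS representation.
def split_stack(stack):
    """stack: list of choice points, each a list of unexplored alternatives."""
    kept, given = [], []
    for alternatives in stack:
        half = len(alternatives) // 2
        kept.append(alternatives[:len(alternatives) - half])
        given.append(alternatives[len(alternatives) - half:])
    return kept, given

if __name__ == "__main__":
    busy_worker = [["b2", "b3", "b4"], ["c2"], ["d2", "d3"]]
    kept, given = split_stack(busy_worker)
    print("kept: ", kept)    # [['b2', 'b3'], ['c2'], ['d2']]
    print("given:", given)   # [['b4'], [], ['d3']]
```

Combining this with incremental copying means that, in addition to splitting the alternatives, only the portion of the stack not already shared with the receiving worker needs to be transmitted.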
ISBN: 9781450320658 (print)
The first version of the MPI standard was released in November 1993. At the time, many of the authors of this standard, myself included, viewed MPI as a temporary solution, to be used until it was replaced by a good programming language for distributed memory systems. Almost twenty years later, MPI is the main programming model for High-Performance Computing, and practically all HPC applications use MPI, which is now in its third generation; nobody expects MPI to disappear in the coming decade. The talk will discuss some plausible reasons for this situation, and the implications for research on new programming models for Extreme-Scale Computing.
Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware and software TM implementations. However, the lack of transaction-based programs makes it difficult to understand the merits of each proposal and to tune future TM implementations to the common-case behavior of real applications. This work addresses this problem by analyzing the common-case transactional behavior of 35 multithreaded programs from a wide range of application domains. We identify transactions within the source code by mapping existing primitives for parallelism and synchronization management to transaction boundaries. The analysis covers basic characteristics such as transaction length, the distribution of read-set and write-set sizes, and the frequency of nesting and I/O operations. The measured characteristics provide key insights into the design of efficient TM systems for both non-blocking synchronization and speculative parallelization.
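A minimal sketch of the kind of measurement described: treat each critical section as a transaction, record the addresses it reads and writes, and summarize the distribution of read-set and write-set sizes across all transactions. The trace format and class names below are illustrative assumptions, not the paper's tooling.

```python
# Minimal sketch of transactional characterization: record per-transaction
# read and write sets and summarize their size distributions.
from collections import Counter

class Transaction:
    def __init__(self):
        self.read_set, self.write_set = set(), set()
    def read(self, addr):
        self.read_set.add(addr)
    def write(self, addr):
        self.write_set.add(addr)

def characterize(transactions):
    """Distribution of read-set and write-set sizes across all transactions."""
    reads = Counter(len(t.read_set) for t in transactions)
    writes = Counter(len(t.write_set) for t in transactions)
    return reads, writes

if __name__ == "__main__":
    txs = []
    for accesses in ([("r", 1), ("r", 2), ("w", 2)], [("w", 3)]):
        t = Transaction()
        for kind, addr in accesses:
            (t.read if kind == "r" else t.write)(addr)
        txs.append(t)
    print(characterize(txs))   # read-set sizes {2: 1, 0: 1}, write-set sizes {1: 2}
```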