ISBN: (print) 9780769529172; 0769529178
This paper presents a checkpoint and recovery (C&R) protocol to support fault tolerance for PVM (Parallel Virtual Machine). The protocol helps to mask fail-stop failures from an application. The C&R activities are transparent and require no changes to either the PVM library or the operating system. In PVM, an application can change its number of processes during execution; this paper focuses on solving the problems raised by the dynamic spawning and asynchronous exit of PVM tasks. The proposed protocol is non-blocking, which reduces the side effects of checkpoint activity on the original programs.
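The core idea behind a non-blocking checkpoint protocol can be illustrated with a small sketch: messages carry the sender's checkpoint epoch, so an in-transit message that arrives after a local checkpoint can be logged for replay instead of forcing tasks to block and coordinate. This is a minimal toy model, not the paper's actual protocol; all names are hypothetical.

```python
# Toy model of epoch-tagged, non-blocking checkpointing
# (illustrative only; not the paper's actual PVM protocol).

class Task:
    def __init__(self):
        self.epoch = 0         # current checkpoint epoch
        self.state = 0         # application state
        self.checkpoints = {}  # epoch -> saved state
        self.logged = []       # late in-transit messages, replayed on recovery

    def checkpoint(self):
        # Save state and advance the epoch without draining in-flight messages.
        self.checkpoints[self.epoch] = self.state
        self.epoch += 1

    def deliver(self, sender_epoch, payload):
        if sender_epoch < self.epoch:
            # Message sent before our last checkpoint: log it so a
            # restarted task can replay it after rollback.
            self.logged.append(payload)
        self.state += payload

t = Task()
t.deliver(0, 3)
t.checkpoint()   # snapshot state=3 as epoch 0
t.deliver(0, 4)  # late epoch-0 message is logged, not blocked on
print(t.state, t.checkpoints[0], t.logged)  # 7 3 [4]
```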
As the size and popularity of computer clusters continue to grow, fault tolerance is becoming a crucial factor in ensuring high performance and reliability for applications. To provide this facility, a checkpoint mechanism is used to recover a failed parallel application by rolling it back to an execution point prior to the failure. In this work we present a mechanism that manages checkpoint operations automatically in the presence of failures. It periodically records the application's context, identifies failed nodes, and restarts MPI processes on the remaining nodes, allowing the application to continue while taking advantage of the computation accomplished previously. We describe the changes made to the LAM/MPI source code. Experiments with an application for recognizing DNA similarity showed that, despite the overhead caused by periodic checkpoints, the benefits can reach about 50% on a small cluster.
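One piece of such a mechanism, reassigning MPI processes from failed nodes onto the survivors, can be sketched as follows. The round-robin policy and all names here are hypothetical, not LAM/MPI's actual logic.

```python
# Toy sketch: remap MPI ranks hosted on failed nodes onto surviving
# nodes, round-robin (hypothetical policy; not LAM/MPI's implementation).

def redistribute(rank_to_node, alive):
    """Return a new rank->node placement with dead nodes' ranks moved."""
    survivors = sorted(alive)
    new_map, i = {}, 0
    for rank, node in sorted(rank_to_node.items()):
        if node in alive:
            new_map[rank] = node              # untouched rank stays put
        else:
            new_map[rank] = survivors[i % len(survivors)]
            i += 1                            # spread displaced ranks evenly
    return new_map

placement = {0: "n0", 1: "n1", 2: "n2", 3: "n1"}
print(redistribute(placement, alive={"n0", "n2"}))
# {0: 'n0', 1: 'n0', 2: 'n2', 3: 'n2'}
```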
As computing breaches petascale limits in both processor performance and storage capacity, the only way current and future performance gains can be achieved is by increasing the parallelism of the system. Gains in storage performance remain low due to the use of traditional distributed file systems such as NFS, where, although multiple clients can access files at the same time, only one node can serve files to the clients. New file systems that distribute load across multiple data servers are being developed; however, most implementations still concentrate all the metadata load at a single server. Distributing metadata load is important to accommodate growing numbers of more powerful clients. Scaling metadata performance is more complex than scaling raw I/O performance, and with distributed metadata the complexity increases further. In this paper we present strategies for file creation in distributed-metadata file systems. Using the PVFS distributed file system as our testbed, we present designs that reduce the message complexity of the create operation and increase performance. Compared to the base-case create protocol implemented in PVFS, our design delivers near-constant operation latency as the system scales, does not degenerate under high contention, and increases throughput linearly as the number of metadata servers increases. The design schemes are applicable to any distributed file system implementation.
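As a rough illustration of why distributing metadata helps, a file create can be routed to one of several metadata servers by hashing the path, spreading create load instead of serializing it at one node. This is a generic sketch of the idea, not PVFS's actual protocol; server names are made up.

```python
# Generic sketch: route file creates to metadata servers by path hash
# (illustrative; not PVFS's actual create protocol).
import hashlib
from collections import Counter

def metadata_server(path, servers):
    """Pick a metadata server by hashing the file path, so create
    operations spread across servers rather than hitting one node."""
    h = int(hashlib.sha1(path.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["mds0", "mds1", "mds2", "mds3"]
load = Counter(metadata_server(f"/data/file{i}", servers)
               for i in range(1000))
print(sorted(load))  # all four servers receive a share of the creates
```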
Dependability evaluation is an important, often indispensable, step in the design and analysis of (critical) systems. The introduction of control and/or computing systems to automate processes increases overall system complexity and therefore has an impact on dependability. Moreover, it is of interest to evaluate redundancy and maintenance policies. In those cases it is not possible to resort to notations such as reliability block diagrams (RBD), fault trees (FT), or reliability graphs (RG) to represent the system, since the statistical-independence assumption is not satisfied. Even more expressive formalisms such as dynamic FT (DFT) can prove inadequate to the goal. To overcome these problems we developed a new formalism derived from RBD: the dynamic RBD (DRBD). In this paper we explain how to use the DRBD notation in system modeling and analysis, within a methodology that, starting from the system structure, leads to the overall system availability evaluation through modeling and analysis phases. To do this we use an example drawn from the literature consisting of a multiprocessor distributed computing system, and we also compare our approach with the DFT one.
A Reconfigurable Consistency Algorithm (RCA) is an algorithm that guarantees consistency in Distributed Shared Memory (DSM) systems. In an RCA, a Configuration Control Layer (CCL) is responsible for selecting the most suitable RCA configuration (behavior) for a specific workload and DSM system. In previous work, we defined an upper-bound performance for the RCA based on an ideal CCL, which knows a priori the best configuration for each situation. This ideal CCL relies on a set of workload characteristics that, in most situations, are difficult to extract from applications (the percentage of shared write and read operations, and sharing patterns). In this paper we propose and develop a heuristic configuration-control mechanism for the CCL implementation. The mechanism is based on an easily obtained application parameter: the concurrency level. Our results show that this mechanism improves RCA performance by 15% on average compared to other traditional consistency algorithms. Furthermore, a CCL with this mechanism is independent of workload- and DSM-specific characteristics such as sharing patterns and the percentage of writes and reads.
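The flavor of such a heuristic can be sketched as a simple threshold rule on the observed concurrency level. The thresholds and configuration names below are illustrative assumptions, not the paper's tuned values.

```python
# Hypothetical CCL heuristic: map the observed concurrency level
# (processes concurrently touching shared data) to an RCA behavior.
# Thresholds and names are illustrative, not the paper's values.

def select_configuration(concurrency_level):
    if concurrency_level <= 1:
        return "sequential"        # no sharing: cheapest protocol
    if concurrency_level <= 4:
        return "write-invalidate"  # few sharers: invalidations are cheap
    return "write-update"          # many sharers: push updates eagerly

for level in (1, 3, 8):
    print(level, select_configuration(level))
```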
This study examined the interplay among processor speed, cluster interconnect and file I/O, using parallel applications to quantify interactions. We focused on a common case where multiple compute nodes communicate with a single master node for file accesses. We constructed a predictive model that used time characteristics critical for application performance to estimate the number of nodes beyond which further performance improvement became unattainable. Predictions were experimentally validated with NAMD, a representative parallel application designed for molecular dynamics simulation. Such predictions can help guide decision making to improve machine allocations for parallel codes in large clusters.
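A model of this kind can be as simple as a runtime expression with a per-node I/O term that grows with the node count; its minimum then marks the point beyond which adding nodes stops paying off. The sketch below, with made-up numbers, illustrates the shape of such a model and is not the paper's actual one.

```python
# Illustrative scaling model (not the paper's): T(n) = serial
# + parallel_time/n + per_node_io_cost*n, where the linear term models
# contention at the single master node serving file I/O.
import math

def optimal_nodes(parallel_time, per_node_io_cost):
    """Runtime T(n) is minimized at n* = sqrt(parallel_time / io_cost);
    beyond n*, extra nodes add more I/O contention than compute saved."""
    return math.sqrt(parallel_time / per_node_io_cost)

# Hypothetical workload: 3600 s of parallelizable work, 1 s of extra
# master-node I/O per added node -> no benefit beyond ~60 nodes.
print(round(optimal_nodes(3600.0, 1.0)))  # 60
```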
Performance tuning involves a diagnostic process to locate and explain sources of program inefficiency. A performance diagnosis system can leverage knowledge of performance causes and symptoms that come from expertise with parallel computational models. This paper extends our model-based performance diagnosis approach to programs with multiple models. We study two types of model compositions (nesting and restructuring) and demonstrate how the Hercule performance diagnosis framework can automatically discover and interpret performance problems due to model nesting in the FLASH application.
This paper presents a new approach for the execution of coarse-grain (tiled) parallel SPMD code for applications derived from the explicit discretization of one-dimensional PDE problems with finite-differencing schemes. Tiling is an efficient loop transformation for achieving coarse-grain parallelism in such algorithms, and rectangular tiles are the only shapes that program developers can feasibly apply by hand. However, rectangular tiling transformations are not always valid due to data dependencies, thus requiring an appropriate skewing transformation prior to tiling in order to enable rectangular tile shapes. We employ cyclic mapping of tiles to processes and propose a method to determine an efficient rectangular tiling transformation for a fixed number of processes for two-dimensional, skewed PDE problems. Our experimental results confirm the merit of coarse-grain execution in this family of applications and indicate that the proposed method leads to the selection of highly efficient tiling transformations.
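To see why skewing enables rectangular tiles, consider a 1-D stencil u[t][i] that reads u[t-1][i-1..i+1]: the skew i' = i + t makes every dependence component nonnegative, so axis-aligned rectangular tiles respect the dependences. Below is a toy enumeration of tile origins in the skewed space, purely illustrative and not the paper's tile-selection method.

```python
# Toy sketch: rectangular tile origins in a skewed (t, i + t) iteration
# space for a 1-D stencil (illustrative; not the paper's method).

def skewed_tile_origins(T, N, tile_t, tile_i):
    """Enumerate rectangular tile origins after the skew i' = i + t;
    note the skewed i-extent grows from N to N + T."""
    return [(tt, ii)
            for tt in range(0, T, tile_t)
            for ii in range(0, N + T, tile_i)]

print(len(skewed_tile_origins(T=8, N=8, tile_t=4, tile_i=4)))  # 8
```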
ISBN: (print) 9780889866560
The proceedings contain 98 papers. The topics discussed include: alternative dataflow model; MOOSS2: a CPU with support for HLL memory structures; architecting security: a secure implementation of hardware buffer-overflow protection; design of a two-level hot path detector for path-based loop optimizations; write-aware buffer cache management scheme for nonvolatile RAM; equivalence checking in C-based system-level design by sequentializing concurrent behaviors; formal equivalence checking for loop optimization in C programs without unrolling; design and simulation for three SEU-immune latches in a 0.18 μm CMOS commercial process; redesign of the 4:2 compressor for partial product reduction; using advanced transaction and workflow models in composing Web services; and assessing the traffic ranking of Websites with artificial neural networks.
The motivation of our work is to make a design tool for distributed embedded systems compliant with HIS and AUTOSAR. The tool is based on Processor Expert, a component-oriented development environment supporting several hundred microcontrollers, and on Matlab Simulink, which is the de facto standard for rapid prototyping of control applications but lacks adequate hardware support. The objective is to provide an integrated development environment for embedded controllers with a distributed nature and real-time requirements. We therefore discuss the advantages of using automatically generated code in the development cycle of embedded control software. We present a block set we developed and a Processor Expert real-time target for the Matlab Real-Time Workshop Embedded Coder. A case study shows the development cycle for a servo control design.