ISBN (print): 9781424400546
Execution of MPI applications on clusters and grid deployments suffers from node and network failures, which motivates the use of fault-tolerant MPI implementations. Two categories of techniques have been introduced to make these systems fault-tolerant: checkpoint-based techniques and log-based recovery protocols. Sender-based pessimistic logging, which falls into the second category, suffers from the huge volume of message payloads that must be kept in volatile memory. In this paper, we present a coordinated checkpoint of message payloads (CCMP) to reduce this overhead. The proposed method was evaluated with MPICH-V2, a public-domain platform implementing pessimistic logging with uncoordinated checkpointing. Experimental results demonstrate a reduction in run time for the NPB benchmarks in both fault-free and faulty environments.
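As a rough illustration of why sender-based logging is memory-hungry and how a coordinated checkpoint relieves the pressure, the following toy sketch keeps a copy of every outgoing payload in sender memory and frees the whole log once a coordinated checkpoint makes replay of those messages unnecessary. All names here are invented for illustration; this is not MPICH-V2 or CCMP internals.

```python
class SenderLogger:
    """Toy sketch of sender-based pessimistic message logging.

    The sender retains each payload in volatile memory so a failed receiver
    can be replayed to; a coordinated checkpoint lets the log be discarded.
    """

    def __init__(self):
        self.log = []   # (seq, dest, payload) tuples held in sender memory
        self.seq = 0

    def send(self, dest, payload):
        # Pessimistic: the payload is logged before the send is considered done.
        self.seq += 1
        self.log.append((self.seq, dest, payload))
        return self.seq

    def on_coordinated_checkpoint(self):
        # After a coordinated checkpoint, no pre-checkpoint message can ever
        # need replaying, so the sender frees every logged payload.
        freed = len(self.log)
        self.log.clear()
        return freed
```

The point of the sketch is the last method: without the coordinated checkpoint, `self.log` only ever grows, which is exactly the overhead the abstract describes.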
ISBN (print): 9781424400546
In many distributed computing systems that are prone to either induced or spontaneous node failures, the number of available computing resources changes dynamically and randomly. A load-balancing (LB) policy for such systems should therefore be robust, in terms of workload re-allocation and effectiveness in task completion, with respect to the random absence and re-emergence of nodes as well as random delays in the transfer of workloads among nodes. In this paper, two LB policies for such computing environments are presented. The first policy takes an initial LB action to preemptively counteract the consequences of random failure and recovery of nodes. The second policy compensates for node failures dynamically by transferring loads only at the actual failure instants. A probabilistic model, based on the concept of regenerative processes, is presented to assess the overall performance of the system under these policies. Optimal performance of both policies is evaluated using analytical, experimental, and simulation-based results. The interplay between node failure/recovery rates and the mean load-transfer delay is highlighted.
ISBN (print): 9781424400546
Users and developers of grid applications have access to increasing numbers of resources. While more resources generally mean higher capabilities for an application, they also raise the issue of application scheduling scalability. First, even polynomial-time scheduling heuristics may take a prohibitively long time to compute a schedule. Second, and perhaps more critically, it may not be possible to gather all the resource information needed by a scheduling algorithm in a scalable manner. Our application focus is scientific workflows, which can be represented as directed acyclic graphs (DAGs). Our claim is that, in future resource-rich environments, simple scheduling algorithms may be sufficient to achieve good workflow performance. We introduce a scalable scheduling approach that uses a resource abstraction called a virtual grid (VG). Our simulations of a range of typical DAG structures and resources demonstrate that a simple greedy scheduling heuristic combined with the virtual grid abstraction is as effective as, and more scalable than, more complex heuristic DAG scheduling algorithms on large-scale platforms.
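The kind of simple greedy heuristic the abstract argues for can be sketched as a plain list scheduler: walk the DAG in topological order and place each task on the processor that lets it finish earliest. This is a generic illustration with invented names (and it ignores communication costs, as if all nodes sat in one uniform virtual-grid slot), not the paper's exact algorithm.

```python
def greedy_schedule(tasks, deps, cost, n_procs):
    """Greedy list scheduling of a DAG onto n_procs identical processors.

    tasks: list of task ids; deps: list of (a, b) edges meaning a before b;
    cost: dict task -> execution time. Returns dict task -> finish time.
    """
    preds = {t: [] for t in tasks}
    succs = {t: [] for t in tasks}
    for a, b in deps:
        succs[a].append(b)
        preds[b].append(a)

    # Topological order via Kahn's algorithm.
    indeg = {t: len(preds[t]) for t in tasks}
    stack = [t for t in tasks if indeg[t] == 0]
    order = []
    while stack:
        t = stack.pop()
        order.append(t)
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                stack.append(s)

    proc_free = [0.0] * n_procs   # time each processor becomes free
    finish = {}
    for t in order:
        # Task is ready once all predecessors have finished.
        ready_at = max((finish[p] for p in preds[t]), default=0.0)
        # Greedy choice: the processor giving the earliest start time.
        i = min(range(n_procs), key=lambda k: max(proc_free[k], ready_at))
        start = max(proc_free[i], ready_at)
        finish[t] = start + cost[t]
        proc_free[i] = finish[t]
    return finish
```

On a diamond DAG (A before B and C, both before D) with unit costs and two processors, B and C run in parallel and the makespan is 3.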
ISBN (print): 9781424400546
Summary form only given. Multicore processor technologies, which appear to dominate the processor design landscape, require a paradigm shift in the development of programming models and supporting environments for scientific and engineering applications. System software for multicore processors needs to exploit fine-grain concurrent execution capabilities and cope with deep, non-uniform memory hierarchies. Software adaptation to multicore technologies needs to happen even as hardware platforms change underneath the software. Last but not least, due to the extremely high compute density of chip multiprocessing components, system software needs to increase its energy awareness and treat energy and temperature distribution as first-class optimization targets. Unfortunately, energy awareness is most often at odds with high performance. In the first part of this talk, the author discusses some of the major challenges of software adaptation to multicore technologies and motivates the use of autonomic, self-optimizing system software as a vehicle for both high-performance portability and energy-efficient program execution. In the second part of the talk, the author presents ongoing research in runtime environments for dense parallel systems built from multicore and SMT components, focusing on two topics: polymorphic multithreading and power-aware concurrency control with quality-of-service guarantees. In the same context, the author discusses enabling technologies for improved software autonomy via dynamic runtime optimization, including continuous hardware profilers and online power-efficiency predictors.
ISBN (print): 0769523129
Commercial database systems must typically rely on fast hardware platforms and interconnects to deal efficiently with data in parallel. However, cheap computing power can be applied for flexibility and scalability in managing large data volumes if the right choices are made concerning data placement and processing. Our work concentrates on using cheap computing power in possibly slow, non-dedicated local networks to achieve, over demanding query-intensive databases, a computing power that would be unachievable without expensive specialized hardware and massively parallel systems. The Node Partitioned Data Management System (NPDM) works on computing nodes in non-dedicated local networks. In this paper, we concentrate on the query transformations required for efficient processing over a specialized query-intensive schema. The decision support benchmark TPC-H is used as a case study for the transformations and for experimental analysis.
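A classic example of the kind of query transformation needed over node-partitioned data is the AVG rewrite: averaging node-local averages is wrong, so the standard transformation ships (SUM, COUNT) pairs from each node and merges them at the coordinator. The snippet below simulates that in plain Python; the partition data is made up and this is not NPDM's actual rewriting machinery.

```python
# Rows of one column spread over three nodes (toy data).
partitions = [[10, 20], [30], [40, 50, 60]]

# Per-node sub-query: each node returns a (SUM, COUNT) pair, not its AVG.
partials = [(sum(p), len(p)) for p in partitions]

# Coordinator merge: global AVG = total SUM / total COUNT.
total_sum = sum(s for s, _ in partials)
total_cnt = sum(c for _, c in partials)
global_avg = total_sum / total_cnt
```

Note that naively averaging the node-local averages here would give (15 + 30 + 50) / 3, which weights the single-row node as heavily as the three-row node; the (SUM, COUNT) rewrite avoids that bias.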
ISBN (print): 0769523129
We introduce a continuous convergence protocol for handling locally committed and possibly conflicting updates to replicated data. The protocol supports local consistency and predictability while allowing replicas to deterministically diverge and converge as updates are committed and replicated. We discuss how applications may exploit the protocol characteristics and describe an implementation where conflicting updates are detected, qualified by a partial update order, and resolved using application-specific forward conflict resolution.
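The core convergence idea can be sketched in miniature: if every replica resolves conflicting updates with the same deterministic rule, replicas that have seen the same update set reach the same state regardless of delivery order. The sketch below uses a total order built from (timestamp, replica id), which is a simplification of the paper's partial-order qualification; all names are invented for illustration.

```python
def converge(updates):
    """Deterministically apply a set of possibly conflicting updates.

    updates: iterable of (timestamp, replica_id, key, value). Concurrent
    writes to the same key are ordered by (timestamp, replica_id), a simple
    deterministic forward-resolution rule, so any replica holding the same
    update set computes the same final state.
    """
    state = {}
    for ts, replica, key, value in sorted(updates):
        state[key] = value   # later update in the deterministic order wins
    return state
```

Because the rule depends only on the update set, two replicas that received the same updates in different orders still converge to identical states.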
ISBN (print): 0780388348
Real-time spatio-temporal VLSI 3D IIR digital filters may be used for imaging or beamforming applications employing 3D input signals from synchronously-sampled multi-sensor arrays. Such filters have high computational complexity and often require arithmetic throughputs of hundreds of millions of floating-point operations per second, especially in the case of potential radio frequency beamforming applications. A novel high-throughput distributed parallel processor (DPP) architecture is proposed that is suitable for on-chip real-time VLSI/FPGA direct-form 3D IIR digital filter implementations. Using the proposed architecture and Matlab/Simulink and Minx simulation software, the design and bit-level simulation of a first-order, highly-selective, FPGA-based 3D IIR frequency-planar filter circuit is reported for 3D plane-wave filtering.
ISBN (print): 0769525091
This paper describes a Global Computing (GC) environment called XtremWeb-CH (XWCH). XWCH is an improved version of a GC tool called XtremWeb (XW). XWCH enriches XW in order to match P2P concepts: distributed scheduling, distributed communication, and the development of symmetrical models. Two versions of XWCH were developed. The first, called XWCH-sMs, manages inter-task communications in a centralized way. The second, called XWCH-p2p, allows direct communication between "workers". XWCH is evaluated on a real high-performance genetic application.
ISBN (print): 0769523129
Previously, DAG scheduling schemes used the mean (average) of computation or communication times in dealing with temporal heterogeneity. However, it is not optimal to consider only the means of computation and communication times when scheduling DAGs on a temporally (and spatially) heterogeneous distributed computing system. In this paper, it is proposed that second-order moments of computation and communication times, such as their standard deviations, be taken into account in addition to their means when scheduling "stochastic" DAGs. An effective scheduling approach has been developed which accurately estimates the earliest start time of each node and derives a schedule leading to a shorter average parallel execution time. Through extensive computer simulation, it has been shown that the proposed approach achieves a significant reduction in the average parallel execution times of stochastic DAGs.
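The idea of folding standard deviations into schedule estimates can be illustrated with a simple path-time estimator: assuming independent segment times, means add and variances add, so a variability-aware estimate is sum(mu) + k * sqrt(sum(sigma^2)). This formula and the weight k are an illustration of the general technique, not the paper's exact estimator.

```python
import math

def path_estimate(segments, k=1.0):
    """Estimate the completion time of a chain of stochastic segments.

    segments: list of (mean, std) pairs for computation/communication times.
    Assuming independence, means add and variances add; k controls how
    conservatively variability is weighted (k=0 recovers the mean-only
    heuristic used by earlier DAG schedulers).
    """
    mu = sum(m for m, _ in segments)
    var = sum(s * s for _, s in segments)
    return mu + k * math.sqrt(var)
```

For two segments with means 10 and 20 and standard deviations 3 and 4, the mean-only estimate is 30, while the variability-aware estimate with k=1 is 35; a scheduler using the latter prefers paths that are both short and predictable.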
ISBN (print): 0769523129
State-space based techniques represent a powerful analysis tool for discrete-event systems. One way to face state-space explosion is to exploit the behavioral symmetries of distributed systems. Well-Formed Coloured Petri Nets (WN) allow the direct construction of a symbolic reachability graph (SRG) that captures symmetries suitably encoded in the WN syntax. Most real systems, however, mix symmetric and asymmetric behaviors. The SRG and, more generally, all approaches based on a static description of symmetries have been shown not to be effective in such cases. In this paper, two quotient graphs are proposed as effective analysis frameworks for asymmetric systems. Both rely on WN syntax extended with relational operators. The first is an extension of the SRG that exploits local symmetries. The second technique uses linear constraints and substate inclusion to aggregate states. An asymmetric distributed leader-election algorithm is used as a running example.