ISBN: (Print) 9783319232379; 9783319232362
Process failure rates in the next generation of high-performance computing systems are expected to be very high. The MPI Forum is working on providing semantics and support for fault tolerance; the Run-through Stabilization, User-Level Failure Mitigation and Process Recovery proposals are the resulting efforts. The Run-through Stabilization and User-Level Failure Mitigation proposals require a fault-tolerant failure detection and consensus algorithm that informs the application of failures, so that it can employ Algorithm-Based Fault Tolerance for quicker recovery and continued execution. This paper briefly discusses these proposals, the failure detectors available in the literature, and why those detectors are unsuitable for realizing fault tolerance in MPI. It then outlines an inherently fault-tolerant and scalable epidemic (gossip-based) approach to failure detection and consensus. Simulations and an initial experimental analysis are presented, which indicate that this is a promising research direction.
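The epidemic approach outlined in this abstract can be illustrated with a generic gossip-style heartbeat failure detector. This is a minimal sketch, not the paper's algorithm: the class `GossipNode`, the round-based `simulate` loop, push gossip to one random peer per round and the timeout `t_fail` are all assumptions, and reaching consensus on the detected failures is not shown.

```python
import random


class GossipNode:
    """One process in a hypothetical round-based gossip failure detector."""

    def __init__(self, node_id, n):
        self.node_id = node_id
        self.heartbeat = [0] * n    # highest heartbeat counter seen per process
        self.last_update = [0] * n  # round at which each counter last increased

    def tick(self, now):
        # A live process increments its own heartbeat every round.
        self.heartbeat[self.node_id] += 1
        self.last_update[self.node_id] = now

    def merge(self, remote, now):
        # Keep the larger counter per process; a fresh increase resets its timer.
        for j, hb in enumerate(remote):
            if hb > self.heartbeat[j]:
                self.heartbeat[j] = hb
                self.last_update[j] = now

    def suspects(self, now, t_fail):
        # Suspect any process whose counter has not grown for more than t_fail rounds.
        return {j for j, t in enumerate(self.last_update) if now - t > t_fail}


def simulate(n, crashed, rounds, t_fail, seed=0):
    """Run push gossip for `rounds` rounds; crashed processes stay silent."""
    rng = random.Random(seed)
    nodes = [GossipNode(i, n) for i in range(n)]
    alive = [i for i in range(n) if i not in crashed]
    for now in range(1, rounds + 1):
        for i in alive:
            nodes[i].tick(now)
        for i in alive:
            peer = rng.choice([j for j in range(n) if j != i])
            if peer not in crashed:  # messages to a crashed process are lost
                nodes[peer].merge(nodes[i].heartbeat, now)
    return {i: nodes[i].suspects(rounds, t_fail) for i in alive}
```

For example, `simulate(8, {3}, 60, 15)` returns a suspect set for each surviving process, and every one of them contains process 3. There is no central monitor to fail, which is the property the abstract relies on; the price is that a `t_fail` chosen too small relative to the dissemination time can cause false suspicions.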
ISBN: (Print) 9783319232379; 9783319232362
This paper presents a holistic approach to executing tasks in distributed smart systems, demonstrated on the example of monitoring tasks in smart camera networks. The proposed approach is general and thus not limited to a specific scenario. A job-resource model is introduced to describe the smart system and its tasks, with as much order as necessary and as few rules as possible. Based on that model, a local algorithm is presented that is designed to achieve optimization transparency: optimization on system-wide criteria is not visible to the participants. To a task, the system-wide optimization appears as a virtual local single-step optimization. The algorithm is based on proactive quotation broadcasting to the local neighborhood. It also allows the parallel execution of tasks on resources and includes the optimization of multiple-task-to-resource assignments.
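To make the idea of proactive quotation broadcasting concrete, here is a minimal sketch of purely local task placement. Everything in it is hypothetical and not taken from the paper: the quote is assumed to be the marginal cost `(load + 1) * cost` of accepting one more task, each resource sees only its neighbours' quotes, and a task migrates one hop whenever a neighbour's quote beats its current cost. It does not model the paper's job-resource model or optimization transparency.

```python
def quote(r, load, cost):
    """Hypothetical quote: marginal cost for resource r to accept one more task."""
    return (load[r] + 1) * cost[r]


def balance(neighbors, load, cost, max_rounds=1000):
    """Move tasks one at a time to a neighbour whose broadcast quote is cheaper.

    neighbors: resource -> list of neighbouring resources (local neighbourhood)
    load:      resource -> number of tasks currently assigned
    cost:      resource -> positive integer per-task cost (assumed)
    """
    load = dict(load)
    for _ in range(max_rounds):
        moved = False
        for r in sorted(neighbors):
            if load[r] == 0:
                continue
            # Neighbours proactively broadcast quotes; r only sees its neighbourhood.
            quotes = {nb: quote(nb, load, cost) for nb in neighbors[r]}
            best = min(quotes, key=quotes.get)
            # Migrate one task if the neighbour's quote beats r's own current cost.
            if quotes[best] < load[r] * cost[r]:
                load[r] -= 1
                load[best] += 1
                moved = True
        if not moved:
            break
    return load
```

Each migration strictly lowers the quantity `sum(cost[r] * load[r] * (load[r] + 1) / 2)`, so with integer costs the loop stops after finitely many moves, at an assignment where no neighbour's quote undercuts any resource's current cost.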
An IDA-PBC-like control synthesis for infinite-dimensional port-Hamiltonian systems is investigated. As in the finite-dimensional case, a feedback control transforms the original model into a closed-loop target Hamiltonian model. Both distributed control and boundary control are used. The finite-rank distributed control is determined so as to solve an average IDA-PBC matching equation, and a backstepping boundary control is used to stabilize the matching error. The control model chosen to illustrate the approach is the so-called resistive diffusion equation for the radial diffusion of the poloidal magnetic flux. (C) 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
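For reference, the finite-dimensional IDA-PBC setting that this abstract generalizes is commonly stated as below. This is the standard textbook form, recalled as background; it is not the paper's infinite-dimensional construction. Here $J = -J^\top$, $R = R^\top \succeq 0$, $(J_d, R_d, H_d)$ describe the target system, and $g^\perp$ is a full-rank left annihilator of $g$ ($g^\perp g = 0$):

```latex
\begin{aligned}
&\text{plant:}    && \dot x = \bigl(J(x) - R(x)\bigr)\nabla H(x) + g(x)\,u, \\
&\text{target:}   && \dot x = \bigl(J_d(x) - R_d(x)\bigr)\nabla H_d(x), \\
&\text{matching:} && g^\perp(x)\bigl(J(x) - R(x)\bigr)\nabla H(x)
                     = g^\perp(x)\bigl(J_d(x) - R_d(x)\bigr)\nabla H_d(x), \\
&\text{control:}  && u = \bigl(g^\top g\bigr)^{-1} g^\top
                     \Bigl[\bigl(J_d - R_d\bigr)\nabla H_d - \bigl(J - R\bigr)\nabla H\Bigr].
\end{aligned}
```

The control law only enforces the target dynamics along the range of $g$; the matching equation is what makes the remaining directions agree, which is why an exactly solvable (or, as in the abstract, averaged) matching equation is the central design step.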
ISBN: (Print) 9783319232379; 9783319232362
Distributed systems primarily provide access to data-intensive computation through a wide range of interfaces. As these systems have advanced, their scale and complexity have increased, so faults are bound to occur, leading to diverse fault and failure conditions. Fault tolerance has therefore become a crucial property that lets a distributed system keep functioning correctly and remain available in the presence of faults. Replication techniques concentrate on two fault tolerance approaches: masking failures on the fly and reconfiguring the system in response. This paper presents a brief review of different replication techniques, such as Grid Configuration (GC), Box-Shaped Grid (BSG) and Neighbor Replication on Grid (NRG), comparing and formalizing their communication costs and availability based on the k-out-of-n model. Each of these techniques has its own merits and demerits, which form the subject matter of this review.
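The k-out-of-n model mentioned in this abstract reduces, in its basic form, to a binomial sum: if each of $n$ replicas is up independently with probability $p$, an operation needing any $k$ of them is available with probability $A(k, n) = \sum_{i=k}^{n} \binom{n}{i} p^i (1-p)^{n-i}$. The sketch below computes just that formula, assuming independent, identically available replicas; the specific quorum shapes of GC, BSG and NRG, and their communication costs, are not modeled.

```python
from math import comb


def k_out_of_n_availability(n, k, p):
    """Probability that at least k of n replicas are up.

    Assumes replicas fail independently and each is up with probability p.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Example: majority quorum (k = 3) over n = 5 replicas, each 90% available.
print(round(k_out_of_n_availability(5, 3, 0.9), 5))  # 0.99144
```

Raising $k$ (for example, requiring all replicas for a write) lowers availability, which is the trade-off that grid-based quorum schemes try to balance against the number of messages per operation.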
ISBN: (Print) 9781450333177
As a fundamental tool for modeling and analyzing social and information networks, large-scale graph mining is an important component of any tool set for big data analysis. Processing graphs with hundreds of billions of edges is only possible by developing distributed algorithms under distributed graph mining frameworks such as MapReduce, Pregel, Giraph, and the like. For these distributed algorithms to work well in practice, several metrics must be taken into account, such as the number of rounds of computation and the communication complexity of each round. For example, given the popularity and ease of use of the MapReduce framework, developing practical algorithms with good theoretical guarantees for basic graph problems is of great importance. In this tutorial, we first discuss how to design and implement algorithms based on the traditional MapReduce architecture. In this regard, we discuss various basic graph-theoretic problems such as computing connected components, maximum matching, minimum spanning trees (MST), triangle counting, and overlapping or balanced clustering. We discuss a computation model for MapReduce and describe the sampling, filtering, local random walk, and core-set techniques used to develop efficient algorithms in this framework. Finally, we explore the possibility of employing other distributed graph processing frameworks. In particular, we study the effect of augmenting MapReduce with a distributed hash table (DHT) service and also discuss the use of a new graph processing framework called ASYMP, based on asynchronous message passing. We show that using ASYMP, one can improve CPU usage and achieve significantly improved running time.
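As an illustration of why round count matters, here is a single-machine simulation of a simple MapReduce-style connected-components algorithm (min-label propagation, sometimes called hash-min). It is a generic sketch, not one of the tutorial's algorithms: `map_phase`, `reduce_phase` and `connected_components` are hypothetical names, and the shuffle is simulated with a dictionary.

```python
from collections import defaultdict


def map_phase(labels, adjacency):
    # Each vertex emits its current label to itself and to every neighbour.
    for v, label in labels.items():
        yield v, label
        for u in adjacency[v]:
            yield u, label


def reduce_phase(pairs):
    # Shuffle: group emitted labels by vertex; reduce: keep the minimum.
    grouped = defaultdict(list)
    for v, label in pairs:
        grouped[v].append(label)
    return {v: min(ls) for v, ls in grouped.items()}


def connected_components(edges):
    """Return (component label per vertex, number of MapReduce rounds)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    labels = {v: v for v in adjacency}  # every vertex starts as its own label
    rounds = 0
    while True:
        new_labels = reduce_phase(map_phase(labels, adjacency))
        rounds += 1
        if new_labels == labels:  # fixed point: labels stopped changing
            return labels, rounds
        labels = new_labels
```

On the path 1-2-3 plus the edge 4-5, the labels converge to `{1: 1, 2: 1, 3: 1, 4: 4, 5: 4}` after three rounds. The number of rounds grows with the graph's diameter, which is exactly the kind of cost that the sampling, filtering and core-set techniques in the tutorial aim to reduce.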
ISBN: (Print) 9783319251592; 9783319251585
This paper presents a recommendation algorithm based on matrix operations (RAMO), which integrates a collaborative filtering algorithm with an information-network-based approach. RAMO exploits information from different objects to increase recommendation accuracy. Furthermore, a distributed recommendation algorithm, DRAMD, is proposed, based on matrix decomposition using the MapReduce framework. DRAMD can run across multiple cluster nodes to reduce computation time. Test results on the MovieLens dataset show that the algorithms not only achieve better recommendation effectiveness but also improve computational efficiency.
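To show what an information-network-based recommender computed with matrix operations can look like, here is a sketch of the well-known "mass diffusion" scheme on a user-item bipartite network. It is a swapped-in, generic technique, not RAMO or DRAMD: the adjacency matrix `a`, the helper names and the toy data are illustrative assumptions, and the combination with collaborative filtering is not shown.

```python
def mass_diffusion_scores(a, user):
    """Score items for `user` by two-step resource spreading on a bipartite graph.

    a[u][i] == 1 if user u has collected item i, else 0.
    """
    n_users, n_items = len(a), len(a[0])
    k_item = [sum(a[u][i] for u in range(n_users)) for i in range(n_items)]
    k_user = [sum(row) for row in a]
    # Step 1: each item the user collected splits one unit of resource among its users.
    r = [sum(a[l][i] * a[user][i] / k_item[i] for i in range(n_items) if k_item[i])
         for l in range(n_users)]
    # Step 2: each user splits what they received among the items they collected.
    return [sum(a[l][j] * r[l] / k_user[l] for l in range(n_users) if k_user[l])
            for j in range(n_items)]


def recommend(a, user, top_n=2):
    """Rank items the user has not collected yet by diffusion score."""
    scores = mass_diffusion_scores(a, user)
    candidates = [j for j in range(len(scores)) if not a[user][j]]
    return sorted(candidates, key=lambda j: -scores[j])[:top_n]


a = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 1, 1]]
print(recommend(a, 0))  # [2, 3]
```

Both steps are matrix-vector products with degree-normalized adjacency matrices, which is what makes this family of methods a natural fit for distributed matrix computation on MapReduce.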
ISBN: (Print) 9783319231266; 9783319231259
This article describes an approach to building a data mining cloud service based on the actor model. It describes how an algorithm, decomposed into functional blocks, is mapped onto a set of actors. It also describes the architecture and implementation of a cloud service that performs data mining algorithms using actors. As an example, it describes the implementation of a neural network learning algorithm on a cluster of actors, together with experiments.
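The mapping of functional blocks onto actors can be sketched in-process with Python threads and queues. This is a minimal sketch under loud assumptions: `Actor`, `run_pipeline`, `STOP` and the example blocks are hypothetical, threads stand in for actors that a real service would spread across cluster nodes, and the blocks are toy functions rather than the paper's neural network training steps.

```python
import queue
import threading

STOP = object()  # sentinel message telling an actor to shut down


class Actor:
    """Minimal actor: a private mailbox and one thread handling one message at a time."""

    def __init__(self, behavior, next_actor=None):
        self.behavior = behavior
        self.next_actor = next_actor
        self.mailbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def send(self, message):
        self.mailbox.put(message)

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is STOP:
                if self.next_actor is not None:
                    self.next_actor.send(STOP)  # propagate shutdown downstream
                return
            result = self.behavior(message)
            if self.next_actor is not None:
                self.next_actor.send(result)


def run_pipeline(blocks, inputs):
    """Map each functional block to its own actor and chain them in order."""
    results = []
    sink = Actor(results.append)  # final actor collects outputs
    head = sink
    for block in reversed(blocks):
        head = Actor(block, next_actor=head)
    for item in inputs:
        head.send(item)
    head.send(STOP)
    sink.thread.join()  # STOP arrives only after every earlier message
    return results


def normalize(v):
    m = max(v)
    return [x / m for x in v]


print(run_pipeline([normalize, sum], [[1, 2, 4], [3, 3]]))  # [1.75, 2.0]
```

Because each actor owns its state and communicates only through its FIFO mailbox, outputs keep their input order and no locks are needed around the blocks themselves; placing actors on different machines changes the transport, not the programming model.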
In this paper we consider distributed-parameter systems that admit a Lagrangian or port-Hamiltonian formulation. We distinguish the case where the Lagrangian or the Hamiltonian depends on first-order derivative variables (jet variables) from the case where second-order derivatives appear. This distinction is important for correctly determining the boundary conditions in the Lagrangian setting and for investigating possible boundary ports in the Hamiltonian setting. The partial differential equations and the boundary terms/ports are derived geometrically using the so-called Cartan form. We illustrate our results with mechanical examples such as beams and plates. (C) 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
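Why second-order jet variables change the boundary analysis can be seen in the textbook Euler-Bernoulli beam, derived here by ordinary integration by parts rather than the paper's Cartan-form approach. Assumptions: constant $\rho A$ and $EI$, deflection $w(t, x)$ on $[0, \ell]$, and variations vanishing at the time endpoints:

```latex
\begin{aligned}
&\mathcal{L} = \tfrac12 \rho A\, w_t^2 - \tfrac12 EI\, w_{xx}^2, \\
&\frac{\partial \mathcal{L}}{\partial w}
 - D_t \frac{\partial \mathcal{L}}{\partial w_t}
 - D_x \frac{\partial \mathcal{L}}{\partial w_x}
 + D_x^2 \frac{\partial \mathcal{L}}{\partial w_{xx}} = 0
 \;\Longrightarrow\; \rho A\, w_{tt} + EI\, w_{xxxx} = 0, \\
&\Bigl[\Bigl(\frac{\partial \mathcal{L}}{\partial w_x}
 - D_x \frac{\partial \mathcal{L}}{\partial w_{xx}}\Bigr)\delta w
 + \frac{\partial \mathcal{L}}{\partial w_{xx}}\,\delta w_x\Bigr]_0^\ell
 = \bigl[EI\, w_{xxx}\,\delta w - EI\, w_{xx}\,\delta w_x\bigr]_0^\ell .
\end{aligned}
```

The $\delta w_x$ term appears only because $\mathcal{L}$ depends on $w_{xx}$; it yields the pairs of boundary conditions (clamped end: $\delta w = \delta w_x = 0$; free end: $w_{xx} = w_{xxx} = 0$) and the corresponding shear-force/velocity and bending-moment/angular-velocity boundary ports in the Hamiltonian picture.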
ISBN: (Print) 9781467372879
Geographically dispersed cloud data centers (DCs) enable web application providers to improve their services' response time and availability by deploying application replicas in multiple DCs. To allow applications requiring strong consistency to be deployed in multiple clouds, industry and academia have developed various scalable database systems that can guarantee strong inter-DC consistency with reduced network overhead. For applications using these database systems, it is essential to take into account both the network latency to end users and the communication overhead of the databases when selecting the hosting DCs. In this paper, we study how to identify a satisfactory deployment plan (hosting DCs and request routing) for applications using these databases, considering SLO satisfaction, migration cost, and operational cost. The proposed approach involves two steps. First, when the application is first migrated to the clouds, it uses a genetic algorithm to search for the deployment plan with the minimum number of SLO violations. Then it continuously optimizes the deployment at regular time intervals according to the changing workload and the current deployment plan. We illustrate how our approach works for applications using two databases (Cassandra and Galera Cluster), and demonstrate its effectiveness through simulation studies using the settings of two example applications (TPC-W and Twissandra). Our solution is extensible to applications using other database systems with similar properties.
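The first step of the approach, a genetic search for a deployment plan with few SLO violations, can be sketched on a toy model. This is a generic GA under stated assumptions, not the paper's formulation: plans are bit vectors (1 = host a replica in that DC), each user group is routed to its lowest-latency chosen DC, the latency matrix, SLO threshold, `dc_cost` and `violation_weight` are made-up inputs, and database inter-DC overhead, migration cost and the continuous re-optimization step are not modeled.

```python
import random


def plan_cost(plan, latency, slo_ms, dc_cost, violation_weight=1000):
    """Score a plan: heavy penalty per SLO-violating user group plus DC cost."""
    chosen = [d for d, on in enumerate(plan) if on]
    if not chosen:
        return float("inf")  # a plan must host the application somewhere
    # Routing assumption: each user group is sent to its lowest-latency chosen DC.
    violations = sum(1 for row in latency if min(row[d] for d in chosen) > slo_ms)
    return violations * violation_weight + sum(dc_cost[d] for d in chosen)


def genetic_search(latency, slo_ms, dc_cost, pop_size=20, generations=40,
                   mutation_rate=0.1, seed=0):
    """Tiny GA over bit-vector plans; assumes at least two candidate DCs."""
    rng = random.Random(seed)
    n = len(dc_cost)

    def fitness(plan):
        return plan_cost(plan, latency, slo_ms, dc_cost)

    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]

    def pick():
        # Binary tournament selection from the current population.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        children = [best]  # elitism: carry the best plan forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]  # one-point crossover
            # Bit-flip mutation.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        population = children
        candidate = min(population, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
        history.append(fitness(best))
    return best, fitness(best), history
```

Weighting violations far above operational cost makes the search minimize SLO violations first and cost second; the returned `history` of best-so-far scores never increases, since the best plan is only replaced by a strictly better one.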