In this paper, a fast-convergence distributed support vector machine (FDSVM) algorithm is proposed for efficiently solving the distributed SVM training problem. Rather than exchanging information only among immediate neighbor sites, FDSVM employs a communication policy based on a deterministic gossip protocol to accelerate the diffusion of information around the network: each site communicates with the others in a flooding, iterative manner. This communication policy significantly reduces the total number of iterations and thus speeds up the convergence of the algorithm. In addition, the algorithm is proved to converge to the global optimum in a finite number of steps over an arbitrary strongly connected network (SCN). Experiments on various benchmark data sets show that, in terms of total training time, FDSVM consistently outperforms the related state-of-the-art approach for most networks, especially the ring network.
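The effect of flooding on a strongly connected network can be illustrated with a small simulation. This is only a sketch of synchronous flooding, not the FDSVM protocol itself: the function name `flooding_rounds`, the adjacency-dictionary format, and the assumption that every site forwards everything it knows in each round are all hypothetical choices made for illustration.

```python
def flooding_rounds(out_neighbors):
    """Count synchronous rounds until every site knows every site's item.

    out_neighbors maps each site to the list of sites it can send to
    (a directed graph). Assumption: in each round, every site forwards
    everything it currently knows to all of its out-neighbors.
    """
    sites = list(out_neighbors)
    known = {s: {s} for s in sites}  # each site starts with its own item only
    everything = set(sites)
    rounds = 0
    while any(known[s] != everything for s in sites):
        # Collect what each site receives in this round.
        incoming = {s: set() for s in sites}
        for s in sites:
            for t in out_neighbors[s]:
                incoming[t] |= known[s]
        # Merge the received items and check that something new arrived.
        progressed = False
        for s in sites:
            before = len(known[s])
            known[s] |= incoming[s]
            progressed = progressed or len(known[s]) > before
        if not progressed:
            raise ValueError("network is not strongly connected")
        rounds += 1
    return rounds


# A directed ring of five sites: site i can only send to site i + 1.
ring = {i: [(i + 1) % 5] for i in range(5)}
print(flooding_rounds(ring))
```

On the directed ring, information needs one round per hop, so the last item arrives only after it has travelled around almost the whole ring. A fully connected network finishes in a single round.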
ISBN: (print) 9780769549569; 9781467362184
We present a new distributed programming extension of the R programming language. By tightly coupling R to the well-known ScaLAPACK and MPI libraries, we are able to achieve highly scalable implementations of common statistical methods, allowing the user to analyze bigger datasets with R than ever before. Early benchmarks are encouraging for the project and its future.
ISBN: (print) 9783642288722
Process mining techniques have matured over the last decade, and more and more organizations have started to use this new technology. The two most important types of process mining are process discovery (i.e., learning a process model from example behavior recorded in an event log) and conformance checking (i.e., comparing modeled behavior with observed behavior). Process mining is motivated by the availability of event data. However, as event logs become larger (say, terabytes), performance becomes a concern. The only way to handle larger applications while ensuring acceptable response times is to distribute the analysis over a network of computers (e.g., multicore systems, grids, and clouds). This paper provides an overview of the different ways in which process mining problems can be distributed. We identify three types of distribution: replication, a horizontal partitioning of the event log, and a vertical partitioning of the event log. These types are discussed in the context of both procedural (e.g., Petri nets) and declarative process models. Most challenging is the horizontal partitioning of event logs in the context of procedural models. Therefore, a new approach to decompose Petri nets and associated event logs is presented. This approach illustrates that process mining problems can be distributed in various ways.
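The two partitioning styles can be sketched on a toy event log. The abstract does not define the terms, so this sketch assumes that a vertical partitioning assigns whole cases to nodes and that a horizontal partitioning splits each case's events by activity. The function names, the `(case_id, activity)` event format, and the activity groups are all hypothetical.

```python
def split_by_case(log, n_parts):
    """Assumed 'vertical' split: every event of a case goes to the same part."""
    parts = [[] for _ in range(n_parts)]
    case_to_part = {}
    for case_id, activity in log:
        # Assign each newly seen case to a part in round-robin order.
        idx = case_to_part.setdefault(case_id, len(case_to_part) % n_parts)
        parts[idx].append((case_id, activity))
    return parts


def split_by_activity(log, activity_groups):
    """Assumed 'horizontal' split: each part keeps only its group's activities.

    Groups may overlap, so an event can appear in more than one part.
    """
    parts = [[] for _ in activity_groups]
    for case_id, activity in log:
        for idx, group in enumerate(activity_groups):
            if activity in group:
                parts[idx].append((case_id, activity))
    return parts


log = [("c1", "a"), ("c2", "a"), ("c1", "b"), ("c2", "c"), ("c1", "d")]
print(split_by_case(log, 2))
print(split_by_activity(log, [{"a", "b"}, {"b", "c", "d"}]))
```

Under these assumptions, the case-based split leaves every trace intact on one node, while the activity-based split cuts each trace into projected sub-traces. The second form is the one a decomposed model would have to be matched against.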
ISBN: (print) 9781467318556; 9781467318570
The paper proposes a distributed computing framework that integrates parallel differential evolution (DE) and multi-agents. Given a complex high-dimensional optimization problem, our approach decomposes the problem into a set of sub-components, which are evolved by a set of Slave agents concurrently, and the results are synthesized and further evolved by a Master agent. As top-level agents of the framework, the Master and Slave agents can be divided into asynchronous teams of sub-agents including Constructors for solution initialization, Improvers for solution evolution, Repairers for constraint handling, Destroyers for keeping the quality and size of the population, etc., which share populations of solution vectors and cooperate to solve the problem efficiently. The proposed approach is highly parallelized, flexible, and scalable, and its efficiency is demonstrated by comparison with some state-of-the-art approaches.
ISBN: (print) 9781424443598
In distributed environments, checkpointing and rollback recovery based on message logging is a commonly used approach for providing distributed systems with fault tolerance and synchronized global states. Clearly, checkpointing more frequently reduces system recovery time in the presence of faults and hence improves system availability; however, more frequent checkpointing may also increase the probability that a task misses its deadlines, or prolong its execution time in fault-free scenarios. Hence, in distributed and real-time computing, the system's overall quality must be measured by a set of aggregated criteria, such as availability, task execution time, and task deadline miss probability. In this paper, we take into account state synchronization costs in the checkpointing and rollback recovery scheme and quantitatively analyze the relationships between checkpoint intervals and these criteria. Based on the analytical results, we present an algorithm for finding an optimal checkpoint interval that maximizes the system's overall quality.
Recently, distributed computing systems have been gaining much attention due to a growing demand for various kinds of effective computation in both industry and academia. In this paper, we focus on Peer-to-Peer (P2P) computing systems, also called public-resource computing systems or global computing systems. P2P computing systems, in contrast to grids, use personal computers and other relatively simple electronic equipment (e.g., the PlayStation console) to process sophisticated computational projects. A significant example of the P2P computing idea is the BOINC (Berkeley Open Infrastructure for Network Computing) project. To improve the performance of the computing system, we propose to use the P2P approach to distribute the results of computational projects, i.e., results are transmitted in the system as in P2P file sharing systems (e.g., BitTorrent). In this work, we concentrate on offline optimization of the P2P computing system, covering two elements: scheduling of computations and data distribution. The objective is to minimize the system OPEX cost related to data processing and data transmission. We formulate an Integer Linear Programming (ILP) model of the system and use it to obtain optimal results with the CPLEX solver. Next, we propose two heuristic algorithms that provide results very close to the optimum and can be used for larger problem instances than those solvable by CPLEX or other ILP solvers.
In self-stabilization, each node has only a local view of the distributed network system, yet in a finite amount of time the system converges to a global configuration with a desired property, in this case a 2-packing set. Using a graph $G = (V, E)$ to represent the network, a subset $S \subseteq V$ is a 2-packing if for all $i \in V$: $|N[i] \cap S| \le 1$. In this paper, we first propose an ID-based, constant-space, self-stabilizing algorithm that stabilizes to a maximal 2-packing in an arbitrary graph. We show that the algorithm stabilizes in O(mm) moves under any scheduler (such as a distributed daemon). Secondly, we show that the algorithm stabilizes in $O(n^2)$ rounds under a synchronous daemon where every privileged node moves at each round. Published by Elsevier B.V.
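The 2-packing condition can be checked directly from its definition. The sketch below is a centralized checker, not the self-stabilizing algorithm of the paper. It assumes that N[i] denotes the closed neighborhood of i (the node together with its neighbors) and that the undirected graph is given as an adjacency dictionary; the function names are hypothetical.

```python
def is_two_packing(adj, s):
    """Return True if every closed neighborhood holds at most one node of s."""
    s = set(s)
    for v, nbrs in adj.items():
        closed = set(nbrs) | {v}  # N[v]: v together with its neighbors
        if len(closed & s) > 1:
            return False
    return True


def is_maximal_two_packing(adj, s):
    """Return True if s is a 2-packing and no further node can be added."""
    s = set(s)
    if not is_two_packing(adj, s):
        return False
    return all(not is_two_packing(adj, s | {v}) for v in adj if v not in s)


# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_two_packing(path, {0, 3}))
print(is_maximal_two_packing(path, {0, 3}))
print(is_two_packing(path, {0, 2}))
```

On the path, nodes 0 and 2 share the neighbor 1, so they cannot both be in the set. Nodes 0 and 3 are three hops apart and can.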
The peer-to-peer paradigm is increasingly studied by the distributed computing community. Indeed, this type of architecture has interesting properties, such as the absence of a centralized topology, fault tolerance, and dynamic reorganization of the network. However, managing these networks is complex, and a speedup of distributed applications is not guaranteed. That is why it is necessary to predict performance as early as possible in the design and development phases, in order to bypass bottlenecks and to correct the parts of an application that slow down its execution. In this context, we propose P2PPerf, a simulation tool that aims at predicting the performance and execution time of a distributed application before it is finalized. P2PPerf has been tested on JNGI, a P2P distributed computing application using the JXTA platform. Copyright (c) 2007 John Wiley & Sons, Ltd.
ISBN: (print) 9781424442232
Trustworthiness of resources is the foundation of virtual computing environments (VCE). Using a resource's identity information is a common way to establish trustworthiness. However, this may conflict with identity privacy, an important security issue in VCE, which seeks to keep resources anonymous. A trust model is proposed that enhances resources' identity privacy by changing pseudonyms in a fully distributed VCE, while each resource's trustworthiness can still be evaluated by upper-layer applications. Simulation results show that the model performs well in terms of resource trustworthiness evaluation error, resource selection success rate, and message overhead.
Resource performance monitoring is among the most active research topics in distributed computing. In this paper, we propose an adaptive resource monitoring method for applications in heterogeneous computing environments. Depending on the operating environment of the distributed heterogeneous system and on changes in the system resource workload, the method combines a periodic pull mode with an event-driven push mode to adaptively publish and retrieve system resource information. Preliminary experiments reveal that our adaptive monitoring method improves the efficiency of system monitoring over regular monitoring approaches.
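One way to combine a periodic pull with an event-driven push is sketched below. The abstract does not give the actual adaptation rule, so the rule used here is an assumption: a value is pulled when a fixed period has elapsed since the last publication, and pushed earlier when the load has changed by at least a threshold. The function name `publish_events` and its parameters are hypothetical.

```python
def publish_events(samples, pull_period, push_threshold):
    """Decide when a resource's load is published and by which mode.

    samples is a list of (time, load) pairs in increasing time order.
    Assumed rule: pull once pull_period has elapsed since the last
    publication; push earlier if the load moved by push_threshold or more.
    """
    published = []
    last_load = None
    last_time = None
    for t, load in samples:
        if last_load is None or t - last_time >= pull_period:
            mode = "pull"  # periodic retrieval (also used for the first sample)
        elif abs(load - last_load) >= push_threshold:
            mode = "push"  # large workload change triggers an early update
        else:
            continue  # nothing worth publishing at this sample
        published.append((t, mode))
        last_load, last_time = load, t
    return published


# Load in percent, sampled once per time unit.
samples = [(0, 20), (1, 22), (2, 80), (3, 81), (4, 79), (5, 78), (6, 30)]
print(publish_events(samples, pull_period=3, push_threshold=25))
```

In this sketch, a stable workload costs only the periodic pulls, while sudden jumps are reported without waiting for the next period.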