ISBN (print): 9781450368735
Random walk on graphs has recently gained immense popularity as a tool for graph data analytics and machine learning. Currently, random walk algorithms are developed as individual implementations and suffer from significant performance and scalability problems, especially with the dynamic nature of sophisticated walk strategies. We present KnightKing, the first general-purpose, distributed graph random walk engine. To address the unique interaction between a static graph and many dynamic walkers, it adopts an intuitive walker-centric computation model. The corresponding programming model allows users to easily specify existing or new random walk algorithms, facilitated by a new unified edge transition probability definition that applies across popular known algorithms. With KnightKing, these diverse algorithms benefit from its common distributed random walk execution engine, centered around an innovative rejection-based sampling mechanism that dramatically reduces the cost of higher-order random walk algorithms. Our evaluation confirms that KnightKing brings up to 4 orders of magnitude improvement in executing algorithms that can currently only be afforded with approximate solutions on large graphs.
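The abstract names rejection-based sampling but does not illustrate it; the following is a minimal, hypothetical sketch (not KnightKing's actual code) of how a node2vec-style second-order step can be drawn by rejection, assuming an unweighted graph stored as a dict of neighbor sets and the usual node2vec return/in-out parameters p and q. A candidate edge is proposed from a cheap static distribution and accepted against a dynamic bias, so the full per-step transition distribution never has to be materialized.

import random

def node2vec_step(graph, prev, curr, p=1.0, q=1.0):
    # One second-order transition out of `curr`, having arrived from `prev`.
    # graph: dict mapping each node to a set of its neighbors (unweighted).
    neighbors = list(graph[curr])
    upper = max(1.0, 1.0 / p, 1.0 / q)        # bound on the dynamic bias factor
    while True:
        candidate = random.choice(neighbors)  # cheap static proposal (uniform)
        if candidate == prev:
            bias = 1.0 / p                    # walk returns to the previous node
        elif candidate in graph[prev]:
            bias = 1.0                        # candidate is also a neighbor of prev
        else:
            bias = 1.0 / q                    # walk moves away from prev
        if random.random() < bias / upper:    # dynamic acceptance test
            return candidate                  # accepted: traverse this edge
        # rejected: propose another candidate; no full distribution is ever built

The acceptance probability bias/upper is always at most 1, so the loop terminates quickly in expectation while sampling exactly from the biased second-order distribution.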
ISBN (print): 9781450368735
Crash-recovery bugs (bugs in crash-recovery-related mechanisms) are among the most severe bugs in cloud systems and can easily cause system failures. It is notoriously difficult to detect crash-recovery bugs since these bugs can only be exposed when nodes crash under special timing conditions. This paper presents CrashTuner, a novel fault-injection testing approach to combat crash-recovery bugs. The novelty of CrashTuner lies in how we identify fault-injection points (crash points) that are likely to expose errors. We observe that if a node crashes while accessing meta-info variables, i.e., variables referencing high-level system state information (e.g., an instance of a node or task), it often triggers crash-recovery bugs. Hence, we identify crash points by automatically inferring meta-info variables via a log-based static program analysis. Our approach is automatic and requires no manual specification. We have applied CrashTuner to five representative distributed systems: Hadoop2/Yarn, HBase, HDFS, ZooKeeper, and Cassandra. CrashTuner can finish testing each system within 17.39 hours, and reports 21 new bugs that have never been found before. All new bugs were confirmed by the original developers, and 16 of them have already been fixed (14 with our patches). These new bugs can cause severe damage such as cluster downtime or start-up failures.
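To make the workflow concrete, here is a minimal, hypothetical sketch of a CrashTuner-style testing campaign. The callables start_cluster, run_workload_until, and check_recovery, and the cluster/node interface they return, are illustrative assumptions rather than the tool's real API; the sketch only shows the shape of the loop described in the abstract: crash a node exactly when an inferred meta-info access is reached, let the system recover, and look for failure symptoms.

def crashtuner_style_campaign(start_cluster, run_workload_until, check_recovery,
                              workload, crash_points):
    # crash_points: program locations that access meta-info variables,
    # inferred offline by a log-based static analysis (not modeled here).
    # The three callables abstract the system under test and are placeholders.
    found = []
    for point in crash_points:
        cluster = start_cluster()
        try:
            node = run_workload_until(cluster, workload, point)  # reach the crash point
            node.kill()                                          # inject the crash right there
            cluster.wait_for_recovery()
            symptoms = check_recovery(cluster)   # e.g. cluster down, start-up failure
            if symptoms:
                found.append((point, symptoms))
        finally:
            cluster.teardown()
    return found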
In my presentation I will propose a new link-layer model for distributed computing based on so-called relays that promises to be useful for the design of robust and secure distributed systems based on overlay networks...
ISBN (print): 9781450357760
The proceedings contain 7 papers. The topics discussed include: Relays: towards a link layer for robust and secure fog computing; distributing computations in fog architectures; scheduling at the edge for assisting cloud real-time systems; enabling exclusive shared access to cloud of things resources; digital epidemiology and beyond; a novel NFV schedule optimization approach with sensitivity to packets dropping positions; and GoEdge: a scalable and stateless local breakout method.
Over the years, developments such as cloud computing, Internet of Things, and now edge and fog computing, have probably caused paradigm fatigue among practitioners. The question arises whether adopting a specific para...
ISBN (print): 9781450357753
The proceedings contain 9 papers. The topics discussed include: saying what you mean; towards reproducible evaluation of large-scale distributed systems; turn of the carousel – what does edge computing change for distributed applications?: research statement; towards a more reliable store-and-forward protocol for mobile text messages; logical clocks are not fair: what is fair? a case study of high-level language and optimization; an analysis of quorum-based abstractions: a case study using gorums to implement raft; data distribution method for fast giga-scale hologram generation on a multi-GPU cluster; language semantics driven design and formal analysis for distributed cyber-physical systems; and Partisan: enabling real-world protocol evaluation.
We present the design and implementation of Partisan, an Erlang library for enabling real-world experiments of distributed protocols and applications. Partisan is a "batteries-included" library facilitating ...
ISBN (print): 9781450357951
The proceedings contain 62 papers. The topics discussed include: nesting-safe recoverable linearizability: modular constructions for non-volatile memory; deterministic abortable mutual exclusion with sublogarithmic adaptive RMR complexity; brief announcement: persistent multi-word compare-and-swap; separating lock-freedom from wait-freedom; passing messages while sharing memory; revisionist simulations: a new approach to proving space lower bounds; on the classification of deterministic objects via set agreement power; leveraging indirect signaling for topology inference and fast broadcast; broadcast in radio networks, time vs. energy tradeoffs; round- and message-optimal distributed graph algorithms; improved massively parallel computation algorithms for MIS, matching, and vertex cover; distributed approximation of minimum k-edge-connected spanning subgraphs; brief announcement: distributed minimum vertex coloring and maximum independent set in chordal graphs; fair leader election for rational agents in asynchronous rings and networks; leader election in well-connected graphs; almost-surely terminating asynchronous byzantine agreement revisited; sublinear message bounds for randomized agreement; tight bounds for asymptotic and approximate consensus; property testing of planarity in the congest model; and locking timestamps versus locking objects.
ISBN (print): 9781450357951
The area of machine learning has made considerable progress over the past decade, enabled by the widespread availability of large datasets, as well as by improved algorithms and models. Given the large computational demands of machine learning workloads, parallelism, implemented either through single-node concurrency or through multi-node distribution, has been a third key ingredient to advances in machine learning. The goal of this tutorial is to provide the audience with an overview of standard distribution techniques in machine learning, with an eye towards the intriguing trade-offs between synchronization and communication costs of distributed machine learning algorithms, on the one hand, and their convergence, on the other. The tutorial will focus on parallelization strategies for the fundamental stochastic gradient descent (SGD) algorithm, which is a key tool when training machine learning models, from classical instances such as linear regression, to state-of-the-art neural network architectures. The tutorial will describe the guarantees provided by this algorithm in the sequential case, and then move on to cover both shared-memory and message-passing parallelization strategies, together with the guarantees they provide, and corresponding trade-offs. The presentation will conclude with a broad overview of ongoing research in distributed and concurrent machine learning. The tutorial will assume no prior knowledge beyond familiarity with basic concepts in algebra and analysis.
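As a purely illustrative companion to the tutorial's topic, the sketch below shows synchronous data-parallel SGD in its simplest form: each simulated worker computes a gradient on its own data shard, the gradients are averaged (the step that stands in for an all-reduce or a parameter server), and every worker applies the same update. The helper names and the least-squares example are assumptions made for the sketch, not material from the tutorial itself.

import numpy as np

def synchronous_parallel_sgd(grad_fn, w0, shards, lr=0.1, steps=100):
    # grad_fn(w, shard) -> gradient of the loss on that shard at parameters w.
    # In a real system the per-shard gradients are computed on separate workers;
    # np.mean below models the synchronization/communication step.
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        grads = [grad_fn(w, shard) for shard in shards]  # one gradient per worker
        w = w - lr * np.mean(grads, axis=0)              # averaged (all-reduced) update
    return w

# Example: least-squares regression with loss ||Xw - y||^2 / (2n) per shard.
def lsq_grad(w, shard):
    X, y = shard
    return X.T @ (X @ w - y) / len(y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([2.0, -3.0])
    shards = []
    for _ in range(4):  # four simulated workers, each with its own shard
        X = rng.normal(size=(256, 2))
        y = X @ w_true + 0.01 * rng.normal(size=256)
        shards.append((X, y))
    print(synchronous_parallel_sgd(lsq_grad, np.zeros(2), shards, lr=0.5, steps=200))

Relaxing the synchronization in this loop (stale or unaveraged gradients, asynchronous updates) is exactly where the communication-versus-convergence trade-offs discussed in the tutorial arise.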
The asynchronous computability theorem (ACT) uses concepts from combinatorial topology to characterize which tasks have wait-free solutions in read–write memory. A task can be expressed as a relation between two chro...