In consensus, the n nodes of a distributed system seek to take a consistent decision on some output, despite up to t of them crashing or even failing maliciously, i.e., behaving "Byzantine". It is known that...
详细信息
ISBN:
(纸本)9781450320658
In consensus, the n nodes of a distributed system seek to take a consistent decision on some output, despite up to t of them crashing or even failing maliciously, i.e., behaving "Byzantine". It is known that it is impossible to guarantee that synchronous, deterministic algorithms consistently decide on an output in fewer than f + 1 rounds in executions in which the actual number of faults is f ≤ t. This even holds if faults are crash-only, and in this case the bound can be matched precisely. However, the question of whether this can be done efficiently, i.e., with little communication, so far has not been addressed. In this work, we show that algorithms tolerating Byzantine faults and deciding within f + 2 rounds must send ω(nt + t2f) messages;as a byproduct, our analysis shows that decision within f +1 rounds is impossible in this setting (unless f = t). Moreover, we prove that any crash-resilient algorithm deciding in f + 1 rounds has worst-case message complexity ω(n2f). Interestingly, this changes drastically if we restrict the fault model further. If crashes are orderly, i.e., in each round, each node picks an order in which its messages are sent, and crashing nodes successfully transmit a prefix of their sequence, deciding in f + 1 rounds can be guaranteed with O(nt) messages. Copyright 2013acm.
The new page swap mechanism is introduced to resolve an inconsistent page problem for multithreaded applications in user-level remote paging systems. According to the evaluations, its overhead is limited and it can be...
详细信息
ISBN:
(纸本)9780769549965;9781467364652
The new page swap mechanism is introduced to resolve an inconsistent page problem for multithreaded applications in user-level remote paging systems. According to the evaluations, its overhead is limited and it can be applicable to actual use for multithreaded applications.
Cloud providers may operate large-scale datacenters in a few locations. We argue that deploying many small-scale datacenters at network edge can significantly improve user experience in terms of latency. Small-scale d...
详细信息
ISBN:
(纸本)9780769549965;9781467364652
Cloud providers may operate large-scale datacenters in a few locations. We argue that deploying many small-scale datacenters at network edge can significantly improve user experience in terms of latency. Small-scale datacenters, however, may not be able to provide elastic services. In this paper, we investigate distributed small-scale datacenters with load reallocation where jobs that cannot be suitably processed locally will be reallocated to remote datacenters. We formulate an optimization problem for load reallocation in distributed datacenters, provide performance comparisons among different alternatives and offer insights on handling multiple job types. We develop online optimization algorithms that can be operated in a decentralized and measurement-based fashion to dynamically reallocate load in response to sudden load surges. The experimental results demonstrate that elasticity can be practically provided by small-scale datacenters enhanced with effective load reallocation techniques.
distributed software transactional memory (DTM) is an emerging, alternative concurrency control model for distributed systems that promises to alleviate the difficulties of lock-based distributed synchronization. Obje...
详细信息
ISBN:
(纸本)9780769549965;9781467364652
distributed software transactional memory (DTM) is an emerging, alternative concurrency control model for distributed systems that promises to alleviate the difficulties of lock-based distributed synchronization. Object replication can improve concurrency and achieve fault-tolerance in DTM, but may incur high communication overhead (in metric-space networks) to ensure one-copy serializability. We consider metric-space networks and develop a cluster-based object replication model for DTM. In this model, object replicas are distributed to clusters of nodes, where clusters are determined based on distance between nodes, to maximize locality and fault-tolerance and to minimize communication overhead. We develop a transactional scheduler for this model, called CTS. CTS enqueues live transactions and identifies some of the transactions that must be aborted in advance to enhance concurrency of the other transactions over clusters, reducing a significant number of future conflicts. Our implementation and experimental evaluation reveals that CTS improves transactional throughput over state-of-the-art replicated DTM solutions by as much as (average) 1.55x and 1.73x under low and high contention, respectively.
Massively Multiplayer Online Games (MMOGs) are an important type of distributed applications and have millions of users. Traditionally, MMOGs are hosted on dedicated clusters, distributed globally. With the advent of ...
详细信息
ISBN:
(纸本)9780769549965;9781467364652
Massively Multiplayer Online Games (MMOGs) are an important type of distributed applications and have millions of users. Traditionally, MMOGs are hosted on dedicated clusters, distributed globally. With the advent of cloud computing, MMOGs such as Zynga's are increasingly run on cloud resources, through the use of cloud technology and innovation. Massivizing MMOGs on clouds is the focus of my PhD research. My main contributions are: 1) analyzing and modeling various MMOG workloads, including those of social and traditional real-time games, 2) designing and implementing a cost-efficient and reliable cloud-based MMOG platform, 3) designing and implementing a scalable MMOG system which employs domain-specific scaling techniques to support the real-time strategy games of the future, 4) experimental prototypes and tools to evaluate our proposed research via real-world experimentation and simulation, and applying our proposed research to a popular real-world application. In this article, I introduce my research progress and my future plans.
distributed systems are easier to build than ever with the emergence of new, data-centric abstractions for storing and computing over massive datasets. However, similar abstractions do not exist for storing and access...
详细信息
ISBN:
(纸本)9781450323888
distributed systems are easier to build than ever with the emergence of new, data-centric abstractions for storing and computing over massive datasets. However, similar abstractions do not exist for storing and accessing meta-data. To fill this gap, Tango provides developers with the abstraction of a replicated, in-memory data structure (such as a map or a tree) backed by a shared log. Tango objects are easy to build and use, replicating state via simple append and read operations on the shared log instead of complex distributed protocols;in the process, they obtain properties such as linearizability, persistence and high availability from the shared log. Tango also leverages the shared log to enable fast transactions across different objects, allowing applications to partition state across machines and scale to the limits of the underlying log without sacrificing consistency.
Given a simple graph G = (V, E) and a set of sources SCV, denote for each node v Ε V by L(∞) the lexicographically ordered list of distance/source pairs (d(s, v), s), where s Ε S. For integers d, k Ε NU{∞}, we co...
详细信息
ISBN:
(纸本)9781450320658
Given a simple graph G = (V, E) and a set of sources SCV, denote for each node v Ε V by L(∞) the lexicographically ordered list of distance/source pairs (d(s, v), s), where s Ε S. For integers d, k Ε NU{∞}, we consider the source detection, or (S, d, k)-detection task, requiring each node v to learn the first k entries of L(∞) (if for all of them d(s, v) ≤ d) or all entries (d(s, v),s) Ε L (∞), satisfying that d(s, v) ≤ d (otherwise). Solutions to this problem provide natural generalizations of concurrent breadth-first search (BFS) tree constructions. For example, the special case of k = ∞ requires each source s Ε S to build a complete BFS tree rooted at s, whereas the special case of d = ∞ and S = V requires constructing a partial BFS tree comprising at least k nodes from every node in V. In this work, we give a simple, near-optimal solution for the source detection task in the CONGEST model, where messages contain at most C(logn) bits, running in d + k rounds. We demonstrate its utility for various routing problems, exact and approximate diameter computation, and spanner construction. For those problems, we obtain algorithms in the CONGEST model that are faster and in some cases much simpler than previous solutions. Copyright 2013acm.
Due to the unavailability of quantum computers, simulation of quantum algorithms using classical computers is still the most affordable option for research of algorithms and models for quantum computing. As such simul...
详细信息
ISBN:
(纸本)9780769549965;9781467364652
Due to the unavailability of quantum computers, simulation of quantum algorithms using classical computers is still the most affordable option for research of algorithms and models for quantum computing. As such simulation requires high computing power and memory resources, and computations are regular and data-intensive, GPUs become a suitable solution for accelerating the simulation of quantum algorithms. This work introduces an extension of the execution library for the VPE-qGM environment to support GPU acceleration. Hadamard gates with up to 20 quantum bits were simulated, and speedups of up to approximately 540x over a sequential simulation and of approximately 85x over a 8-core distributed simulation in the VirD-GM environment were achieved.
The paper reports on work in progress towards construction of a peer-to-peer framework for privacy preserving computing on distributed electronic health data. The framework supports three different types of federated ...
详细信息
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds pos...
详细信息
ISBN:
(纸本)9781450323888
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedulemillions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.
暂无评论