In [11] a method for the analysis of the expected time complexity of a randomized distributed algorithm is presented. The method consists of proving auxiliary probabilistic time bound statements of the form U t/→p U...
详细信息
ISBN:
(纸本)9780897917100
In [11] a method for the analysis of the expected time complexity of a randomized distributed algorithm is presented. The method consists of proving auxiliary probabilistic time bound statements of the form U t/→p U′, which mean that whenever the algorithm begins in a state in set U, it will reach a state in set U′ within time t with probability at least p. However, [11] does not provide a formal methodology to prove the validity of a specific probabilistic time bound statement from scratch: each statement is proved by means of ad hoc operational arguments. Unfortunately, operational reasoning is generally error-prone and difficult to check. In this paper we overcome the problem by developing a new technique to prove probabilistic time bound statements, which consists of reducing the analysis of a time bound statement to the analysis of a statement that does not involve probability. As a consequence, several existing techniques for a non-randomized algorithms can be applied, and correctness proofs can be verified mechanically.
A quorum system is a collection of sets (quorums) every two of which intersect. Quorum systems have been used for many applications in the area of distributed systems, including mutual exclusion, data replication and ...
详细信息
A quorum system is a collection of sets (quorums) every two of which intersect. Quorum systems have been used for many applications in the area of distributed systems, including mutual exclusion, data replication and dissemination of information. In this paper we introduce a general class of quorum systems called Crumbling Walls and study its properties. The elements (processors) of a wall are logically arranged in rows of varying widths. A quorum in a wall is the union of one full row and a representative from every row below the full row. This class considerably generalizes a number of known quorum system constructions. The best crumbling wall is the CWlog quorum system. It has small quorums, of size O(lg n), and structural simplicity. The CWlog has optimal availability and optimal load among systems with such small quorum size. It manifests its high quality for all universe sizes, so it is a good choice not only for systems with thousands or millions of processors but also for systems with as few as 3 or 5 processors. Moreover, our analysis shows that the availability will increase and the load will decrease at the optimal rates as the system increases in size.
This paper presents a fast distributed algorithm to compute a small k-dominating set D (for any fixed k) and its induced graph partition (breaking the graph into radius k clusters centered around the vertices of D). T...
详细信息
ISBN:
(纸本)9780897917100
This paper presents a fast distributed algorithm to compute a small k-dominating set D (for any fixed k) and its induced graph partition (breaking the graph into radius k clusters centered around the vertices of D). The time complexity of the algorithm is O(k log* n). Small k-dominating sets have applications in a number of areas, including routing with sparse routing tables via the scheme of [PU], the design of distributed data structures [P2], and center selection in a distributed network (cf. [BKP]). The main application described in this paper concerns a fast distributed algorithm for constructing a minimum weight spanning tree (MST). On an n-vertex network of diameter d, the new algorithm constructs an MST in time O(√n log* n + d), improving on the results of [GKP]. The new MST algorithm is conceptually simpler than the three-phase algorithm of [GKP]. In addition to exploiting small k-dominating sets, it uses a very simple convergecast protocol to inform a center about graph edges, that avoids forwarding messages about edges that close cycles. This convergecast protocol is similar to the one used in the third phase of the algorithm of [GKP], and most of the novelty lies in a new careful analysis proving that the convergecast process is fully pipelined, and no waiting occurs at intermediate nodes. This enables the new algorithm to skip the complicated second phase of the algorithm of [GKP].
Protocols to implement a fault-tolerant computing system are described. These protocols augment the hypervisor of a virtual-machine manager and coordinate a primary virtual machine with its backup. The result is a fau...
详细信息
The proceedings contain 30 papers. The topics discussed include: hypervisor-based fault tolerance;hive: fault containment for shared-memory multiprocessors;U-Net: a user-level network interface for parallel and distri...
ISBN:
(纸本)0897917154
The proceedings contain 30 papers. The topics discussed include: hypervisor-based fault tolerance;hive: fault containment for shared-memory multiprocessors;U-Net: a user-level network interface for parallel and distributedcomputing;object and native code thread mobility among heterogeneous computers (includes sources);the HP AutoRAID hierarchical storage system;performance of cache coherence in stackable filing;exploiting weak connectivity for mobile file access;rover: a toolkit for mobile information access;managing update conflicts in bayou, a weakly connected replicated storage system;a new page table for 64-bit address spaces;CRL: high-performance all-software distributed shared memory;the performance of the container shipping I/O system;time-function scheduling: a general approach to controllable resource;a real-time upcall facility for protocol processing with QoS guarantees;and using a modified object buffer to improve the write performance of an object-oriented database.
Data-parallel languages, such as High Performance Fortran, are widely regarded as a promising means for writing portable programs for distributed-memory machines. Novel features of these languages call for the develop...
详细信息
Data-parallel languages, such as High Performance Fortran, are widely regarded as a promising means for writing portable programs for distributed-memory machines. Novel features of these languages call for the development of new techniques in both compilers and run-time systems. In this paper, we present an improved algorithm for finding the local memory access sequence in computations involving regular sections of arrays with cyclic(k) distributions. After establishing the fact that regular section indices correspond to elements of an integer lattice, we show how to find a lattice basis that allows for simple and fast enumeration of memory accesses. The complexity of our algorithm is shown to be lower than that of the previous solution for the same problem. In addition, the experimental results demonstrate the efficiency of our method in practice.
As communication networks grow, existing fault handling tools that involve global measures such as global time-outs or reset procedures become increasingly unaffordable, since their cost grows with the size of the net...
详细信息
As communication networks grow, existing fault handling tools that involve global measures such as global time-outs or reset procedures become increasingly unaffordable, since their cost grows with the size of the network. Rather, for a fault handling mechanism to scale to large networks, its cost must depend only on the number of failed nodes (which, thanks to today's technology, grows much slower than the networks). Moreover, it should allow the non-faulty regions of the networks to continue their operation even during the recovery of the faulty parts. This abstract introduces the concepts fault locality, and of fault-locally mendable problems, which are problems for which there exist correction algorithms (applied after faults) whose cost depends only on the (unknown) number of faults. We show that any problem is fault locally mendable. The solution involves a novel technique combining data structures and 'local votes' among nodes, that may be of interest in itself.
The U-Net communication architecture provides processes with a virtual view of a network interface to enable user-level access to high-speed communication devices. The architecture, implemented on standard workstation...
详细信息
暂无评论