ISBN (print): 0444817913
The aim of this paper is to present a new paradigm for reactive and real-time distributed programming: weak synchronism. We define a small language of communicating reactive automata and characterize it by an operational semantics. We show that weak synchronism provides a deterministic semantics of concurrency and allows physically distributed implementations. This weak synchronous paradigm can then be extended to real-time programming by defining a more general paradigm, a strong-weak synchronous coupling.
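The abstract describes weak synchronism only informally. As a rough illustration, the following minimal sketch (my own reading of the idea, not the paper's language or semantics) assumes a round-based execution: each automaton reacts in a round only to the signals emitted in the previous round, so the outcome does not depend on the order in which the automata are stepped.

```python
# A minimal sketch of a round-based (weakly synchronous) execution, assuming
# my own reading of the model: every automaton reacts to the signals emitted
# in the previous round, so the result is independent of stepping order.
class Automaton:
    def __init__(self, name, react):
        self.name = name
        self.react = react            # react(state, inputs) -> (state, outputs)
        self.state = None

def run(automata, initial_signals, rounds):
    signals = set(initial_signals)    # signals visible in round 0
    for r in range(rounds):
        emitted = set()
        for a in automata:            # the order of this loop does not matter
            a.state, outputs = a.react(a.state, signals)
            emitted |= set(outputs)
        print(f"round {r}: in={sorted(signals)} out={sorted(emitted)}")
        signals = emitted             # become the inputs of the next round
    return signals

# A counter that keeps the clock alive and a watcher that reacts to it.
counter = Automaton("counter", lambda s, ins: ((s or 0) + 1, ["tick"]))
watcher = Automaton("watcher", lambda s, ins: (s, ["seen"] if "tick" in ins else []))
run([counter, watcher], ["tick"], rounds=3)
```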
ISBN (print): 0444818707
Distributed programming has to face problems due to the asynchronism of the underlying communication networks. While for some applications the use of FIFO channels alone eliminates the undesired effects of asynchronism, this is generally not sufficient. Total and causal ordering of message deliveries have been proposed to overcome such problems, but in some cases these orders impose too strong a property and can reduce the potential parallelism of the application. This paper proposes a flexible broadcast primitive that attaches a type (ordinary or causal) to each message; these types impose constraints on message deliveries. In this way the programmer is able to exploit the potential parallelism of the application in order to obtain an efficient program. All causal messages obey causal delivery, whereas the deliveries of ordinary messages are constrained only by their position (sending time) with respect to causal messages. A protocol implementing these communication primitives is given and proved. Extensions are suggested.
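To make the two message types concrete, here is one possible realization (a minimal sketch of my own, not the paper's protocol): each process keeps a vector clock counting causal messages only; causal messages are delivered in causal order, while an ordinary message waits only for the causal messages that causally precede its sending.

```python
# A minimal sketch, assuming a simplified protocol of my own design.
CAUSAL, ORDINARY = "causal", "ordinary"

class Process:
    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n          # causal messages delivered from each process
        self.pending = []          # received but not yet deliverable messages

    def broadcast(self, kind, payload):
        # (a process is assumed to deliver its own broadcasts at send time)
        if kind == CAUSAL:
            self.vc[self.pid] += 1
        return {"kind": kind, "src": self.pid, "vc": list(self.vc), "data": payload}

    def _deliverable(self, m):
        src, vc = m["src"], m["vc"]
        if m["kind"] == CAUSAL:
            # next causal message from src, and all its causal predecessors seen
            return (vc[src] == self.vc[src] + 1 and
                    all(vc[k] <= self.vc[k] for k in range(len(vc)) if k != src))
        # ordinary: wait only for the causal messages it causally depends on
        return all(vc[k] <= self.vc[k] for k in range(len(vc)))

    def receive(self, m):
        self.pending.append(m)
        delivered, progress = [], True
        while progress:
            progress = False
            for msg in list(self.pending):
                if self._deliverable(msg):
                    self.pending.remove(msg)
                    if msg["kind"] == CAUSAL:
                        self.vc[msg["src"]] += 1
                    delivered.append(msg["data"])
                    progress = True
        return delivered

# p0 sends a causal update, then an ordinary note that causally follows it;
# at p1 the note is held back until the update has been delivered.
p = [Process(i, 2) for i in range(2)]
m1 = p[0].broadcast(CAUSAL, "update")
m2 = p[0].broadcast(ORDINARY, "note about the update")
print(p[1].receive(m2))   # []  -- the note must wait
print(p[1].receive(m1))   # ['update', 'note about the update']
```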
As cloud-based computation grows to be an increasingly important paradigm, providing a general computational interface to support datacenter-scale programming has become an imperative research agenda. Many cloud systems use existing virtual machine monitor (VMM) technologies, such as Xen, VMware, and Windows Hypervisor, to multiplex a physical host into multiple virtual hosts and isolate computation on the shared cluster platform. However, traditional multiplexing VMMs do not scale beyond a single physical host and alone cannot provide the programming interface and cluster-wide computation that a datacenter system requires. We design a new instruction set architecture, DISA, to unify large numbers of compute nodes into a big virtual machine called a DVM and present programmers with the view of a single computer, where thousands of tasks run concurrently in a large, unified, and snapshotted memory space. The DVM provides a simple yet scalable programming model and mitigates the scalability bottleneck of traditional distributed shared memory systems. Along with an efficient execution engine, the capacity of a DVM can scale up to support large clusters. We have implemented and tested DVM on four platforms, and our evaluation shows that DVM has excellent performance and scalability. On one physical host, the system overhead of DVM is comparable to that of traditional VMMs. On 16 physical hosts, the DVM runs 10 times faster than MapReduce/Hadoop and X10. On 160 compute nodes in the TH-1/GZ supercomputer, the DVM delivers a 12.99x speedup over the computation on 10 compute nodes. The implementation of DVM also allows it to run above traditional VMMs, and we verify that DVM shows linear speedup on a parallelizable workload on 256 large EC2 instances.
A rigorous modular specification method requires a proof rule asserting that if each component behaves correctly in isolation, then it behaves correctly in concert with other components. Such a rule is subtle because a component need behave correctly only when its environment does, and each component is part of the others' environments. We examine the precise distinction between a system and its environment, and provide the requisite proof rule when modules are specified with safety and liveness properties.
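The rule itself is not stated in the abstract. As a rough illustration of the shape such an assume-guarantee rule takes (my rendering, not the paper's formulation), consider two components M_1 and M_2 with assumptions A_1, A_2 and guarantees G_1, G_2, running under a global environment assumption E:

\[
\frac{M_1 \models A_1 \rightarrow G_1 \qquad M_2 \models A_2 \rightarrow G_2 \qquad E \wedge G_2 \Rightarrow A_1 \qquad E \wedge G_1 \Rightarrow A_2}
     {M_1 \parallel M_2 \models E \rightarrow G_1 \wedge G_2}
\]

Stated this baldly the rule is circular, since each guarantee is justified by assuming the other, and it is unsound for arbitrary properties; making it sound requires conditions of the kind the paper develops for specifications built from safety and liveness properties.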
Developers use the open source Erlang programming language in domains such as telecommunications, database systems, and the Web due to its superior support for concurrency and reliability. Erlang applications comprise numerous processes (lightweight user-space threads) that communicate via message passing. This article focuses on Erlang's concurrency support and details an example 1D Poisson solver program.
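The article's solver is written in Erlang; the sketch below (mine, in Python with threads and queues, all names invented) only mirrors the structure: each worker owns a slice of the 1D grid, keeps no shared state, and exchanges boundary values with its neighbours by message passing on every Jacobi iteration.

```python
# A structural sketch only (Python threads + queues standing in for Erlang
# processes and message passing): Jacobi iteration for -u'' = f on (0, 1)
# with u(0) = u(1) = 0, each worker owning one slice of the grid.
import threading, queue

def worker(wid, u, f, h, left_in, right_in, left_out, right_out, iters, result):
    u = list(u)                                   # private state, never shared
    for _ in range(iters):
        if left_out:  left_out.put(u[0])          # send boundary values ...
        if right_out: right_out.put(u[-1])
        lb = left_in.get()  if left_in  else 0.0  # ... then receive neighbours'
        rb = right_in.get() if right_in else 0.0  # (domain boundary is 0.0)
        ext = [lb] + u + [rb]
        u = [0.5 * (ext[i - 1] + ext[i + 1] + h * h * f[i - 1])
             for i in range(1, len(ext) - 1)]
    result[wid] = u

def solve(n=8, workers=2, iters=200):
    h = 1.0 / (n + 1)
    f = [1.0] * n                                 # right-hand side f(x) = 1
    chunk = n // workers
    links = [(queue.Queue(), queue.Queue()) for _ in range(workers - 1)]
    result, threads = {}, []
    for w in range(workers):
        lo = w * chunk
        hi = n if w == workers - 1 else lo + chunk
        left_in,  left_out  = (links[w - 1][0], links[w - 1][1]) if w > 0 else (None, None)
        right_in, right_out = (links[w][1], links[w][0]) if w < workers - 1 else (None, None)
        t = threading.Thread(target=worker, args=(
            w, [0.0] * (hi - lo), f[lo:hi], h,
            left_in, right_in, left_out, right_out, iters, result))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return [x for w in range(workers) for x in result[w]]

print(solve())
```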
Lock-free objects offer significant performance and reliability advantages over conventional lock-based objects. However, the lack of an efficient portable lock-free method for the reclamation of the memory occupied by dynamic nodes removed from such objects is a major obstacle to their wide use in practice. This paper presents hazard pointers, a memory management methodology that allows memory reclamation for arbitrary reuse. It is very efficient, as demonstrated by our experimental results. It is suitable for user-level applications as well as system programs, without dependence on special kernel or scheduler support. It is wait-free. It requires only single-word reads and writes for memory access in its core operations. It allows reclaimed memory to be returned to the operating system. In addition, it offers a lock-free solution for the ABA problem using only practical single-word instructions. Our experimental results on a multiprocessor system show that the new methodology offers equal and, more often, significantly better performance than other memory management methods, in addition to its qualitative advantages regarding memory reclamation and independence of special hardware support. We also show that lock-free implementations of important object types, using hazard pointers, offer comparable performance to that of efficient lock-based implementations under no contention and no multiprogramming, and outperform them by significant margins under moderate multiprogramming and/or contention, in addition to guaranteeing continuous progress and availability, even in the presence of thread failures and arbitrary delays.
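The core retire-and-scan logic can be sketched independently of any particular lock-free object. The following is a loose illustration of that logic only (my own, in garbage-collected Python, so reclamation is merely simulated; a real implementation also revalidates the protected pointer after publishing the hazard pointer).

```python
# A loose simulation of the retire/scan logic only; Python is garbage
# collected, so "freeing" a node is simulated by setting a flag.
import threading

HAZARDS = {}                 # thread id -> node currently protected (or None)
RETIRE_THRESHOLD = 8         # scan once this many nodes have been retired

class Reclaimer:
    def __init__(self):
        self.retired = []    # nodes unlinked from the object, not yet freed

    def protect(self, node):
        # Publish a hazard pointer before dereferencing a shared node.  A real
        # implementation re-reads the shared pointer afterwards to validate
        # that the node was not removed in the meantime.
        HAZARDS[threading.get_ident()] = node
        return node

    def clear(self):
        HAZARDS[threading.get_ident()] = None

    def retire(self, node):
        # Called by the thread that unlinked the node from the object.
        self.retired.append(node)
        if len(self.retired) >= RETIRE_THRESHOLD:
            self.scan()

    def scan(self):
        # Reclaim only the retired nodes no thread currently hazards; the rest
        # stay on the retired list and are retried at the next scan.
        hazarded = {id(n) for n in HAZARDS.values() if n is not None}
        keep = []
        for node in self.retired:
            if id(node) in hazarded:
                keep.append(node)
            else:
                node.freed = True          # simulated reclamation
        self.retired = keep
```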
In this paper, we propose a new parallel genome matching algorithm using graphics processing units (GPUs). Our approach is based on the Aho-Corasick algorithm and was developed with the architectural features of existing GPUs with a hundred or more cores in mind. We provide an appropriate task partitioning method that runs on multiple threads and fully utilizes the cache memory and the shared memory structures available in GPUs. In particular, we propose a tiled access method for rapid data transfer from the global memory to the shared memory. We also provide new models for a cache-friendly state transition table to improve the performance of pattern matching operations on GPUs. The maximum throughput we achieved in various experiments was 15.3 Gbps. Moreover, we showed that our proposed design outperformed an earlier approach, with a 15.4% performance improvement.
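As a CPU-side illustration of two of the ingredients named above, a dense state transition table and chunked input partitioning, here is a hedged sketch (mine, not the paper's GPU code; the shared-memory tiling and coalesced loads are not modelled). Each chunk overlaps the next by the maximum pattern length minus one so that matches crossing chunk boundaries are not lost, and each chunk can be scanned by an independent thread.

```python
# A CPU-side sketch with invented names; the GPU-specific tiling into shared
# memory is not modelled here.
from collections import deque

ALPHABET = "ACGT"

def build_table(patterns):
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                              # build the trie
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    order, q = [], deque(goto[0].values())
    while q:                                        # BFS to set failure links
        s = q.popleft()
        order.append(s)
        for c, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            out[t] |= out[fail[t]]
    # flatten into a dense table: table[state][symbol index] -> next state
    table = [[0] * len(ALPHABET) for _ in goto]
    for i, c in enumerate(ALPHABET):
        table[0][i] = goto[0].get(c, 0)
    for s in order:                                 # BFS order: fail[s] is ready
        for i, c in enumerate(ALPHABET):
            table[s][i] = goto[s][c] if c in goto[s] else table[fail[s]][i]
    return table, out

def scan_chunk(table, out, text, start, end):
    s, hits = 0, []
    for i in range(start, end):
        s = table[s][ALPHABET.index(text[i])]
        for p in out[s]:
            hits.append((i - len(p) + 1, p))        # (start position, pattern)
    return hits

def search(patterns, text, chunks=4):
    table, out = build_table(patterns)
    overlap = max(map(len, patterns)) - 1           # keeps boundary matches alive
    size = (len(text) + chunks - 1) // chunks
    hits = set()                                    # set removes overlap duplicates
    for k in range(chunks):                         # each chunk is independent work
        lo = k * size
        hi = min(len(text), lo + size + overlap)
        hits |= set(scan_chunk(table, out, text, lo, hi))
    return sorted(hits)

print(search(["ACGT", "GTG"], "AACGTGACGT"))        # [(1,'ACGT'), (3,'GTG'), (6,'ACGT')]
```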
In this paper, we investigate parallel implementation techniques for network coding. It is known that network coding is useful for both wired and wireless networks and that it also mitigates peer/piece selection problems in P2P file sharing systems. However, due to the decoding complexity of network coding, there have been concerns about its adoption in practical network systems, and the exploitation of parallelism has previously been proposed to improve decoding performance. In this paper, we argue that naive parallelization strategies for network coding may result in unbalanced workload distribution and thus limit performance improvements. We further argue that a higher performance enhancement can be achieved through balanced partitioning methods in parallelized network coding, and we propose new parallelization techniques for network coding. Our experiments show that on a quad-core processor system, the proposed algorithms exhibit up to a 5.69x speedup, which is better than linear speedup owing to the influence of the additional cache. Moreover, on an octal-core system, our algorithms achieve a speedup of 8.46x over sequential network coding and are 43.3 percent faster than an existing parallelization technique, using 1 MB of data with a 1,024 x 1,024 coefficient matrix.
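The balanced-partitioning idea can be illustrated on the decoding step itself. The sketch below is my own simplification (a small prime field stands in for the GF(2^8) arithmetic used in practice, and the workers' shares are processed sequentially): Gaussian elimination in which, for every pivot, the rows to be eliminated are split into equal-sized shares, one per worker.

```python
# A simplified sketch: GF(p) with a small prime stands in for the GF(2^8)
# arithmetic used in real network coding, and the workers' shares are run
# sequentially; the balanced partitioning of the elimination work is the point.
P = 257

def eliminate(rows, pivot_row, col, share):
    # One worker's share: cancel column `col` in its assigned rows.
    for r in share:
        factor = rows[r][col] % P
        if factor:
            rows[r] = [(a - factor * b) % P for a, b in zip(rows[r], pivot_row)]

def decode(matrix, workers=4):
    # Rows are [coding coefficients | payload symbols]; assumes full rank.
    rows = [list(r) for r in matrix]
    n = len(rows)
    for col in range(n):
        piv = next(r for r in range(col, n) if rows[r][col] % P)
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], -1, P)                 # modular inverse (Python 3.8+)
        rows[col] = [(x * inv) % P for x in rows[col]]
        # balanced partitioning: split the remaining rows evenly over workers
        others = [r for r in range(n) if r != col]
        chunk = (len(others) + workers - 1) // workers
        for w in range(workers):
            eliminate(rows, rows[col], col, others[w * chunk:(w + 1) * chunk])
    return rows

# Three coded packets: 3 coefficients + 1 payload symbol each.
coded = [[1, 2, 3, 10], [0, 1, 4, 20], [5, 6, 0, 30]]
print([row[3] for row in decode(coded, workers=2)])     # recovered source symbols
```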
A new approach to software fault tolerance in concurrent programs modeled as reactive systems is proposed. It is based on a hierarchical structure and on the combined use of different fault-tolerance schemes (e.g., transactions to protect data and a conversation-like scheme to protect processes). Among the merits of this new approach is the possibility of effectively using different programming languages to implement diverse software versions, even in concurrent programs.
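The abstract does not detail the schemes; as a loose, single-process illustration of the kind of building block involved (the classic recovery block, which conversation-like schemes extend to groups of processes), consider the following sketch (names and example are mine).

```python
# A loose, single-process illustration; each alternate gets a fresh copy of the
# checkpointed state, so a rejected or crashed attempt leaves the original intact.
import copy

def recovery_block(state, alternates, acceptance_test):
    for alternate in alternates:            # diverse versions of the same task,
        attempt = copy.deepcopy(state)      # possibly written in different languages
        try:
            result = alternate(attempt)
            if acceptance_test(result):
                return result
        except Exception:
            pass                            # a crash is treated like a failed test
    raise RuntimeError("all alternates failed the acceptance test")

# Example: a buggy primary version and a correct alternate.
buggy_sort = lambda xs: xs                  # "forgets" to sort
safe_sort  = lambda xs: sorted(xs)
is_sorted  = lambda xs: all(a <= b for a, b in zip(xs, xs[1:]))
print(recovery_block([3, 1, 2], [buggy_sort, safe_sort], is_sorted))   # [1, 2, 3]
```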
Multicore hardware is making concurrent programs pervasive. Unfortunately, concurrent programs are prone to bugs. Among different types of concurrency bugs, atomicity violations are common and important. How to test the interleaving space and expose atomicity-violation bugs is an open problem. This paper makes three contributions. First, it designs and evaluates a hierarchy of four interleaving coverage criteria using 105 real-world concurrency bugs. This study finds a coverage criterion (Unserializable Interleaving Coverage) that balances the complexity and the capability of exposing atomicity-violation bugs well. Second, it studies stress testing to understand why this common practice cannot effectively expose atomicity-violation bugs from the perspective of unserializable interleaving coverage. Third, it designs CTrigger following the unserializable interleaving coverage criterion. CTrigger uses trace analysis to identify feasible unserializable interleavings, and then exercises low-probability interleavings to expose atomicity-violation bugs. We evaluate CTrigger with real-world atomicity-violation bugs from seven applications. CTrigger efficiently exposes these bugs within 1-235 seconds, two to four orders of magnitude faster than stress testing. Without CTrigger, some of these bugs do not manifest even after seven days of stress testing. Furthermore, once a bug is exposed, CTrigger can reliably reproduce it, usually within 5 seconds, for diagnosis.
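As a toy illustration of the trace-analysis step (my own sketch, not CTrigger's implementation), the following flags (local, remote, local) access triples on the same variable whose read/write pattern is unserializable, using the pattern set commonly adopted in this line of work; CTrigger additionally prunes infeasible interleavings and actively perturbs the schedule to exercise the low-probability ones.

```python
# A toy trace analysis; a trace is a list of (thread, op, variable), op in {R, W}.
UNSERIALIZABLE = {("R", "W", "R"), ("R", "W", "W"), ("W", "R", "W"), ("W", "W", "R")}

def unserializable_triples(trace):
    flagged = []
    for i, (t1, op1, var) in enumerate(trace):
        for j in range(i + 1, len(trace)):            # this thread's next access
            t2, op2, var2 = trace[j]                  # to the same variable
            if t2 == t1 and var2 == var:
                for k in range(i + 1, j):             # remote accesses in between
                    tr, opr, varr = trace[k]
                    if tr != t1 and varr == var and (op1, opr, op2) in UNSERIALIZABLE:
                        flagged.append((trace[i], trace[k], trace[j]))
                break
    return flagged

# Thread A's two reads of x are split by thread B's write: an R-W-R triple.
trace = [("A", "R", "x"), ("B", "W", "x"), ("A", "R", "x")]
print(unserializable_triples(trace))
```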