We introduce concurrent Kleene algebra with tests (CKAT) as a combination of the Kleene algebra with tests (KAT) of Kozen and Smith with the concurrent Kleene algebras (CKA) introduced by Hoare, Moller, Struth and Wehrman. CKAT provides a relatively simple algebraic model for reasoning about the semantics of concurrent programs. We generalize guarded strings to guarded series-parallel strings, or gsp-strings, to give a concrete language model for CKAT. Combining the nondeterministic guarded automata of Kozen with the branching automata of Lodaya and Weil, one obtains a model for processing gsp-strings in parallel. To ensure that the model satisfies the weak exchange law (x ∥ y)(z ∥ w) ≤ (xz) ∥ (yw) of CKA, we make use of Gischer's subsumption order on gsp-strings. We also define deterministic branching automata and investigate their relation to (nondeterministic) branching automata. To express basic concurrent algorithms, we define concurrent deterministic flowchart schemas and relate them to branching automata and to concurrent Kleene algebras with tests. © 2016 Elsevier Inc. All rights reserved.
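As a reading aid, the weak exchange law can be typeset in display form, together with a one-line intuition for why the subsumption order validates it (the gloss is ours, not part of the abstract):

```latex
% Weak exchange law of CKA: sequencing two parallel compositions refines
% the parallel composition of the two sequential compositions.
\[
  (x \parallel y)\,(z \parallel w) \;\le\; (xz) \parallel (yw)
\]
% Intuition via Gischer's subsumption order: the left-hand side imposes
% every ordering present on the right (x before z, y before w) plus the
% extra constraints x before w and y before z, so each gsp-string of the
% left side is subsumed by a gsp-string of the right side.
```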
This paper describes a safe and efficient combination of the object-based message-driven execution and shared array parallel programming models. In particular, we demonstrate how this combination engenders the composition of loosely coupled parallel modules safely accessing a common shared array. That loose coupling enables both better flexibility in parallel execution and greater ease of implementing multi-physics simulations. As a case study, we describe how the parallelization of a new method for molecular dynamics simulation benefits from both of these advantages. We also describe a system of typed handle objects that embed some of the determinacy constraints of the Multiphase Shared Array programming model in the C++ type system, to catch some violations at compile time. The combined programming model communicates in terms of these handles as a natural means of detecting and preventing errors. © 2011 Elsevier B.V. All rights reserved.
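The typed-handle idea is concrete enough to sketch. The paper's implementation lives in the C++ type system; below is a minimal Python analogue (all class and method names are hypothetical, not the paper's API) showing how distinct handle types can expose only the operations legal in the current access phase, so a static type checker rejects phase violations:

```python
# Minimal sketch of phase-typed handles for a shared array. All names are
# hypothetical illustrations of the idea, not the paper's actual API.
from __future__ import annotations


class ReadHandle:
    """Exposes only reads; no mutating operation exists on this type."""

    def __init__(self, data: list[float]) -> None:
        self._data = data

    def get(self, i: int) -> float:
        return self._data[i]

    def sync_to_accumulate(self) -> AccumulateHandle:
        # Phase change: trade this handle for one permitting accumulation.
        return AccumulateHandle(self._data)


class AccumulateHandle:
    """Exposes only commutative accumulation; reads are unavailable."""

    def __init__(self, data: list[float]) -> None:
        self._data = data

    def accumulate(self, i: int, value: float) -> None:
        self._data[i] += value

    def sync_to_read(self) -> ReadHandle:
        return ReadHandle(self._data)


h = AccumulateHandle([0.0] * 4)
h.accumulate(0, 2.5)
r = h.sync_to_read()
print(r.get(0))        # OK: 2.5
# r.accumulate(0, 1.0)  # rejected statically: no such method on ReadHandle
```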
In the age of the Internet of Things and social media platforms, huge amounts of digital data are generated by and collected from many sources, including sensors, mobile devices, wearable trackers and security cameras. This data, commonly referred to as Big Data, is challenging current storage, processing, and analysis capabilities. New models, languages, systems and algorithms continue to be developed to effectively collect, store, analyze and learn from Big Data. Most recent surveys provide a global analysis of the tools used in the main phases of Big Data management (generation, acquisition, storage, querying and visualization of data). In contrast, this work analyzes and reviews the parallel and distributed paradigms, languages and systems used today to analyze and learn from Big Data on scalable computers. In particular, we provide an in-depth analysis of the properties of the main parallel programming paradigms (MapReduce, workflow, BSP, message passing, and SQL-like) and, through programming examples, we describe the most widely used systems for Big Data analysis (e.g., Hadoop, Spark, and Storm). Furthermore, we discuss and compare the different systems by highlighting the main features of each, their diffusion (community of developers and users), and the main advantages and disadvantages of using them to implement Big Data analysis applications. The final goal of this work is to help designers and developers identify and select the most appropriate programming solution based on their skills, hardware availability, application domains and purposes, taking into account the support provided by the developer community.
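To make the kind of "programming examples" the survey discusses concrete, here is the canonical word count expressed in Spark's Python API (PySpark); the input path is a placeholder:

```python
# Classic MapReduce-style word count in PySpark.
# "input.txt" is a placeholder path; any text file will do.
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

counts = (
    sc.textFile("input.txt")                  # one record per line
      .flatMap(lambda line: line.split())     # map: line -> words
      .map(lambda word: (word, 1))            # emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)        # reduce: sum counts per word
)

for word, count in counts.take(10):
    print(word, count)

sc.stop()
```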
This paper presents a framework to easily build and execute parallel applications in container-based distributed computing platforms in a user-transparent way. The proposed framework is a combination of the COMP Superscalar (COMPSs) programming model and runtime, which provides a straightforward way to develop task-based parallel applications from sequential code, with container management platforms that ease the deployment of applications in computing environments (such as Docker, Mesos or Singularity). This framework provides scientists and developers with an easy way to implement parallel distributed applications and deploy them in a one-click fashion. We have built a prototype that integrates COMPSs with different container engines in different scenarios: i) a Docker cluster, ii) a Mesos cluster, and iii) Singularity in an HPC cluster. We have evaluated the overhead in the building, deployment and execution phases of two benchmark applications, compared to a Cloud testbed based on KVM and OpenStack and to the use of bare-metal nodes. We observed a significant gain in comparison to cloud environments during the building and deployment phases, which enables better adaptation of resources to the computational load. In contrast, we detected extra overhead during execution, mainly due to multi-host Docker networking.
Many of the world's leading supercomputer architectures are a hybrid of shared memory and network-distributed memory. Such an architecture lends itself to a hybrid MPI-thread programming model. We first present an implementation of inter-thread message passing based on the MPI and pthread libraries. In addition, we present an efficient implementation of termination detection for communication rounds. We use the term phased message passing to denote the communication interface based on this termination detection. This interface is then used to implement parallel operations for adaptive unstructured meshes, and the performance of the resulting applications is compared to pure MPI operation. We also present new workflows enabled by the ability to vary the number of threads at run time. © 2016 Elsevier B.V. All rights reserved.
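The abstract does not spell out its termination-detection algorithm, but one well-known way to detect the end of a communication round with dynamic message counts is the nonblocking-consensus (NBX) pattern of Hoefler et al., sketched here with mpi4py (our illustration of the general technique, not the paper's pthread-based implementation):

```python
# NBX-style termination detection for one communication round, via mpi4py.
# Synchronous sends guarantee that a completed send was actually received,
# so once all sends complete and everyone passes the nonblocking barrier,
# no messages remain in flight.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
TAG = 7

# Each rank sends one message to its right neighbour with a synchronous send.
send_req = comm.issend({"from": rank}, dest=(rank + 1) % size, tag=TAG)

received = []
barrier = None
while True:
    status = MPI.Status()
    if comm.iprobe(source=MPI.ANY_SOURCE, tag=TAG, status=status):
        received.append(comm.recv(source=status.Get_source(), tag=TAG))
    if barrier is None:
        if send_req.Test():            # our send has been matched
            barrier = comm.Ibarrier()  # join the nonblocking barrier
    elif barrier.Test():               # everyone joined: round is over
        break

print(f"rank {rank} finished the round with {len(received)} message(s)")
```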
Writing a correct parallel program is difficult; writing a highly modular parallel program that performs well in a multiprogrammed environment is even more so. Intel Threading Building Blocks (Intel TBB), a key component of Intel Parallel Building Blocks, is a widely used C++ template library that helps developers achieve this goal. The Intel TBB task scheduler uses a process-wide thread pool to establish a composable execution environment that balances load while quickly adapting to changes in resource availability. Building on top of the task scheduler, the library implements prepackaged, highly tuned algorithms for frequently used parallel idioms. It also provides several concurrent containers and useful low-level synchronization constructs to help developers safely and efficiently manage their parallel application's data.
We present TProf, an energy profiling tool for OpenMP-like task-parallel programs. To compute the energy consumed by each task in a parallel application, TProf dynamically traces the parallel execution and uses a novel technique to estimate the per-task energy consumption. To achieve this estimation, TProf apportions the total processor energy among cores, overcoming a limitation of existing approaches that would otherwise make such parallel accounting impossible. We demonstrate the value of TProf by characterizing a set of task-parallel programs, where we find that data locality, memory access patterns and task working sets are responsible for significant variance in energy consumption between seemingly homogeneous tasks. In addition, we identify opportunities for fine-grain energy optimization by applying per-task Dynamic Voltage and Frequency Scaling (DVFS). © 2014 Published by Elsevier Inc.
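The per-task accounting problem is easy to illustrate: on Linux machines with Intel RAPL, the package-level energy counter can be read around a task's execution and apportioned among cores. The sketch below uses the standard powercap sysfs path; the even-split apportioning rule is a deliberately naive stand-in, not the paper's estimation technique:

```python
# Naive per-task energy attribution via the Linux powercap/RAPL counter.
# Reading the counter may require elevated permissions. The even split
# among cores is a placeholder for a real apportioning technique.
import os

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package-0 counter

def read_energy_uj() -> int:
    with open(RAPL) as f:
        return int(f.read())

def profile_task(task, *args):
    """Run a task and return (result, estimated joules attributed per core)."""
    ncores = os.cpu_count() or 1
    before = read_energy_uj()
    result = task(*args)
    after = read_energy_uj()
    # The counter wraps around; wrap handling is omitted in this sketch.
    package_joules = (after - before) / 1e6
    return result, package_joules / ncores  # naive even split among cores

def busy_work(n: int) -> int:
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    result, joules = profile_task(busy_work, 10_000_000)
    print(f"task result={result}, ~{joules:.3f} J attributed to this core")
```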
Author: Merigot, A. (Univ Paris 11, CNRS URA 22, Integrated Circuits & Systems Architecture Group, Fundamental Electronics Institute, Orsay, France)
This paper presents a new parallel computing model called Associative Nets. The model relies on basic primitives called associations, which apply an associative operator over the connected components of a subgraph of the physical interprocessor connection graph. Associations can be implemented very efficiently (in terms of hardware cost and processing time) thanks to asynchronous computation. The model is quite effective for image analysis and several other fields; as an example, graph processing algorithms are presented. While relying on a much simpler architecture, these algorithms have, in general, a complexity equivalent to that obtained with more expensive computing models, such as the PRAM model.
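The association primitive is easy to emulate in software: folding an associative, commutative operator (min, below) over each connected component of a subgraph amounts to connected-component labelling by iterated local reduction. A compact Python sketch (ours, not the paper's asynchronous hardware realization):

```python
# Software emulation of an "association": fold an associative operator
# (min) over each connected component of an undirected subgraph. The
# Associative Nets hardware converges asynchronously; here we iterate
# synchronously until a fixed point is reached.

def associate_min(values, edges):
    """values: dict node -> int; edges: iterable of (u, v) pairs.
    Returns dict node -> min over that node's connected component."""
    result = dict(values)
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            m = min(result[u], result[v])
            if result[u] != m or result[v] != m:
                result[u] = result[v] = m
                changed = True
    return result

# Two components: {0, 1, 2} and {3, 4}. Using node ids as values, every
# node learns the smallest id in its component, i.e. a component label.
values = {n: n for n in range(5)}
edges = [(0, 1), (1, 2), (3, 4)]
print(associate_min(values, edges))  # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```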
We present ***, a lightweight, asynchronous task execution framework targeting the Python programming language and using the Message Passing Interface (MPI) for interprocess communication. *** follows the interface of the *** package from the Python standard library and can be used as its drop-in replacement, while allowing applications to scale over multiple compute nodes. We discuss the design, implementation, and feature set of *** and compare its performance to other solutions on both shared and distributed memory architectures. On a shared-memory system, we show *** to consistently outperform Python's ***, with speedup ratios between 1.4X and 3.7X in throughput (tasks per second) and between 1.9X and 2.9X in bandwidth. On a Cray XC40 system, we compare *** to Dask, a well-known Python parallel computing package. Although we note more varied results, we show *** to outperform Dask in most scenarios.
The use of the Python programming language for scientific computing has been gaining momentum in recent years. Its compact, readable syntax and its comprehensive set of scientific libraries are two important characteristics that favour its adoption. Nevertheless, Python still lacks a solution for easily parallelizing generic scripts on distributed infrastructures, since the current alternatives mostly require the use of APIs for message passing or are restricted to embarrassingly parallel computations. In that sense, this paper presents PyCOMPSs, a framework that facilitates the development of parallel computational workflows in Python. In this approach, the user programs her script in a sequential fashion and decorates the functions to be run as asynchronous parallel tasks. A runtime system is in charge of exploiting the inherent concurrency of the script, detecting the data dependencies between tasks and spawning them on the available resources. Furthermore, we show how this programming model can be built on top of a Big Data storage architecture, where the data stored in the backend is abstracted and accessed from the application in the form of persistent objects.
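The decorate-and-run style is best seen in a tiny example. The sketch below uses the PyCOMPSs task decorator and synchronization call as commonly documented; treat the exact imports as an assumption to verify against the installed PyCOMPSs version:

```python
# Minimal PyCOMPSs-style script: sequential-looking code whose decorated
# functions run as asynchronous parallel tasks. Imports follow the commonly
# documented PyCOMPSs API; verify them against your installed version.
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=int)
def square(x):
    return x * x

@task(returns=int)
def add(a, b):
    return a + b

def main():
    # Each call returns immediately with a future; the runtime builds the
    # task graph from data dependencies and schedules tasks on resources.
    partials = [square(i) for i in range(8)]
    total = partials[0]
    for p in partials[1:]:
        total = add(total, p)
    total = compss_wait_on(total)  # synchronize on the final result
    print("sum of squares:", total)

if __name__ == "__main__":
    main()
```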