ISBN (digital): 9783319062518
ISBN (print): 9783319062501; 9783319062518
Concurrent Kleene algebras were introduced by Hoare, Moller, Struth and Wehrman in [HMSW09, HMSW09a, HMSW11] as idempotent bisemirings that satisfy a concurrency inequation and have a Kleene star for both sequential and concurrent composition. Kleene algebras with tests (KAT) were defined earlier by Kozen and Smith [KS97]. Concurrent Kleene algebras with tests (CKAT) combine these concepts and give a relatively simple algebraic model for reasoning about the operational semantics of concurrent programs. We generalize guarded strings to guarded series-parallel strings, or gsp-strings, to provide a concrete language model for CKAT. Combining nondeterministic guarded automata [Koz03] with the branching automata of Lodaya and Weil [LW00], one obtains a model for processing gsp-strings in parallel, and hence an operational interpretation for CKAT. For gsp-strings that are simply guarded strings, the model works like an ordinary nondeterministic guarded automaton. If the test algebra is assumed to be {0, 1}, the language model reduces to the regular sets of bounded-width sp-strings of Lodaya and Weil. Since the concurrent composition operator distributes over join, it can also be added to relation algebras with transitive closure to obtain the variety CRAT. We provide semantics for these algebras in the form of coalgebraic arrow frames expanded with concurrency.
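A brief note for readers unfamiliar with the terminology: the "concurrency inequation" in concurrent Kleene algebra is usually stated as the exchange law relating sequential composition (;) and concurrent composition (parallel):

```latex
(a \parallel b) \;;\; (c \parallel d) \;\leq\; (a \;;\; c) \parallel (b \;;\; d)
```

Intuitively, any behavior obtainable by first running a and b in parallel and then c and d in parallel can also be produced by running the sequential threads a;c and b;d concurrently.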
ISBN (print): 9781479941162
Today's mainstream programming language concepts originate from a time when processes were executed in a single thread and the outcome of computation was deterministic. To deal with multi-threaded execution, synchronization mechanisms have to be used to restrict parallel execution to a point where the program produces correct results for all possible interleavings. This constantly leads to deadlocks and race conditions, i.e. undesired non-deterministic behavior. In this paper, we propose a new set of synchronization primitives, Spawn and Merge, that yield deterministic program execution for multi-threaded programs. This means that there are no race conditions when using this synchronization technique, and deadlocks can be avoided outright. Concurrent access to data structures is resolved using operational transformation. Using two example scenarios, we show how these synchronization primitives can be used and that they are equivalent to semaphores. Furthermore, we evaluate our framework by implementing a network simulator. We show that, despite a constant overhead, the performance is comparable to using standard synchronization primitives while yielding deterministic results.
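The core idea of spawn/merge-style determinism can be sketched in a few lines. This is a minimal illustration, not the paper's API: the function name `spawn_and_merge` and its signature are hypothetical. Children run concurrently, but their results are merged in spawn order, so the outcome never depends on thread scheduling.

```python
# Sketch of deterministic spawn/merge using only the standard library.
# Tasks execute concurrently, but merging follows spawn order, so the
# final state is the same for every possible interleaving.
from concurrent.futures import ThreadPoolExecutor

def spawn_and_merge(tasks, merge, initial):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(t) for t in tasks]   # spawn concurrently
        state = initial
        for f in futures:                           # merge in spawn order
            state = merge(state, f.result())
        return state

result = spawn_and_merge([lambda i=i: i * i for i in range(4)],
                         lambda acc, x: acc + [x], [])
# result is always [0, 1, 4, 9], regardless of interleaving
```

The paper additionally resolves concurrent access to shared data structures via operational transformation; the sketch sidesteps that by keeping all merging in one thread.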
ISBN (print): 9781479961238
Task-based parallel programming models with explicit data dependencies, such as OmpSs, are gaining popularity, due to the ease of describing parallel algorithms with complex and irregular dependency patterns. These advantages, however, come at a steep cost of runtime overhead incurred by dynamic dependency resolution. Hardware support for task management has been proposed in previous work as a possible solution. We present VSs, a runtime library for the OmpSs programming model that integrates the Nexus++ hardware task manager, and evaluate the performance of the VSs-Nexus++ system. Experimental results show that applications with fine-grain tasks can achieve speedups of up to 3.4x, while applications optimized for current runtimes attain 1.3x. Providing support for hardware task managers in runtime libraries is therefore a viable approach to improve the performance of OmpSs applications.
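The dynamic dependency resolution whose overhead motivates hardware support can be sketched as follows. The names (`Runtime`, `submit`) are illustrative, not the OmpSs or Nexus++ API: a task that reads some data may only start after the most recent task that wrote it.

```python
# Toy model of the dependency tracking a task-based runtime performs.
# Only read-after-write dependencies are tracked here; a full runtime
# also orders write-after-read and write-after-write accesses.
class Runtime:
    def __init__(self):
        self.last_writer = {}   # datum -> id of the most recent writer task
        self.deps = {}          # task id -> set of task ids it must wait on

    def submit(self, tid, ins, outs):
        # Wait on the last writer of every input datum.
        self.deps[tid] = {self.last_writer[a] for a in ins
                          if a in self.last_writer}
        for a in outs:
            self.last_writer[a] = tid

rt = Runtime()
rt.submit("t1", ins=[], outs=["x"])
rt.submit("t2", ins=["x"], outs=["y"])   # t2 waits on t1
rt.submit("t3", ins=["x", "y"], outs=[]) # t3 waits on t1 and t2
# rt.deps == {"t1": set(), "t2": {"t1"}, "t3": {"t1", "t2"}}
```

Every `submit` performs lookups and set insertions on shared tables; for fine-grain tasks this bookkeeping dominates, which is exactly what offloading it to a hardware task manager such as Nexus++ addresses.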
ISBN (print): 9781479968602
This article describes the development of a software and hardware environment for integrating individual cluster systems into a single, integrated parallel HPC system. The elaborated applications can be ported to the resources of the integrated HPC system. To provide the necessary theoretical and practical skills for using regional HPC clusters, an interactive educational course is currently being prepared to teach students parallel programming, HPC clusters, and the use of parallel software.
ISBN (print): 9783981537024
Benchmarking of architectures is today jeopardized by the explosion of parallel architectures and the dispersion of parallel programming models. Parallel programming requires architecture-dependent compilers and languages as well as high programming expertise; thus, an objective comparison has become a harder task. This paper presents a novel methodology to evaluate and compare parallel architectures in order to ease the programmer's work. It is based on the use of microbenchmarks, code profiling, and characterization tools. The main contribution of this methodology is a semi-automatic prediction of the performance of sequential applications on a set of parallel architectures. In addition, the performance estimation is correlated with the cost of other criteria such as power or portability. The methodology's prediction was validated on an industrial application; results are within a range of 20%.
StarSs is a task-based programming model that makes it possible to parallelize sequential applications by annotating the code with compiler directives. The model further supports transparent execution of designated tasks on heterogeneous platforms, including clusters of GPUs. This paper focuses on the methodology and tools that complement the programming model, forming a consistent development environment with the objective of simplifying the life of application developers. The programming environment includes the tools TAREADOR and TEMANEJO, which have been designed specifically for StarSs. TAREADOR, a Valgrind-based tool, allows a top-down development approach by assisting the programmer in identifying tasks and their data dependencies across all concurrency levels of an application. TEMANEJO is a graphical debugger that supports the programmer by visualizing the task dependency tree and by allowing task scheduling and dependencies to be manipulated. These tools are complemented by a set of performance analysis tools (Scalasca, Cube and Paraver) that enable fine-tuning of StarSs applications. (C) 2013 Elsevier B.V. All rights reserved.
The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of emerging MMCA problems, there is an urgent need to apply High Performance Computing (HPC) techniques. As most MMCA researchers are not also HPC experts, however, there is a demand for programming models and tools that are both efficient and easy to use. Existing user-transparent parallelization tools generally use a data parallel approach in which data structures (e.g. video frames) are scattered among the available nodes in a compute cluster. For certain MMCA applications, however, a data parallel approach induces intensive communication, which significantly decreases performance. In these situations, we can benefit from applying alternative approaches. We present Pyxis-DT, a user-transparent parallel programming model for MMCA applications that employs both data and task parallelism. Hybrid parallel execution is obtained by run-time construction and execution of a task graph consisting of strictly defined building block operations. Results show that for realistic MMCA applications the concurrent use of data and task parallelism can significantly improve performance compared to using either approach in isolation. Extensions for CPU clusters are also presented. (C) 2013 Elsevier B.V. All rights reserved.
ISBN (print): 9780769549712
As new heterogeneous systems and hardware accelerators appear, high performance computers can reach a higher level of computational power. Nevertheless, this does not come for free: the more heterogeneity the system presents, the more complex the programming task becomes in terms of resource management. OmpSs is a task-based programming model and framework focused on the runtime exploitation of parallelism from annotated sequential applications. This paper presents a set of extensions to this framework: we show how the application programmer can expose different specialized versions of tasks (i.e. pieces of specific code targeted and optimized for a particular architecture) and how the system can choose between these versions at runtime to obtain the best performance achievable for the given application. From the results obtained in a multi-GPU system, we show that our proposal gives flexibility to the application's source code and can potentially increase the application's performance.
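Runtime selection among specialized task versions can be sketched with a simple greedy policy. This is an illustrative model only; the class name `VersionedTask` and the timing-based heuristic are assumptions, not the extended OmpSs mechanism. Each version is tried once, and thereafter the fastest one observed so far is preferred.

```python
# Illustrative sketch of runtime version selection: try each registered
# implementation once, then greedily pick the one with the best timing.
import time

class VersionedTask:
    def __init__(self, versions):
        self.versions = versions                 # name -> callable
        self.elapsed = {n: None for n in versions}

    def run(self, *args):
        untried = [n for n, t in self.elapsed.items() if t is None]
        name = untried[0] if untried else min(self.elapsed,
                                              key=self.elapsed.get)
        start = time.perf_counter()
        result = self.versions[name](*args)      # all versions agree on output
        self.elapsed[name] = time.perf_counter() - start
        return result

vt = VersionedTask({"loop":    lambda n: sum(range(n)),
                    "formula": lambda n: n * (n - 1) // 2})
out = [vt.run(100) for _ in range(3)]
# every call returns 4950, whichever version the policy chose
```

A production runtime would of course average over many executions and account for data transfer costs on accelerators, but the contract is the same: semantically equivalent versions, performance-driven choice.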
ISBN (print): 9781450319225
Recently, graph computation has emerged as an important class of high-performance computing applications whose characteristics differ markedly from those of traditional, compute-bound kernels. Libraries such as BLAS, LAPACK, and others have been successful in codifying best practices in numerical computing. The data-driven nature of graph applications necessitates a more complex application stack incorporating runtime optimization. In this paper, we present a method of phrasing graph algorithms as collections of asynchronous, concurrently executing, concise code fragments which may be invoked both locally and in remote address spaces. A runtime layer performs a number of dynamic optimizations, including message coalescing, message combining, and software routing. Practical implementations and performance results are provided for a number of representative algorithms.
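The combination of remotely invocable code fragments and message coalescing can be sketched in miniature. The class and method names (`ActiveMessageLayer`, `invoke`, `flush`) are hypothetical, not the paper's runtime: invocations destined for the same address space are buffered and shipped as one batch instead of one message each.

```python
# Sketch of message coalescing for asynchronous code fragments: buffer
# fragment invocations per destination, then flush each buffer as a
# single batched message.
from collections import defaultdict

class ActiveMessageLayer:
    def __init__(self):
        self.outbox = defaultdict(list)  # destination -> pending fragments
        self.sent_batches = []           # (destination, batch size) log

    def invoke(self, dest, fragment, arg):
        self.outbox[dest].append((fragment, arg))  # coalesce, don't send yet

    def flush(self):
        for dest, batch in self.outbox.items():
            self.sent_batches.append((dest, len(batch)))  # one wire message
            for fragment, arg in batch:
                fragment(arg)            # runs in the remote address space
        self.outbox.clear()

am = ActiveMessageLayer()
visited = set()
for v in [1, 2, 2, 3]:
    am.invoke(dest=v % 2, fragment=visited.add, arg=v)
am.flush()
# four invocations, but only two coalesced messages (one per destination)
```

For fine-grain graph traversals, where each fragment touches a single vertex, this amortization of per-message overhead is what makes the asynchronous formulation competitive.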
Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming and do not hinder application performance. StarSs is a family of parallel programming models based on automatic function-level parallelism that targets productivity. StarSs deploys a data-flow model: it analyzes dependencies between tasks and manages their execution, exploiting their concurrency as much as possible. This paper introduces Cluster Superscalar (ClusterSs), a new StarSs member designed to execute on clusters of SMPs (Symmetric Multiprocessors). ClusterSs tasks are asynchronously created and assigned to the available resources with the support of the IBM APGAS runtime, which provides an efficient and portable communication layer based on one-sided communication. We present the design of ClusterSs on top of APGAS, as well as the programming model and execution runtime for Java applications. Finally, we evaluate the productivity of ClusterSs, both in terms of programmability and performance, and compare it to that of the IBM X10 language. Copyright (c) 2012 John Wiley & Sons, Ltd.