Developing parallel software using current tools can be challenging. Developers must reason carefully about the use of locks to avoid both race conditions and deadlocks. We present a compiler-assisted approach to para...
Although MPI is a de-facto standard for parallel programming on distributed memory systems, writing MPI programs is often a time-consuming and complicated process. XcalableMP is a language extension of C and Fortran f...
Branch and Fix Coordination is an algorithm intended to solve large-scale multi-stage stochastic mixed integer problems. It exploits the particular structure of such problems so that they can be broken down into smaller subproblems. Distributed computation techniques can then be used to solve the subproblems in parallel, almost independently. To guarantee non-anticipativity in the global solution, the values of the integer variables in the subproblems are coordinated by a master thread. Scenario 'clusters' lend themselves particularly well to parallelisation, allowing us to solve some problems noticeably faster. Thanks to the decomposition into smaller subproblems, we can also attempt to solve otherwise intractable instances. In this work, we present details on the computational implementation of the Branch and Fix Coordination algorithm. (C) 2015 The Authors. Published by Elsevier B.V.
ISBN: 9781450304276 (print)
Multiprocessors are now commonplace, and cloud computing is swiftly following suit. While it is possible to write high performance code for these systems, concurrency bugs are extremely common and theoretical performance is often difficult to realize. In order to take advantage of increasing numbers of parallel resources, numerous parallel programming systems have been proposed and deployed, usually without a systematic evaluation of their usability. In order to make both programmers and their parallel applications more effective, we need more useful metrics for measuring programmer productivity and a better way to evaluate such metrics. We posit that usability is a key factor in the effectiveness of a parallel programming system, and that theoretical performance gains can only be realized if programmers are able to successfully reason about their parallel code. Copyright 2010 ACM.
ISBN: 9781450305242 (print)
Task-parallel programming is a methodology in which algorithms are specified as a set of tasks to be executed, together with the dependencies between them. A scheduler can then automatically determine a correct execution order and extract parallelism. Task programming is well known to be a very effective way to leverage parallel hardware (and is gaining popularity among game developers [Lavaire and Quenin 2010]); however, there is significant programming overhead associated with maintaining a program in this form.
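As a rough illustration of the idea (not the system described in the paper), the following sketch declares a few tasks and their dependencies and lets a small scheduler pick the execution order. The function `run_task_graph` and the task names are hypothetical; the sketch assumes Python 3.9 or later for `graphlib`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, deps, max_workers=4):
    """Run each callable in `tasks` once all of its prerequisites have finished.

    `deps` maps a task name to the set of task names it depends on. Each task
    receives the results of its prerequisites, ordered by prerequisite name.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose prerequisites are all done.
            for name in sorter.get_ready():
                inputs = [results[d] for d in sorted(deps.get(name, ()))]
                running[pool.submit(tasks[name], *inputs)] = name
            # Block until at least one running task finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)  # may unlock dependent tasks
    return results


def load():
    return [3, 1, 2]


def sort_values(values):
    return sorted(values)


def total(values):
    return sum(values)


def report(sorted_values, total_value):
    return (sorted_values, total_value)


tasks = {"load": load, "sort": sort_values, "total": total, "report": report}
deps = {"sort": {"load"}, "total": {"load"}, "report": {"sort", "total"}}
results = run_task_graph(tasks, deps)
print(results["report"])  # ([1, 2, 3], 6)
```

Here `sort` and `total` depend only on `load`, so the scheduler is free to run them concurrently. The bookkeeping in `deps` is a small example of the maintenance overhead the abstract mentions.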
ISBN: 9783319174730; 9783319174723 (print)
This paper describes the stapl Skeleton Framework, a high-level skeletal approach for parallel programming. This framework abstracts the underlying details of data distribution and parallelism from programmers and enables them to express parallel programs as a composition of existing elementary skeletons such as map, map-reduce, scan, zip, butterfly, allreduce, alltoall, as well as user-defined custom skeletons. Skeletons in this framework are defined as parametric data flow graphs, and their compositions are defined in terms of data flow graph compositions. Defining the composition in this manner allows dependencies between skeletons to be defined in terms of point-to-point dependencies, avoiding unnecessary global synchronizations. To show the ease of composability and expressivity, we implemented the NAS Integer Sort (IS) and Embarrassingly Parallel (EP) benchmarks using skeletons and demonstrated comparable performance to the hand-optimized reference implementations. To demonstrate scalable performance, we show a transformation which enables applications written in terms of skeletons to run on more than 100,000 cores.
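To give a feel for skeleton composition, here is a minimal sequential sketch in Python. It is not the stapl API (stapl is a C++ library, and its skeletons are data flow graphs rather than plain functions); the names `map_skeleton`, `reduce_skeleton`, `scan_skeleton`, `zip_skeleton` and `compose` are hypothetical and only show how a program can be built from elementary pieces.

```python
from functools import reduce
from itertools import accumulate
from operator import add, mul


def map_skeleton(f):
    """Apply f to every element independently."""
    return lambda xs: [f(x) for x in xs]


def reduce_skeleton(op):
    """Combine all elements into one value with a binary operator."""
    return lambda xs: reduce(op, xs)


def scan_skeleton(op):
    """Inclusive prefix combination with a binary operator."""
    return lambda xs: list(accumulate(xs, op))


def zip_skeleton(f):
    """Combine two sequences element by element."""
    return lambda xs, ys: [f(x, y) for x, y in zip(xs, ys)]


def compose(first, *rest):
    """Feed the output of each stage into the next one."""
    def pipeline(*args):
        value = first(*args)
        for stage in rest:
            value = stage(value)
        return value
    return pipeline


# map-reduce built from two elementary skeletons: sum of squares.
sum_of_squares = compose(map_skeleton(lambda x: x * x), reduce_skeleton(add))
# zip followed by reduce: a dot product.
dot_product = compose(zip_skeleton(mul), reduce_skeleton(add))

print(sum_of_squares([1, 2, 3, 4]))          # 30
print(scan_skeleton(add)([1, 2, 3, 4]))      # [1, 3, 6, 10]
print(dot_product([1, 2, 3], [4, 5, 6]))     # 32
```

In a real skeleton framework each stage would also carry its dependency structure, so that a composition can connect the stages point to point instead of synchronizing globally between them.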
ISBN: 9781467378826 (print)
This paper focuses on the problem of computing the minimal test suite for a terminating multithreaded program that covers all its executable statements. In previous work, we showed how to use unfoldings to capture the true concurrency semantics of multithreaded programs and to generate test cases for them. In this paper we rely on this earlier work and show how the unfolding can be used to generate the minimal test suite that covers all the executable statements of the program. The problem of generating such a minimal test suite is shown to be NP-complete in the size of the unfolding, and as a side result, covering executable transitions of any terminating safe Petri net is also NP-complete in the size of its unfolding. We propose SMT encodings of these problems and give initial results on applying them to compute the minimal test suite for several benchmarks.
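The underlying optimisation problem can be sketched on a toy instance. The code below does not use unfoldings or an SMT solver as the paper does; it assumes the statements covered by each candidate test are already known and finds a smallest covering subset by exhaustive search, which is only feasible for tiny inputs. The function name and the coverage data are hypothetical.

```python
from itertools import combinations


def minimal_covering_suite(coverage):
    """Return a smallest list of test names whose coverage sets jointly cover
    every statement covered by at least one test.

    `coverage` maps a test name to the set of statement ids it executes.
    Exhaustive search: exponential in the number of tests.
    """
    target = set().union(*coverage.values())
    names = sorted(coverage)
    # Try suites of increasing size so the first hit is a minimum.
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            covered = set().union(*(coverage[name] for name in subset))
            if covered == target:
                return list(subset)
    return names  # not reached: the full suite always covers the target


coverage = {
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {4, 5},
    "t4": {1, 2, 3, 4},
}
print(minimal_covering_suite(coverage))  # ['t1', 't3']
```

No single test covers statements 1 to 5, so the minimum suite has two tests. The exponential search reflects why the abstract turns to solver-based encodings for realistic programs.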
ISBN: 9783319189413; 9783319189406 (print)
The age of multi-core computers is upon us, yet current programming languages, typically designed for single-core computers and adapted post hoc for multi-cores, remain tied to the constraints of a sequential mindset and are thus in many ways inadequate. New programming language designs are required that break away from this old-fashioned mindset. To address this need, we have been developing a new programming language called Encore, in the context of the European Project UpScale. The paper presents a motivation for the Encore language, examples of its main constructs, several larger programs, a formalisation of its core, and a discussion of some future directions our work will take. The work is ongoing and we started more or less from scratch. That means that a lot of work has to be done, but also that we need not be tied to decisions made for sequential language designs. Any design decision can be made in favour of good performance and scalability. For this reason, Encore offers an interesting platform for future exploration into object-oriented parallel programming.
ISBN: 9781450335591 (print)
Many libraries in the HPC field encapsulate sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm engineers have already been advocating the systematic combination of analytical performance models with practical measurements for a very long time, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer from the burden of having to specify and maintain very fine-grained and potentially non-portable expectations. In this way, scalability validation can be continuously applied throughout the whole development cycle with very little effort. Using MPI as an example, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.
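One simple way to check an asymptotic trend rather than an exact formula is to fit a power law to measured timings and compare the fitted exponent with the expected one. The sketch below is a simplification and not the authors' method: it assumes the runtime behaves roughly like c * p^k over the measured range, so it cannot distinguish logarithmic factors. The function names, the tolerance, and the synthetic timings are all hypothetical.

```python
import math


def fitted_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    For timings that follow c * size**k, the slope equals k.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def scales_as_expected(sizes, times, expected_exponent, tolerance=0.15):
    """Performance assertion: the measured growth is no worse than expected."""
    return fitted_exponent(sizes, times) <= expected_exponent + tolerance


# Synthetic timings for illustration only (not measurements).
process_counts = [2, 4, 8, 16, 32]
linear_times = [0.001 * p for p in process_counts]
quadratic_times = [0.001 * p * p for p in process_counts]

print(scales_as_expected(process_counts, linear_times, 1.0))     # True
print(scales_as_expected(process_counts, quadratic_times, 1.0))  # False
```

A check like this can run inside an ordinary test suite, so that a change which turns a linear-time operation quadratic fails the build instead of surfacing later at scale.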
ISBN: 9781479981632 (print)
The Unscented Kalman Filter (UKF) is widely used for nonlinear estimation problems such as submarine tracking, aircraft surveillance, autonomous robotics and mobile systems. One of the typical problems solved using the UKF is Bearing-Only Target Motion Analysis (BOTMA) for manoeuvring and non-manoeuvring targets. This paper proposes a methodology for parallel execution of the UKF, with the aim of increasing its computational throughput. The parallel UKF algorithm for BOTMA is executed in a multi-core processor environment. The study concentrates on identifying the phases of UKF-enabled BOTMA that can be parallelized on the underlying hardware to improve response time. The performance is observed and the results are verified.