ISBN (print): 9781538649756
The ability to teach parallel programming principles and techniques is becoming fundamental to prepare a new generation of programmers able to master the pervasive parallelism made available by hardware vendors. Classical parallel programming courses leverage either low-level programming frameworks (e.g. those based on Pthreads) or higher level frameworks such as OpenMP or MPI. We discuss our teaching experience within the Master in "Computer Science and networking" where parallel programming is taught leveraging structured parallel programming principles and frameworks. The paper summarizes the results achieved in eight years of experience and shows how the adoption of a structured parallel programming approach improves the efficiency of the teaching process.
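To make "structured parallel programming" concrete, here is a minimal sketch. It assumes Python's standard `concurrent.futures` module rather than the frameworks actually used in the course, which the abstract does not name. The helper names `farm` and `pipeline` are hypothetical; they mimic classic algorithmic skeletons, where the programmer writes only sequential worker functions and the skeleton owns the parallel structure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def farm(worker: Callable[[A], B], n_workers: int) -> Callable[[Iterable[A]], List[B]]:
    """Task-farm skeleton (hypothetical helper): apply the sequential `worker`
    to every input item on a pool of `n_workers` threads; results keep input order."""
    def run(items: Iterable[A]) -> List[B]:
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(worker, items))
    return run


def pipeline(stage1: Callable[[Iterable[A]], List[B]],
             stage2: Callable[[Iterable[B]], List[C]]) -> Callable[[Iterable[A]], List[C]]:
    """Two-stage pipeline skeleton, simplified: stage2 consumes the whole output
    of stage1. A streaming pipeline would overlap the stages item by item."""
    def run(items: Iterable[A]) -> List[C]:
        return stage2(stage1(items))
    return run


# Skeletons nest: a pipeline whose stages are themselves farms.
square_then_inc = pipeline(farm(lambda x: x * x, 4), farm(lambda y: y + 1, 2))

if __name__ == "__main__":
    print(square_then_inc(range(5)))  # prints [1, 2, 5, 10, 17]
```

Threads keep the sketch short; for CPU-bound Python code a `ProcessPoolExecutor` would be needed for real speedup, and it requires picklable (top-level, non-lambda) worker functions.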
ISBN (print): 9781538665220
The importance of high-performance computing (HPC) motivates an increasing number of students to study parallel programming. However, a major obstacle to learning parallel programming is the lack of large-scale computing resources and of feedback on students' programs. In this paper, we design and implement an online HPC educational programming system that provides free computing resources to students and supports multiple HPC programming languages. Students can easily write HPC programs on our platform and submit them to the Tianhe-2 supercomputer for execution. The execution results are returned and displayed in the user's web browser front end. In addition, the system supports code evaluation and feedback for debugging by integrating mpiP and TAU into its kernel. The evaluation results and feedback let students examine the execution details of their programs and further optimize them.
Correctly synchronizing multithreaded programs is challenging, and errors can lead to program failures (e.g., atomicity violations). Existing memory consistency models rule out some possible failures, but are limited by depending on subtle programmer-defined locking code and by providing unintuitive semantics for incorrectly synchronized code. Stronger memory consistency models assist programmers by providing them with easier-to-understand semantics with regard to memory access interleavings in parallel code. This dissertation proposes a new strong memory consistency model based on ordering-free regions (OFRs), which are spans of dynamic instructions between consecutive ordering constructs (e.g. barriers). Atomicity over ordering-free regions provides stronger atomicity than existing strong memory consistency models with competitive performance. Ordering-free regions also simplify programmer reasoning by limiting the potential for atomicity violations to fewer points in the program’s execution. This dissertation explores both software-only and hardware-supported systems that provide OFR serializability.
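The notion of an ordering-free region can be illustrated with a small sketch. It assumes a hypothetical trace format of `(op, addr)` pairs for one thread and treats only `"barrier"` as an ordering construct; it merely partitions a trace into OFRs and checks whether two concurrent regions conflict. It does not model the dissertation's software or hardware mechanisms for enforcing OFR serializability.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical trace format: each event is (op, addr), where op is "load",
# "store", or an ordering construct. Assumption: only barriers delimit regions.
Event = Tuple[str, str]
ORDERING_OPS = {"barrier"}


@dataclass
class Region:
    reads: Set[str] = field(default_factory=set)
    writes: Set[str] = field(default_factory=set)


def split_into_ofrs(trace: List[Event]) -> List[Region]:
    """Split one thread's dynamic trace into ordering-free regions: the spans
    of memory accesses between consecutive ordering constructs."""
    regions = [Region()]
    for op, addr in trace:
        if op in ORDERING_OPS:
            regions.append(Region())  # an ordering construct closes the current region
        elif op == "load":
            regions[-1].reads.add(addr)
        elif op == "store":
            regions[-1].writes.add(addr)
    return regions


def conflict(a: Region, b: Region) -> bool:
    """Two concurrent regions conflict if one writes an address the other
    reads or writes (write-write, write-read, or read-write)."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


if __name__ == "__main__":
    t0 = [("load", "x"), ("store", "x"), ("barrier", ""), ("load", "y")]
    t1 = [("load", "y"), ("barrier", ""), ("store", "y")]
    r0, r1 = split_into_ofrs(t0), split_into_ofrs(t1)
    # Assuming global barriers, region i of each thread runs concurrently
    # between the same pair of barriers.
    print([conflict(a, b) for a, b in zip(r0, r1)])  # prints [False, True]
```

In this sketch, only conflicting pairs such as the second one need to be ordered for each region to appear atomic; non-conflicting regions can interleave freely without changing the outcome.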
Parallel programming techniques have been prominently explored in various engineering applications, as they provide a time-efficient solution to complex problems without affecting accuracy. Parallel programming ...
Benchmarking is a way to study the performance of new architectures and parallel programming frameworks. Well-established benchmark suites such as the NAS Parallel Benchmarks (NPB) comprise legacy codes that still lac...
Today, powerful parallel computer architectures empower numerous application areas in personal computing and consumer electronics, and parallel computation is an established mainstay in personal mobile devices (PMDs). Over the last ten years, PMDs have been equipped with increasingly powerful parallel computation architectures (CPU+GPU), enabling rich gaming, photography and multimedia experiences, as well as general-purpose parallel computation through application programming interfaces such as OpenGL ES and Apple Metal. Using a narrative literature review, this study examined the current status of parallel computing and parallel programming, specifically their application and practices in digital image processing in the domain of Mobile Systems (MS) and Personal Mobile Devices (PMD). Although research in this context is an emerging topic, only a limited amount of it is available. As acknowledged in the literature and in practice, the OpenGL ES programming model for computing tasks can be a challenging environment for many programmers. With OpenGL ES, the paradigm shift from serial to parallel programming, together with changes and challenges in the programming language and the tools supporting development, can be a barrier for many programmers. In this thesis, a Design Science Research (DSR) approach was applied to build an artefact: an image and video processing application on the Apple iOS software platform using the OpenGL ES parallel programming model. GPUImage, an Open Source Software (OSS) parallel computing library, was used to implement the artefact's filtering and effects functionality. With the library, applying the parallel programming model was efficient and productive. The library's structures and functionality effectively hid the complexity of OpenGL ES setup and management code and provided efficient filter structures for implementing image and video filters and effects. The
ISBN (print): 9781538678800
OCaml is a multi-paradigm (functional, imperative, object-oriented), high-level sequential language. Types are statically inferred by the compiler, and the type system is expressive and strong. These features make OCaml a very productive language for developing efficient and safe programs. In this tutorial, we present three frameworks for using OCaml to program scalable parallel architectures: BSML, Multi-ML and Spoc.
ISBN (print): 9781728150208
Composability is a key component to improve programmers' productivity in writing fast market-expanding applications such as parallel machine learning algorithms and big data analytics. These applications exhibit both regular and irregular compute patterns, and are often combined with other functions or libraries to compose a larger program. However, composable parallel processing has taken a back seat in many existing parallel programming libraries, making it difficult to achieve modularity in large-scale parallel programs. In this paper, we introduce a new parallel task programming library using composable tasking graphs. Our library efficiently supports task parallelism together with an intuitive task graph construction and flexible execution API set to enable reusable and composable task dependency graphs. Developers can quickly compose a large parallel program from small and modular parallel building blocks, and easily deploy the program on a multicore machine. We have evaluated our library on real-world applications. Experimental results showed our library can achieve comparable performance to Intel Threading Building Blocks with less coding effort.
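The idea of composable task graphs can be sketched in a few dozen lines. The sketch is written in Python rather than the C++ of the paper's library, and the class and method names (`TaskGraph`, `emplace`, `precede`, `compose`) are hypothetical, loosely modeled on common task-graph APIs; it is not the library's actual interface. A whole graph can be wrapped as a single node of another graph, which is what makes the building blocks reusable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


class Task:
    """A node in a task graph: either plain work or a whole nested graph."""

    def __init__(self, name: str, work: Optional[Callable[[], None]] = None,
                 module: Optional["TaskGraph"] = None) -> None:
        self.name = name
        self.work = work
        self.module = module
        self.successors: List["Task"] = []
        self.num_deps = 0

    def precede(self, *others: "Task") -> "Task":
        """Declare that this task must finish before each of `others` starts."""
        for t in others:
            self.successors.append(t)
            t.num_deps += 1
        return self


class TaskGraph:
    def __init__(self) -> None:
        self.tasks: List[Task] = []

    def emplace(self, name: str, work: Callable[[], None]) -> Task:
        task = Task(name, work=work)
        self.tasks.append(task)
        return task

    def compose(self, name: str, other: "TaskGraph") -> Task:
        """Wrap an existing graph as a single node so it can be reused."""
        task = Task(name, module=other)
        self.tasks.append(task)
        return task

    def run(self, pool: ThreadPoolExecutor) -> None:
        """Run tasks in dependency order, one wave of ready tasks at a time.
        Plain tasks go to the pool; nested graphs are expanded from the calling
        thread, so pool workers never block waiting on nested work."""
        remaining: Dict[Task, int] = {t: t.num_deps for t in self.tasks}
        ready = [t for t in self.tasks if remaining[t] == 0]
        while ready:
            futures = [pool.submit(t.work) for t in ready if t.work is not None]
            for t in ready:
                if t.module is not None:
                    t.module.run(pool)
            for f in futures:
                f.result()  # re-raise any exception from a worker
            next_ready: List[Task] = []
            for t in ready:
                for s in t.successors:
                    remaining[s] -= 1
                    if remaining[s] == 0:
                        next_ready.append(s)
            ready = next_ready


if __name__ == "__main__":
    log: List[str] = []
    inner = TaskGraph()
    a = inner.emplace("A", lambda: log.append("A"))
    b = inner.emplace("B", lambda: log.append("B"))
    a.precede(b)

    outer = TaskGraph()
    start = outer.emplace("start", lambda: log.append("start"))
    sub = outer.compose("inner", inner)  # reuse the inner graph as one node
    end = outer.emplace("end", lambda: log.append("end"))
    start.precede(sub)
    sub.precede(end)

    with ThreadPoolExecutor(max_workers=4) as pool:
        outer.run(pool)
    print(log)  # prints ['start', 'A', 'B', 'end']
```

Executing one wave of ready tasks at a time keeps the sketch simple and correct but leaves parallelism unused; a production scheduler would start each successor as soon as its last predecessor finishes, typically with work stealing.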
ISBN (print): 9781450369800
This paper explores parallel nondeterministic programming as an extension to the C programming language; it provides constructs for specifying code containing ambiguous choice as introduced by McCarthy. A translator to plain C code was implemented as an extension to the ableC language specification. Translation involves a transformation to continuation-passing style, providing lazy choice by storing continuation closures in a separate task buffer. This exploration considers various search evaluation approaches and their impact on correctness and performance. Multiple search drivers were implemented, including single-threaded depth-first search, a combined breadth- and depth-first approach, as well as two approaches to parallelism. Several benchmark applications were created using the extension, including n-Queens, SAT, and triangle peg solitaire. The simplest parallel search driver, using independent threads, showed the best performance in most cases, providing a significant speedup over the sequential versions. Adding task sharing between threads showed similar or slightly improved performance.
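The combination of ambiguous choice, continuation-passing style and a task buffer can be sketched as follows. This is a Python illustration under stated assumptions, not the paper's ableC translation: `amb` is a hypothetical helper that pushes one continuation closure per option into a buffer, failure is a continuation that schedules nothing, and a single-threaded driver pops closures in LIFO order to obtain depth-first search.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Task = Callable[[], None]  # a suspended continuation closure


def solve_n_queens(n: int) -> List[List[int]]:
    """Find all n-Queens placements with amb-style choice written in
    continuation-passing style, driven depth-first from a task buffer."""
    tasks: List[Task] = []          # the task buffer of pending choices
    solutions: List[List[int]] = []

    def amb(options: Sequence[T], k: Callable[[T], None]) -> None:
        # Lazy choice (hypothetical helper): store one closure per option
        # instead of running it now. Pushing in reverse makes the LIFO
        # driver try options in their original order.
        for opt in reversed(options):
            tasks.append(lambda opt=opt: k(opt))

    def place(row: int, cols: List[int]) -> None:
        if row == n:
            solutions.append(cols)
            return

        def try_col(c: int) -> None:
            # Failure is simply returning without scheduling more work.
            if all(c != pc and abs(c - pc) != row - r for r, pc in enumerate(cols)):
                place(row + 1, cols + [c])

        amb(range(n), try_col)

    tasks.append(lambda: place(0, []))
    while tasks:                    # single-threaded depth-first driver
        tasks.pop()()
    return solutions


if __name__ == "__main__":
    print(solve_n_queens(4))        # prints [[1, 3, 0, 2], [2, 0, 3, 1]]
    print(len(solve_n_queens(8)))   # prints 92
```

Swapping the list for a `collections.deque` and popping from the left gives a breadth-first driver instead; a parallel driver would hand the same closures to several threads, which this sketch does not attempt.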
ISBN (print): 9783319665627; 9783319665610
New sequencing technologies have been rapidly increasing the size of available genome data while reducing its cost. These data need to be processed with efficient and innovative tools using high-performance computing (HPC), but taking advantage of today's supercomputers requires parallel programming techniques and strategies. Plant genomes are full of Long Terminal Repeat Retrotransposons (LTR-RTs), which are the most frequent repeated sequences; the genomes of very important agronomic commodities such as Robusta coffee and maize are composed of approximately 50% and 85%, respectively, of this class of mobile elements. New parallel bioinformatics pipelines make it possible to use whole genomes like these in research projects, generating a great deal of new information and impacting in many ways what researchers know about them. Here we present the utility of multi-core architectures and parallel programming for analyzing and classifying massive quantities of genomic information up to 16 times faster.