ISBN (digital): 9781728148946
ISBN (print): 9781728148953
In this paper, we present our experience of teaching a course on large-scale parallel computing using the Message Passing Interface (MPI). The course was offered to undergraduate and graduate students in the Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, for the first time in a very long time. We describe the topics covered, how we decided the course content, the class demographics, and the resources made available for students to run their MPI jobs, and we discuss the outcomes of the course. We also discuss the stumbling blocks encountered while offering a parallel computing systems course without much support from teaching assistants, and some lessons that we carried forward to the next offering of the course.
ISBN (print): 9781450362252
With the rapid change of computing architectures and the variety of programming models, the ability to develop performance-portable applications has become very important. This is particularly true in large production codes, where developing and maintaining hardware-specific versions is *** simplify the development of performance-portable code, we introduce RAJA, our C++ library that allows developers to write single-source applications that can target multiple hardware and programming model back-ends. We provide a thorough introduction to all of RAJA's features and walk through some hands-on examples that will allow attendees to understand how RAJA might benefit their own applications. Attendees should bring a laptop computer to participate in the hands-on *** tutorial will introduce attendees to RAJA, a C++ library for developing performance-portable applications. Attendees will learn how to write performance-portable code that can execute on a range of programming models (OpenMP, CUDA, Intel TBB, and HCC) and hardware (CPU, GPU, Xeon Phi). Specifically, attendees will learn how to convert existing C++ applications to use RAJA, and how to use RAJA's programming abstractions to expose existing parallelism in their applications without complex algorithm rewrites. We will also cover specific guidelines for using RAJA in a large application, including some common "gotchas" and how to handle memory management. Finally, attendees will learn how to categorize loops to allow for simple and systematic performance tuning on any architecture.
ISBN (digital): 9781728148946
ISBN (print): 9781728148953
Introductory-level courses on parallel programming typically do not cover the topic of code correctness. Students often learn about logical errors in parallel programs and troubleshoot them through trial and error, spending a significant amount of time and effort in the process. A systematic pedagogical approach to teaching parallel code correctness is therefore needed to enhance the productivity of students and instructors. In this paper, we describe some theoretical and practical approaches that can be adopted for assessing and teaching parallel code correctness. The theoretical approaches include using formal methods (e.g., Petri nets and Hoare logic). We apply these approaches to the test cases discussed in this paper. The practical approach involves teaching code correctness through demonstrations. To enable this, we have curated a repository of parallel programs with commonly made logical errors and added a high-level interface on top of the repository for quickly comparing fixed and incorrect versions of the sample code, viewing the explanation text for the errors, and searching the repository by the causes and symptoms of logical errors. The work presented in this paper can potentially motivate instructors to include content on code correctness in their parallel programming courses and training sessions.
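To illustrate the kind of "incorrect versus fixed" comparison this abstract describes, here is a minimal sketch of one commonly made logical error, an unsynchronized update of a shared counter. The example, including the name `run_counter`, is hypothetical and is not taken from the authors' repository.

```python
import threading


def run_counter(num_threads, increments, use_lock):
    """Increment a shared counter from several threads and return its final value.

    Hypothetical teaching example: use_lock=False is the incorrect version,
    use_lock=True is the fixed version.
    """
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # Fixed version: the read-modify-write is done under a lock.
                with lock:
                    state["count"] += 1
            else:
                # Incorrect version: the read-modify-write is not atomic,
                # so concurrent updates may be lost.
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]
```

The fixed version always returns `num_threads * increments`. The incorrect version can return less, but whether a lost update actually shows up in a given run depends on the interpreter and on thread scheduling, which is part of what makes such errors hard to find by trial and error.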
ISBN (print): 9783800749577
In fault-tolerant systems, applications are replicated and executed to enable error detection and recovery. If one replica application fails, another is able to take its place and provide the correct results. This concept can benefit from parallel execution on separate execution units. The rise of multicore platforms supports the development of parallel software by providing adequate hardware. However, this raises challenges regarding the synchronization of the redundant strings of execution. Replica determinism means that, given the same input, identical programs provide the same output. To ensure replica determinism, the synchronization requirements can be split into two domains: data and time. This paper examines the state of the art of synchronization techniques for parallel replicated execution in the context of fault-tolerant systems. We analyze the synchronization requirements within the time and data domains and compare different concepts of hardware (multicore, multiprocessor and multi-PCB) and software (processes, threads).
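The idea of replicated execution with error detection can be sketched as follows. This is a minimal sketch that assumes majority voting over replica outputs, which is one common scheme and not necessarily one the paper surveys. The replicas run sequentially here for simplicity, and all names (`run_replicas`, `square`, `faulty_square`) are hypothetical.

```python
from collections import Counter


def run_replicas(replicas, value):
    """Feed the same input to every replica and return (majority output, all outputs).

    Assumes replica determinism: healthy replicas given the same input
    produce the same output, so a deviating output indicates a fault.
    """
    outputs = [replica(value) for replica in replicas]
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise RuntimeError("no majority among replica outputs")
    return winner, outputs


def square(x):
    """Healthy replica."""
    return x * x


def faulty_square(x):
    """Replica with an injected fault, for demonstration only."""
    return x * x + 1


result, outputs = run_replicas([square, square, faulty_square], 7)
print(result, outputs)  # 49 [49, 49, 50]
```

Voting only works if the inputs seen by the replicas are identical, which corresponds to the data-domain synchronization requirement mentioned in the abstract. The time domain (when outputs are compared) is not modeled in this sketch.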
ISBN (digital): 9780191850028
ISBN (print): 9780198811893
Fortran marches on, remaining one of the principal programming languages used in high-performance scientific, numerical, and engineering computing. A series of significant revisions to the standard versions of the language have progressively enhanced its capabilities, and the latest standard—Fortran 2018—includes many additions and improvements. This second edition of Modern Fortran Explained expands on the first. Given the release of updated versions of Fortran compilers, the separate descriptions of Fortran 2003 and Fortran 2008 have been incorporated into the main text, which thereby becomes a unified description of the full Fortran 2008 version of the language. This is much cleaner, many deficiencies and irregularities in the earlier language versions having been resolved. It includes object orientation and parallel processing with coarrays. Four completely new chapters describe the additional features of Fortran 2018, with its enhancements to coarrays for parallel programming, interoperability with C, IEEE arithmetic, and various other improvements. Written by leading experts in the field, two of whom have actively contributed to Fortran 2018, this is a complete and authoritative description of Fortran in its latest form. It is intended for new and existing users of the language, and for all those involved in scientific and numerical computing. It is suitable as a textbook for teaching and, with its index, as a handy reference for practitioners.
ISBN (print): 9781450351911
We propose a formal definition of the notion of textual alignment as used in programming languages that offer SPMD-like collective operations. We argue that this property provides an intuitive programming model that makes it easier to perform program analysis and program optimization. Here, textual alignment is studied in the context of the operational semantics of a basic imperative programming language. This language provides support for global synchronization barriers. The semantics records suitable information concerning the parallel execution flow of programs and identifies textually aligned code segments. We prove that our definition of textual alignment entails the absence of deadlocks.
In this paper, we look at the modern ICT infrastructure and the curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI and at the node level using OpenMP and CUDA. We present the curriculum, covering both theory and practice, and examine the last eight years of students' performance data gathered while teaching the course. We investigate important components and features of the hardware and software needed for conducting a modern set of exercises, along with changes adopted over recent years and experiences gained regarding lectures and laboratories.
ISBN (print): 9781538666289
Multi-core, many-core, and hybrid processors and parallel programming languages are slowly becoming pervasive in mainstream computing. They are expected to affect a large spectrum of systems, from embedded and general-purpose to high-end computing systems. This architectural change has already challenged programmers to write application code that scales efficiently over many cores and utilizes their computational power. Moreover, many heterogeneous architectures exist today, hence the need for a uniform interface to these architectures. Recently, the Khronos Group defined the Open Computing Language (OpenCL) for abstracting the underlying hardware, which enables software developers to write code that is portable across different shared-memory architectures. In this paper, we introduce a new parallel implementation, based on OpenCL, of one of the fastest image segmentation algorithms, Simple Linear Iterative Clustering. We evaluate the effectiveness of this implementation using only a multi-core GPCPU. Our implementation is fully compatible with the sequential implementation. When the algorithm is executed sequentially, it utilizes only 25% of the total computational power of a GPCPU for any image resolution, while the modified algorithm is able to utilize close to 100% for high-resolution images. The resulting algorithm is up to 5x faster than its sequential counterpart.
ISBN (print): 9781538655962
Control-flow obfuscation increases program complexity through semantics-preserving transformations. Opaque predicates are essential gadgets for achieving such transformations. However, we observe that real-world opaque predicates are generally very simple and are designed with little security consideration. Recently, such insecure opaque predicates have been severely attacked by symbolic execution-based adversaries, jeopardizing the security of control-flow obfuscation. This paper therefore proposes symbolic opaque predicates, which are resilient to symbolic execution-based adversaries. We design a general framework for composing such opaque predicates, which requires introducing challenging symbolic analysis problems (e.g., symbolic memory) into each opaque predicate. In this way, we may mislead symbolic execution engines into reaching false conclusions. We observe a novel bi-opaque property of symbolic opaque predicates: they can cause attackers not only false negatives but also false positives. To evaluate the efficacy of our idea, we have implemented a prototype obfuscation tool based on Obfuscator-LLVM and conducted experiments with real-world programs. Our evaluation results show that symbolic opaque predicates demonstrate excellent resilience to prevalent symbolic execution engines, such as BAP, Triton, and Angr. Moreover, although the costs of symbolic opaque predicates may vary for different problem settings, some predicates can be very efficient. Therefore, our framework is both secure and usable. Users can follow the framework to introduce symbolic opaque predicates into their obfuscation tools and make them more powerful.
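For readers unfamiliar with the term, the following minimal sketch shows the simple kind of opaque predicate that the abstract describes as insecure. It is not the paper's symbolic opaque predicate construction. The function names and the bogus branch are hypothetical and chosen only for illustration.

```python
def opaque_true(x: int) -> bool:
    """Opaquely true predicate: the product of two consecutive integers is always even."""
    return (x * (x + 1)) % 2 == 0


def checked_abs(x: int) -> int:
    """Absolute value guarded by an opaque predicate.

    The guard always holds, so the program's semantics are preserved,
    but a reader of the control flow sees two apparent branches.
    """
    if opaque_true(x):
        return x if x >= 0 else -x
    # Bogus branch: never executed, present only to complicate the control flow.
    return x * 31 + 7
```

A predicate this simple is exactly what a symbolic execution engine can resolve, since a solver can show that the bogus branch is unreachable. The abstract's proposal is to build predicates around problems that such engines handle poorly.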
ISBN (digital): 9783030038014
ISBN (print): 9783030038014; 9783030038007
Accurate background subtraction is an essential tool for high-level computer vision applications. However, as research continues to increase the accuracy of background subtraction algorithms, computational efficiency has often suffered as a result of increased complexity. Consequently, many sophisticated algorithms are unable to maintain real-time speeds with increasingly high-resolution video inputs. To address this, we propose to exploit the inherently parallelizable nature of background subtraction algorithms by making use of CUDA, NVIDIA's parallel computing platform. By using the CUDA interface to execute parallel tasks on the Graphics Processing Unit (GPU), we are able to achieve a speedup of up to two orders of magnitude over traditional techniques. Moreover, the proposed GPU algorithm achieves over 8x speedup over the CPU-based background subtraction implementation proposed in our previous work [1].
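The "inherently parallelizable" structure mentioned above can be seen in even the simplest form of background subtraction. The sketch below assumes a static background image and a fixed threshold. It is not the authors' algorithm, and the function name, the threshold of 30, and the sample pixel values are illustrative assumptions.

```python
def subtract_background(frame, background, threshold=30):
    """Return a binary foreground mask (1 = foreground) for grayscale images.

    Images are lists of rows of integer intensities. Each output pixel
    depends only on the corresponding input pixels, so every pixel could
    be computed independently of the others.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 200, 10], [10, 90, 11]]
mask = subtract_background(frame, background)
print(mask)  # [[0, 1, 0], [0, 1, 0]]
```

Because no pixel's result depends on any other pixel, a GPU version of such a step would typically assign one thread per pixel, which is the property the abstract exploits with CUDA.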