ISBN: (print) 9781728101903
Peachy Parallel Assignments are a resource for instructors teaching parallel and distributed programming. These are high-quality assignments, previously tested in class, that are readily adoptable. This collection of assignments includes implementing a subset of OpenMP using pthreads, creating an animated fractal, image processing using histogram equalization, simulating a storm of high-energy particles, and solving the wave equation in a variety of settings. All of these come with sample assignment sheets and the necessary starter code.
ISBN: (print) 9783319733531; 9783319733524
In this pre-exascale era, the need for computer science courses dedicated to parallel programming on heterogeneous architectures is increasing dramatically. The full hybrid cluster Romeo has long been used for this purpose, to train master's students and cluster users. The main issue for trainees is the cost of accessing and exploiting a production facility in a pedagogic context. The use of specific techniques and software (SSH, workload manager, remote file system, ...) is mandatory, although it is part of neither the course prerequisites nor the pedagogic objectives. The romeoLAB platform we developed at the ROMEO HPC Center is an online interactive pedagogic platform for courses on HPC and GPU technologies. Its main purpose is to simplify resource usage so that trainees can focus on the subjects being taught. This paper presents the romeoLAB architecture as well as its motivations, usages and future improvements.
ISBN: (print) 9781450360661
Expressive actor models combine aspects of functional programming into the pure actor model enriched with futures. Such functional features include first-class closures which can be passed between actors and chained on futures. Combined with mutable objects, this opens the door to race conditions. In some situations, closures may not be evaluated by the actor that created them yet may access fields or objects owned by that actor. In other situations, closures may be safely fired off to run as a separate task. This paper discusses the problem of who can safely evaluate a closure to avoid race conditions, and presents the current solution to the problem adopted by the Encore language. The solution integrates with Encore's capability type system, which influences whether a closure is attached and must be evaluated by the creating actor, or whether it can be detached and evaluated independently of its creator. Encore's current solution to this problem is not final or optimal. We conclude by discussing a number of open problems related to dealing with closures in the actor model.
ISBN: (print) 9781538649756
At the LHC, particles are collided in order to understand how the universe was created. Those collisions are called events and generate large quantities of data, which have to be pre-filtered before they are stored to hard disks. This paper presents a parallel implementation of the pre-filtering algorithms that is specifically designed for the Intel Xeon Phi Knights Landing platform, exploiting its 64 cores and AVX-512 instruction set. It shows that a linear speedup up to approximately 64 threads is attainable when vectorization is used, data is aligned to cache line boundaries, program execution is pinned to MCDRAM, mathematical expressions are transformed to a more efficient equivalent formulation, and OpenMP is used for parallelization. The code was transformed from being compute bound to memory bound. Overall, a speedup of 36.47x was reached while obtaining an error which is smaller than the detector resolution.
Many real-world applications feature data accesses on periodic domains. Manually implementing the synchronizations and communications associated with the data dependences in each case is cumbersome and error-prone. It is increasingly interesting to support these applications in high-level parallel programming languages or parallelizing compilers. In this paper, we present a technique that, for distributed-memory systems, calculates the specific communications derived from data-parallel codes with or without periodic boundary conditions on affine access expressions. It makes the management of aggregated communications for the chosen data partition transparent to the programmer. Our technique moves to runtime part of the compile-time analysis typically used to generate the communication code for affine expressions, introducing a completely new technique that also supports periodic boundary conditions. We present an experimental study to evaluate our proposal using several case studies. Our experimental results show that our approach can automatically obtain communication codes as efficient as those found in MPI reference codes, reducing the development effort.
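To make the idea of communications derived from an affine access on a periodic domain concrete, the following is a minimal sketch and not the authors' technique. It assumes a one-dimensional array of `n` elements, a simple block partition, and an access of the form `a*i + b`; the function names (`block_size`, `local_indices`, `remote_reads`) are hypothetical and chosen only for illustration. For one process it lists the remote elements it would have to receive, grouped by owning process, which is the information an aggregated communication would need.

```python
def block_size(n, nprocs):
    """Block length for an assumed block partition (ceiling division)."""
    return -(-n // nprocs)


def local_indices(rank, n, nprocs):
    """Global indices owned by `rank` under the assumed block partition."""
    size = block_size(n, nprocs)
    return range(min(rank * size, n), min((rank + 1) * size, n))


def remote_reads(rank, n, nprocs, a, b, periodic=True):
    """Group the non-local elements read through the access a*i + b by owner.

    With periodic=True, out-of-range indices wrap around the domain;
    otherwise they are ignored (no element exists outside the domain).
    """
    size = block_size(n, nprocs)
    needed = {}
    for i in local_indices(rank, n, nprocs):
        j = a * i + b
        if periodic:
            j %= n  # Python's % always yields a value in [0, n)
        elif not 0 <= j < n:
            continue
        owner = j // size
        if owner != rank:
            needed.setdefault(owner, []).append(j)
    return needed


if __name__ == "__main__":
    # Left-neighbour access x[i - 1] on 8 elements split over 4 processes.
    for rank in range(4):
        print(rank, remote_reads(rank, n=8, nprocs=4, a=1, b=-1))
```

In this sketch the periodic case differs from the non-periodic one only at the domain edges: process 0 needs the last element from the last process, a transfer that does not exist without wrap-around.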
ISBN: (print) 9781538633656
Low latency is a fundamental requirement for Virtual Reality (VR) systems to reduce the potential risks of cybersickness and to increase effectiveness, efficiency and user experience. In contrast to the effects of uniform latency degradation, the influence of latency jitter on user experience in VR is not well researched, although today's consumer VR systems are vulnerable in this respect. In this work we report on the impact of latency jitter on cybersickness in HMD-based VR environments. Test subjects were given a search task in Virtual Reality, provoking both head rotation and translation. One group experienced artificially added latency jitter in the tracking data of their head-mounted display. The introduced jitter pattern was a replication of real-world latency behavior extracted and analyzed from an existing example VR system. The effects of the introduced latency jitter were measured through self-reports using the Simulator Sickness Questionnaire (SSQ) and through physiological measurements. We found a significant increase in self-reported simulator sickness. We therefore argue that measuring and controlling latency based on average values taken at a few time intervals is not enough to assure the required timeliness behavior, and that latency jitter needs to be considered when designing experiences for Virtual Reality.
ISBN: (print) 9781538671382
Integration of intermittent renewable energy resources into the power system necessitates the development of fast computational methods and tools to enable real-time monitoring, control, and decision making in the power grid. Generally, techniques for increasing computational speed fall into two categories: algorithm improvement and hardware acceleration. In this paper, the serial version of the Newton-Raphson power flow algorithm has been transformed into a parallel solution using the OpenMP standard. The parallel implementation is tested on several power systems, and the computational efficiency is compared across varying thread counts. The experimental results show a speedup of more than three times and a significant reduction in computation time.
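For readers unfamiliar with the method, the serial Newton-Raphson iteration can be sketched on the smallest possible case. The example below is not the paper's code: it assumes a two-bus system (a slack bus at 1.0 p.u. feeding one load bus through a single line), uses a finite-difference Jacobian for brevity, and all function names and numeric values are hypothetical. On realistic systems the mismatch evaluation, Jacobian assembly and linear solve dominate the runtime, and those are the kinds of steps a shared-memory parallelization would presumably target.

```python
import cmath


def bus2_power(theta, v, y_line, v_slack=1.0 + 0j):
    """Complex power injected at bus 2 for voltage magnitude v and angle theta."""
    v2 = cmath.rect(v, theta)
    current = y_line * (v2 - v_slack)  # current injected at bus 2
    return v2 * current.conjugate()


def mismatch(theta, v, s_spec, y_line):
    """Active and reactive power mismatches (computed minus specified)."""
    s = bus2_power(theta, v, y_line)
    return s.real - s_spec.real, s.imag - s_spec.imag


def newton_raphson(s_spec, y_line, tol=1e-10, max_iter=20, h=1e-6):
    """Solve for (theta, v) at bus 2; returns (theta, v, iterations used)."""
    theta, v = 0.0, 1.0  # flat start
    for iteration in range(max_iter):
        f_p, f_q = mismatch(theta, v, s_spec, y_line)
        if max(abs(f_p), abs(f_q)) < tol:
            return theta, v, iteration
        # Jacobian by central differences (an analytic Jacobian is the usual choice).
        tp = mismatch(theta + h, v, s_spec, y_line)
        tm = mismatch(theta - h, v, s_spec, y_line)
        vp = mismatch(theta, v + h, s_spec, y_line)
        vm = mismatch(theta, v - h, s_spec, y_line)
        j11, j21 = (tp[0] - tm[0]) / (2 * h), (tp[1] - tm[1]) / (2 * h)
        j12, j22 = (vp[0] - vm[0]) / (2 * h), (vp[1] - vm[1]) / (2 * h)
        # Solve the 2x2 system J * delta = -f with Cramer's rule.
        det = j11 * j22 - j12 * j21
        theta += (-f_p * j22 + f_q * j12) / det
        v += (-f_q * j11 + f_p * j21) / det
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    y = 1 / (0.01 + 0.1j)        # assumed line admittance in p.u.
    load = complex(-0.5, -0.2)   # assumed load, written as a negative injection
    print(newton_raphson(load, y))
```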
ISBN: (print) 9781538655559
The complexities involved in parallel programming encourage frameworks to detach programmers from these concerns via higher-level abstraction. The high-performance nature of parallel computing shifts the focus of these programming environments towards facilitating and safeguarding faster computations. Therefore, aspects such as asynchronous graphical user interfaces (GUIs) receive less emphasis, even though many applications today depend on concurrent human-computer interactions. The significance of this topic is growing such that facilitating the efficient management of asynchronous GUI operations is currently a virtue, but will soon become necessary for parallel-programming frameworks. This paper discusses an unobtrusive, annotation-based approach for managing different types of asynchronous GUI operations within the layout of familiar sequential code. The proposed solution minimizes the restructuring of sequential code, in order to simplify developing, testing and maintaining GUI-based applications. Furthermore, the paper presents an implementation of the concept for @PT, a parallel programming environment based on Java annotations. The evaluation discussed in this paper suggests that the proposed mechanism is valid, and demonstrates timely and efficient handling of asynchronous GUI operations.
ISBN: (print) 9781728159768
This paper describes how a concept-based approach to teaching was used to update how concurrent and distributed systems were taught at the University of Copenhagen. This approach focuses on discussion to drive student engagement whilst fostering a deeper understanding of the presented topics compared to more traditional displays of crude facts. The course is split into three sections: local concurrency, networked concurrency, and concurrency in hardware. This allows for an easier student journey through the course, as students are introduced to all core concepts in the first section and then have them reinforced in greater detail in the subsequent sections. Finally, the experience gained in updating this course is presented so that others attempting something similar may learn from it.
Due to the rapid growth in multicore and GPU-based computing devices, teaching parallel computing in the CS/CE curriculum has become almost mandatory. A course on Parallel Computing Systems (PCS) has been designed to provide an understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, as well as to teach the parallel programming techniques necessary to effectively utilize these machines. An activity-based learning approach was adopted for teaching the course, and several parallel programming paradigms and technologies such as OpenMP, MPI, and CUDA were covered. This course was offered as a required course to graduate students. This paper describes the implementation of the course at Thiagarajar College of Engineering. Evaluation of the implementation of the course reveals that, for students who have not been exposed to parallel and distributed computing, i) activity-based learning results in better knowledge gain compared to the traditional approach, ii) learning OpenMP was much easier than MPI or CUDA, iii) some Parallel and Distributed Computing (PDC) concepts such as false sharing were harder to grasp compared to basic concepts, and iv) it is essential to introduce parallel computing in the undergraduate curriculum.