ISBN: 9798350364613; 9798350364606 (print)
As high-performance computing technologies advance, the significance of parallel programming in various domains is becoming increasingly evident, since it allows us to harness the power of heterogeneous computing and solve complex problems more efficiently. However, to master this type of computation and apply it in different contexts, students must understand how measuring and optimizing parallel code affects its performance. This paper presents an approach to enhancing students' comprehension of parallel performance metrics through an interactive exercise that complements lectures on parallel performance and improves assessment.
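The abstract does not say which performance metrics the exercise covers. As an illustration only, the sketch below computes three metrics commonly taught in this setting: speedup, parallel efficiency, and the Amdahl's-law bound. The function names and the choice of metrics are assumptions, not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Parallel efficiency E = S / p, where p is the number of workers."""
    return speedup(t_serial, t_parallel) / workers


def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical timings: 100 s serially, 25 s on 8 workers.
    print(speedup(100.0, 25.0))
    print(efficiency(100.0, 25.0, 8))
    print(amdahl_speedup(0.9, 10))
```

Comparing the measured speedup with the Amdahl bound is one way students can see how much of a gap is due to the serial fraction rather than to overhead.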
ISBN: 9798350364613; 9798350364606 (print)
The likelihood of unanticipated node failures in large-scale parallel computers increases with growing numbers of nodes. Furthermore, global reduction operations become major bottlenecks due to their limited parallel scalability. The Preconditioned Conjugate Gradient (PCG) method faces these challenges.
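To make the reduction bottleneck concrete, here is a minimal serial sketch of textbook PCG with a Jacobi (diagonal) preconditioner. It is not the paper's method: the preconditioner choice and all names are assumptions. The comments mark each inner product that would become a global reduction in a distributed run.

```python
def dot(u, v):
    # In a distributed run, each call here is a global reduction across nodes.
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    # Dense matrix-vector product; in practice this is mostly local work.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A (Jacobi preconditioner)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # assumed preconditioner
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)                                # reduction
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # reduction (residual norm)
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # reduction
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)                        # reduction
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Every iteration needs several such inner products, and each one synchronizes all nodes, which is why the reductions limit scalability as the node count grows.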
ISBN: 9798350364613; 9798350364606 (print)
PDC at UM is a series of "codeless" modules consisting of visualizations, simulations, and demonstrations that introduce parallel and distributed computing (PDC) concepts in early computing courses. These materials are codeless because they do not require students to write or understand code. Instead, students read a short introduction to a PDC concept and then engage with a web-based visualization and/or (code-based) demonstration reinforcing the concept. The codeless nature of these modules makes them suitable for computing and non-computing majors. To test the effectiveness of our modules, we introduced them into two CS1 courses and designed and administered a pre/posttest. The results are statistically significant: those who engaged with our modules substantially improved their knowledge and understanding of PDC concepts. Our modules also improved student attitudes, confidence, and self-efficacy with respect to PDC topics. We also provide some qualitative observations from our study and identify common misconceptions students have about PDC.
ISBN: 9798350311990 (print)
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the broad response to it from the scientific community have forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced and validated by others. Informed by our team's work in supporting public health decision makers during the COVID-19 pandemic and by the identified capability gaps in applying high-performance computing (HPC) to the modeling of complex social systems, we present the goals, requirements, and initial implementation of OSPREY, an open science platform for robust epidemic analysis. The prototype implementation demonstrates an integrated, algorithm-driven HPC workflow architecture that coordinates tasks across federated HPC resources, with robust, secure, and automated access to each of the resources. We demonstrate scalable and fault-tolerant task execution, an asynchronous API to support fast time-to-solution algorithms, an inclusive, multi-language approach, and efficient wide-area data management. The example OSPREY code is made available on a public repository.
ISBN: 9781665497473 (print)
The rapid growth of Internet of Things (IoT) and autonomous systems has led to the deployment of edge devices close to the sensing data source for low-latency computation.
ISBN: 9798350364613; 9798350364606 (print)
We present two new assignments in the Peachy Parallel Assignments series for teaching parallel and distributed computing. Submitted assignments must have been successfully used previously and are selected for being easy for other instructors to adopt and for being "cool and inspirational" so that students spend time on them and talk about them with others. The first assignment in this paper familiarizes students with the RAFT library for performing GPU-accelerated computation, part of the RAPIDS AI ecosystem. Students use this library to accelerate a Radius Nearest Neighbor computation, finding all points within a given distance from a query point. In the second assignment, students parallelize a bird flocking simulation using OpenMP or OpenACC. It is a visual assignment that allows students to readily see the performance improvement.
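For readers unfamiliar with the Radius Nearest Neighbor computation in the first assignment, the following is a minimal brute-force CPU sketch of the problem it solves. It deliberately does not use the RAFT API; the function name and data layout are assumptions made for illustration.

```python
def radius_neighbors(points, query, radius):
    """Return indices of all points within `radius` of `query` (inclusive)."""
    r2 = radius * radius
    hits = []
    for i, point in enumerate(points):
        # Compare squared distances to avoid a square root per point.
        d2 = sum((a - b) ** 2 for a, b in zip(point, query))
        if d2 <= r2:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical 2-D data set.
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (0.0, 2.0)]
    print(radius_neighbors(pts, (0.0, 0.0), 2.0))
```

Each distance test is independent of the others, which is what makes the computation a natural fit for GPU acceleration.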
ISBN: 9798350337662 (print)
The generalized Dryja-Smith-Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package that implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both the solver's computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options, including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and of using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by a factor of about 2 using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.
The development of computer architecture towards multi-core processors brings new opportunities and challenges for efficient task scheduling in scientific workflows. This paper investigates the key challenges of sched...
ISBN: 9781665497473 (print)
We show that the wavefront algorithm can achieve higher pairwise read alignment throughput on a UPMEM PIM system than on a server-grade multi-threaded CPU system.
ISBN: 9781665497473 (print)
We describe several features of parallel or distributed asynchronous iterative algorithms, such as unbounded delays, possible out-of-order messages, and flexible communication. We concentrate on the concept of the macro-iteration sequence, which was introduced to study the convergence or termination of asynchronous iterations. A survey of asynchronous iterations for convex optimization problems is also presented. Finally, we propose a new convergence result for parallel or distributed asynchronous iterative algorithms with flexible communication, applied to convex optimization problems and machine learning.