Edge computing is crucial for IoT applications, especially those needing quick, private data handling. However, these applications are resource-intensive, and edge computing resources are limited compared to cloud cap...
ISBN: 9798350337662 (print)
Since its introduction, Metropolis Monte Carlo (MC) sampling and its variants have become standard tools for thermodynamic evaluation of physical systems. However, a long-standing problem that hinders the effectiveness and efficiency of MC sampling is the lack of a generic method (a.k.a. MC proposal) to update the system configurations. Consequently, current practices are not scalable. Here we propose DeepThermo, a parallel MC sampling framework for thermodynamics evaluation. By using deep learning-based MC proposals that can globally update the system configurations, we show that DeepThermo can effectively evaluate the phase transition behaviors of high entropy alloys, which have an astronomical configuration space. For the first time, we directly evaluate a density of states spanning a range of approximately e^10,000 for a real material. We also demonstrate DeepThermo's performance and scalability up to 3,000 GPUs on both NVIDIA V100 and AMD MI250X-based supercomputers.
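For readers unfamiliar with the baseline that the abstract contrasts against, the following is a minimal sketch of classic Metropolis sampling with a local, single-spin-flip proposal. It is not DeepThermo's method: the 1D Ising ring, the coupling J = 1, and the function names are all assumptions chosen for illustration. A local proposal of this kind changes one site per step, which is the limitation that global, learned proposals are meant to address.

```python
import math
import random

def ising_energy(spins):
    """Energy of a 1D Ising ring with coupling J = 1 (an illustrative model)."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def metropolis_step(spins, beta, rng):
    """Propose a single spin flip and accept it with the Metropolis rule."""
    n = len(spins)
    i = rng.randrange(n)
    # Energy change from flipping spin i: only its two neighbours are involved.
    delta = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n])
    if delta <= 0 or rng.random() < math.exp(-beta * delta):
        spins[i] = -spins[i]
        return True
    return False

def sample(n_spins=32, beta=1.0, n_steps=20000, seed=0):
    """Run a chain and return the final configuration and the acceptance rate."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    accepted = sum(metropolis_step(spins, beta, rng) for _ in range(n_steps))
    return spins, accepted / n_steps
```

Calling `sample(n_spins=16, beta=0.5)` returns the final spin configuration together with the fraction of accepted proposals.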
ISBN: 9798400714436 (print)
Online GNN inference has been widely explored by applications such as online recommendation and financial fraud detection systems, where even minor delays can result in significant financial impact. Real-time dynamic graph sampling enables online GNN inference to reflect the latest graph updates in real-world graphs. However, online GNN inference typically demands millisecond-level latency Service Level Objectives (SLOs) as its performance guarantees, which poses great challenges for existing dynamic graph sampling approaches based on graph databases. The issues mainly arise from two aspects: long tail latency due to imbalanced data-dependent sampling and large communication overhead incurred by distributed sampling. To address these issues, we propose Helios, an efficient distributed dynamic graph sampling service to meet the stringent latency SLOs. The key ideas of Helios are 1) pre-sampling the dynamic graph in an event-driven approach, and 2) maintaining a query-aware sample cache to build the complete K-hop sampling results locally for inference requests. Experiments on multiple datasets show that Helios achieves up to 67x higher serving throughput and up to 32x lower P99 query latency compared to baselines.
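The two key ideas named above (event-driven pre-sampling and a local sample cache) can be sketched roughly as follows. This is an illustration under stated assumptions, not Helios's implementation: the class `PreSampledCache`, its methods, and the choice of reservoir sampling as the sampling strategy are all hypothetical.

```python
import random
from collections import defaultdict

class PreSampledCache:
    """Keeps at most `fanout` sampled neighbours per vertex, updated per edge event."""

    def __init__(self, fanout, seed=0):
        self.fanout = fanout
        self.rng = random.Random(seed)
        self.samples = defaultdict(list)  # vertex -> reservoir of sampled neighbours
        self.seen = defaultdict(int)      # vertex -> number of out-edges observed

    def on_edge(self, src, dst):
        """Reservoir sampling: every neighbour seen so far is kept with equal probability."""
        self.seen[src] += 1
        reservoir = self.samples[src]
        if len(reservoir) < self.fanout:
            reservoir.append(dst)
        else:
            j = self.rng.randrange(self.seen[src])
            if j < self.fanout:
                reservoir[j] = dst

    def k_hop(self, seed_vertex, k):
        """Assemble a K-hop sample from the cache only, without traversing the graph."""
        layers = [[seed_vertex]]
        for _ in range(k):
            frontier = []
            for v in layers[-1]:
                frontier.extend(self.samples.get(v, []))
            layers.append(frontier)
        return layers
```

Because sampling happens when an edge arrives, a query only performs cache lookups, so its cost depends on the fanout and the number of hops rather than on vertex degree.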
ISBN: 9798350364606 (print)
The proceedings contain 165 papers. The topics discussed include: understanding multi-dimensional efficiency of fine-tuning large language models using SpeedUp, MemoryUp, and EnergyUp; shared-memory parallel Edmonds blossom algorithm for maximum cardinality matching in general graphs; a reconfigurable architecture of a scalable, ultrafast, ultrasound, delay-and-sum beamformer; scheduling and allocation of disaggregated memory resources in HPC systems; GIM (ghost in the machine): a coarse-grained reconfigurable compute-in-memory platform for exploring machine-learning architectures; further optimizations and analysis of Smith-Waterman with vector extensions; measurement-based quantum approximate optimization; optimizing forward wavefield storage leveraging high-speed storage media; teaching performance metrics in parallel computing courses; and compiler-driven SWAR parallelism for high-performance bitboard algorithms.
ISBN: 9798350364613; 9798350364606 (print)
As high-performance computing technologies advance, the significance of parallel programming in various domains is becoming increasingly evident, since it allows us to harness the power of heterogeneous computing and solve complex problems more efficiently. However, to master this type of computation and apply it in different contexts, students must understand how measuring and optimizing parallel code affects its performance. This paper presents an approach to enhancing students' comprehension of parallel performance metrics through an interactive exercise that complements lectures on parallel performance and improves assessment.
ISBN: 9798350364613; 9798350364606 (print)
The likelihood of unanticipated node failures in large-scale parallel computers increases with growing numbers of nodes. Furthermore, global reduction operations become major bottlenecks due to their limited parallel scalability. The Preconditioned Conjugate Gradient (PCG) method faces these challenges.
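To make the role of global reductions concrete, here is a minimal serial sketch of textbook PCG with a Jacobi (diagonal) preconditioner. It is not the paper's variant, and the dense list-of-lists matrix and the function names are assumptions for illustration. The calls to `dot` mark the inner products that would each become a global reduction in a distributed solver.

```python
def dot(u, v):
    # In a distributed solver, each of these inner products is a global reduction.
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    # Matrix-vector product; in a distributed setting this is mostly local work.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG for a symmetric positive definite matrix A."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual for the zero initial guess
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter
```

Each iteration needs the reductions for `dot(r, r)`, `dot(p, Ap)`, and `dot(r, z)`, and the search-direction update cannot proceed until they complete, which is why they limit scalability at large node counts.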
ISBN: 9798350364613; 9798350364606 (print)
PDC at UM is a series of "codeless" modules consisting of visualizations, simulations, and demonstrations which introduce Parallel and Distributed Computing (PDC) concepts in early computing courses. These materials are codeless because they do not require students to write or understand code. Instead, students read a short introduction to a PDC concept and then engage with a web-based visualization and/or (code-based) demonstration reinforcing the concept. The codeless nature of these modules makes them suitable for computing and non-computing majors. To test the effectiveness of our modules, we introduced them into two CS1 courses and designed and administered a pre/posttest. Our results are statistically significant: those who engaged with our modules substantially improved their knowledge and understanding of PDC concepts. Our modules also improved student attitudes, confidence, and self-efficacy with respect to PDC topics. We also provide some qualitative observations of our study and identify common misconceptions students have about PDC.
ISBN: 9798350364613; 9798350364606 (print)
We present two new assignments in the Peachy Parallel Assignments series for teaching parallel and distributed computing. Submitted assignments must have been successfully used previously and are selected for being easy for other instructors to adopt and for being "cool and inspirational" so that students spend time on them and talk about them with others. The first assignment in this paper familiarizes students with the RAFT library for performing GPU-accelerated computation, part of the RAPIDS AI ecosystem. Students use this library to accelerate a Radius Nearest Neighbor computation, finding all points within a given distance from a query point. In the second assignment, students parallelize a bird flocking simulation using OpenMP or OpenACC. It is a visual assignment which allows students to readily see the performance improvement.
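As a point of reference for the first assignment, a Radius Nearest Neighbor query can be written as a brute-force serial scan. The sketch below does not use RAFT or a GPU; the function name `radius_neighbors` and the use of Euclidean distance are assumptions for illustration. It shows only the computation that a GPU-accelerated version would speed up.

```python
import math

def radius_neighbors(points, query, radius):
    """Return the indices of all points within `radius` of `query` (Euclidean distance)."""
    return [i for i, p in enumerate(points) if math.dist(p, query) <= radius]
```

Each distance test is independent of the others, which is what makes the problem a natural fit for data-parallel hardware.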
ISBN: 9798350337662 (print)
As burst buffers are widely deployed in HPC (High-Performance Computing) systems, the distributed file system layer is taking the role of campaign storage, where scalability and cost-effectiveness are of paramount importance. However, the centralized metadata management in the distributed file system layer poses a scalability challenge. The object storage system has emerged as an alternative thanks to its simplified interface and scale-out architecture. Despite this, the HPC communities are used to working with the POSIX interface to organize their files into a global directory hierarchy and control access through access control lists. In this paper, we present ArkFS, a near-POSIX compliant and scalable distributed file system implemented on top of the object storage system. ArkFS achieves high scalability without any centralized metadata servers. Instead, ArkFS lets each client manage a portion of the file system metadata on a per-directory basis. ArkFS supports any distributed object storage system, such as Ceph RADOS or an S3-compatible system, with an appropriate API translation module. Our experimental results indicate that ArkFS shows significant performance improvement under metadata-intensive workloads while showing near-linear scalability. We also demonstrate that ArkFS is suitable for handling the bursty I/O traffic coming from the burst buffer layer to archive cold data.
ISBN: 9781665497473 (print)
Genomic data leaks are irreversible. Leaked DNA cannot be changed, stays disclosed indefinitely, and affects the owner's family members as well. Recent large-scale genomic data collections [1], [2] render traditional privacy protection mechanisms, like the Health Insurance Portability and Accountability Act (HIPAA), inadequate for protection against novel security attacks [3]. On the other hand, data access restrictions hinder important clinical research that requires large datasets [4]. These concerns can be naturally addressed by employing privacy-enhancing technologies, such as secure multiparty computation (MPC) [5]–[10]. Secure MPC enables computation on data without disclosing the data itself by dividing the data and computation between multiple computing parties in a distributed manner, which prevents individual computing parties from accessing raw data. MPC systems are being increasingly adopted in fields that operate on sensitive datasets [11]–[13], such as computational genomics and biomedical research [14]–[22].
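The idea of dividing data between computing parties can be illustrated with additive secret sharing, one common building block of MPC. The sketch below is a toy under stated assumptions and not the system the abstract describes: the modulus `PRIME`, the three-party setup, and all function names are hypothetical, and it simulates every party in a single process rather than over a network.

```python
import secrets

PRIME = 2**61 - 1  # modulus chosen for illustration (a Mersenne prime)

def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine all shares; fewer than all of them reveal nothing about the value."""
    return sum(shares) % PRIME

def secure_sum(inputs, n_parties=3):
    """Each party adds up the shares it holds; only the combined total is revealed."""
    per_party = [0] * n_parties
    for value in inputs:
        for party, s in enumerate(share(value, n_parties)):
            per_party[party] = (per_party[party] + s) % PRIME
    return reconstruct(per_party)
```

For example, `secure_sum([3, 5, 7])` yields 15, while each simulated party only ever holds uniformly random-looking values.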