ISBN:
(print) 9798400709692
Maude is a high-performance logical framework based on rewriting logic and supporting formal specification, verification and declarative programming of concurrent systems. Since most concurrent open systems are made up of actor-like objects that communicate with each other through message passing, Maude provides special features to support their specification, verification and programming. Since open systems are heterogeneous, involving widely different kinds of objects such as sensors, actuators, devices, databases, graphical user interfaces, and so on, Maude supports declarative message-passing interaction between Maude objects and a wide variety of heterogeneous external objects. In this paper we explain and illustrate a methodology where an open system can first be designed and verified in Maude and then implemented as a distributed system of heterogeneous objects in a way that seamlessly bridges the gap between its formal specification and verification and its distributed implementation.
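The actor-like interaction this abstract describes (objects with private state that communicate only through asynchronous message passing) can be sketched in Python. This is an illustrative analogue of the actor model, not Maude syntax; the `Actor` class and message shapes are hypothetical.

```python
import queue

class Actor:
    """A minimal actor: private state plus a mailbox of asynchronous messages."""
    def __init__(self, name):
        self.name = name
        self.mailbox = queue.Queue()
        self.state = {}

    def send(self, msg):
        self.mailbox.put(msg)  # the only way to interact with an actor

    def run_once(self):
        # Handle exactly one pending message.
        kind, payload, sender = self.mailbox.get()
        if kind == "put":
            key, value = payload
            self.state[key] = value
        elif kind == "get":
            sender.send(("reply", self.state.get(payload), None))

store, client = Actor("store"), Actor("client")
store.send(("put", ("x", 42), client))
store.run_once()
store.send(("get", "x", client))
store.run_once()
reply = client.mailbox.get()
print(reply)  # ('reply', 42, None)
```

In the methodology the abstract outlines, the same message-passing discipline lets heterogeneous external objects (sensors, databases, GUIs) be swapped in behind the mailbox interface without changing the object's logic.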
ISBN:
(print) 9783031226977
The proceedings contain 12 papers. The special focus in this conference is on Job Scheduling Strategies for Parallel Processing. The topics include: Optimization of Execution Parameters of Moldable Ultrasound Workflows Under Incomplete Performance Data; Scheduling of Elastic Message Passing Applications on HPC Systems; Preface; On the Feasibility of Simulation-Driven Portfolio Scheduling for Cyberinfrastructure Runtime Systems; Improving Accuracy of Walltime Estimates in PBS Professional Using Soft Walltimes; Re-making the Movie-Making Machine; Using Kubernetes in Academic Environment: Problems and Approaches; AI-Job Scheduling on Systems with Renewable Power Sources; Toward Building a Digital Twin of Job Scheduling and Power Management on an HPC System; Encoding for Reinforcement Learning Driven Scheduling.
ISBN:
(print) 9781665481069
Distributed shared memory (DSM) systems can handle data-intensive applications and have recently been receiving more attention. A majority of existing DSM implementations are based on write-invalidation (WI) protocols, which achieve sub-optimal performance when the cache size is small. Specifically, the vast majority of invalidation messages become useless when evictions are frequent. The problem is especially troublesome given the scarce memory resources in data centers. To this end, we propose a self-invalidation protocol, Falcon, to eliminate invalidation messages. It relies on per-operation timestamps to achieve the global memory order required by sequential consistency (SC). Furthermore, we conduct a comprehensive discussion of the two protocols with an emphasis on the impact of cache size. We also implement both protocols atop a recent DSM system, Grappa. The evaluation shows that the optimal protocol can improve the performance of a KV database by 27% and a graph processing application by 71.4% against the vanilla cache-free scheme.
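The self-invalidation idea, as opposed to write-invalidation, can be sketched in a few lines of Python: writers never send invalidation messages; instead, a cache drops its own entries at synchronization points and re-fetches versioned data from the home node. This is a toy illustration of the general SI technique, not Falcon's actual protocol; the class names and the coarse "drop everything on acquire" policy are assumptions for brevity.

```python
import itertools

class HomeNode:
    """Authoritative copy of shared data; each write gets a fresh timestamp."""
    def __init__(self):
        self.clock = itertools.count(1)
        self.data = {}  # addr -> (value, version)

    def write(self, addr, value):
        self.data[addr] = (value, next(self.clock))

    def read(self, addr):
        return self.data[addr]

class SelfInvalidatingCache:
    """Caches entries with the version seen at fill time. At a
    synchronization point the cache self-invalidates, so writers
    never need to broadcast invalidation messages."""
    def __init__(self, home):
        self.home = home
        self.entries = {}  # addr -> (value, version)

    def read(self, addr):
        if addr not in self.entries:
            self.entries[addr] = self.home.read(addr)  # miss: fetch from home
        return self.entries[addr][0]

    def acquire(self):
        self.entries.clear()  # self-invalidate instead of receiving WI messages

home = HomeNode()
home.write("x", 1)
c = SelfInvalidatingCache(home)
assert c.read("x") == 1
home.write("x", 2)       # note: no invalidation message is sent to c
assert c.read("x") == 1  # stale until the next synchronization point
c.acquire()
assert c.read("x") == 2  # fresh again after self-invalidation
```

The trade-off the abstract evaluates follows directly: WI pays a message per write per sharer (wasted if the sharer already evicted the line), while SI pays extra misses after each synchronization point.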
ISBN:
(digital) 9781665467704
ISBN:
(print) 9781665467704
One of the main goals of clinical studies is to identify the causes of diseases and improve the efficacy of medical treatments. Sometimes, a reduced number of participants is a limiting factor for these studies, leading researchers to organise multi-centre studies. However, sharing health data raises certain concerns regarding patients' privacy, namely related to the robustness of anonymisation procedures. Although these techniques remove personal identifiers from registries, some studies have shown that anonymisation procedures can sometimes be reverted using specific patients' characteristics. In this paper, we propose a secure architecture to explore distributed databases without compromising patients' privacy. The proposed architecture is based on interoperable repositories supported by a common data model.
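The core privacy principle behind such architectures, querying distributed repositories so that only aggregates leave each site, can be sketched as follows. This is a minimal illustration of the federated-query idea over a shared record schema (a stand-in for a common data model), not the paper's actual architecture; the record fields and function names are hypothetical.

```python
# Each centre keeps its own records; raw rows never leave the site.
centres = [
    [{"age": 34, "dx": "A"}, {"age": 61, "dx": "B"}],  # centre 1
    [{"age": 47, "dx": "A"}],                          # centre 2
]

def local_count(records, predicate):
    # Runs inside the centre's repository; returns only an aggregate.
    return sum(1 for r in records if predicate(r))

def federated_count(predicate):
    # The coordinator sees per-site counts, never patient-level data.
    return sum(local_count(site, predicate) for site in centres)

cohort_size = federated_count(lambda r: r["dx"] == "A")
print(cohort_size)  # 2
```

Because every site exposes the same schema, the same predicate can be pushed to all repositories unchanged, which is the practical payoff of building on a common data model.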
ISBN:
(print) 9781665497473
The ubiquity of multicore processors, cloud computing, and hardware accelerators has elevated parallel and distributed computing (PDC) topics into fundamental building blocks of the undergraduate CS curriculum. Therefore, it is increasingly important for students to learn a common core of introductory PDC topics and develop parallel thinking skills early in their CS studies. We present the curricular design, pedagogy, and goals of an introductory-level course on computer systems that introduces parallel computing to students who have only a CS1 background. Our course focuses on three curricular goals that serve to integrate the ACM-IEEE TCPP guidelines throughout: a vertical slice through the computer showing how it runs a program; evaluating the system costs associated with running a program; and taking advantage of the power of parallel computing. We elaborate on the goals and details of our course's key modules, and we discuss our pedagogical approach, which includes active-learning techniques. We find that the PDC foundation gained through early exposure in this course helps students gain confidence in their ability to expand and apply their understanding of PDC concepts throughout their CS education.
ISBN:
(print) 9798350364613; 9798350364606
A performance-portable application can run on a variety of different hardware platforms, achieving an acceptable level of performance without requiring significant rewriting for each platform. Several performance-portable programming models are now suitable for high-performance scientific application development, including OpenMP and Kokkos. Chapel is a parallel programming language that supports the productive development of high-performance scientific applications and has recently added support for GPU architectures through native code generation. Using three mini-apps (BabelStream, miniBUDE, and TeaLeaf), we evaluate the Chapel language's performance portability across various CPU and GPU platforms. In our evaluation, we replicate and build on previous studies of performance portability using mini-apps, comparing Chapel against OpenMP, Kokkos, and the vendor programming models CUDA and HIP. We find that Chapel achieves performance portability comparable to OpenMP and Kokkos, and we identify several implementation issues that limit Chapel's performance portability on certain platforms.
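To give a concrete sense of what these mini-apps measure: BabelStream's central kernel is the STREAM-style triad, a memory-bandwidth-bound array operation. A NumPy sketch of the triad kernel is shown below purely for illustration; BabelStream itself is written in C++ (and, in this study, Chapel), and the sizes here are arbitrary.

```python
import numpy as np

def triad(a, b, c, scalar):
    # The triad kernel: a[i] = b[i] + scalar * c[i] for all i.
    # Its performance is dominated by memory bandwidth, which is why
    # it is a common yardstick for performance portability.
    a[:] = b + scalar * c

n = 1_000_000
a = np.zeros(n)
b = np.full(n, 2.0)
c = np.full(n, 3.0)
triad(a, b, c, 0.5)
print(a[0])  # 3.5
```

A portability study then implements this same kernel in each model (OpenMP, Kokkos, CUDA, HIP, Chapel) and compares achieved bandwidth across CPUs and GPUs.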
ISBN:
(print) 9781665497473
Graph neural networks (GNNs) operate on data represented as graphs and are useful for a wide variety of tasks, from chemical reaction and protein structure prediction to content recommendation systems. However, training on large graphs and improving training performance remain significant challenges. Existing distributed training systems partition a graph among all compute nodes to train on large graphs; however, this introduces communication overhead that degrades training performance. In this study, to solve these two problems, we propose a scalable data-parallel distributed GNN training system designed to partition a graph redundantly. It is implemented using remote direct memory access (RDMA) and nonblocking active messages to efficiently utilize network performance and hide communication overhead by overlapping it with the training computation. Experimental results are presented to show the strong scalability of the proposed approach, which achieved a parallel efficiency of 0.93 using eight compute nodes for the ogbn-products dataset in the Open Graph Benchmark (OGB) and 0.95, relative to two compute nodes, when using 32 compute nodes for the ogbn-papers100M dataset. The proposed system exhibited training performance 18.9% better than the state-of-the-art DistDGL, even with only a single compute node. The results demonstrate that the proposed approach may be considered a promising method to achieve scalable training performance for large graphs.
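The redundant-partitioning idea can be sketched simply: each worker stores its owned vertices plus a replicated copy of their neighbours (a 1-hop halo), so a message-passing layer can aggregate locally without a remote fetch per edge. This toy sketch, with a hypothetical path graph and function name, illustrates the general technique rather than the paper's actual partitioner.

```python
# Toy undirected path graph 0-1-2-3-4-5 as an adjacency list.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

def redundant_partition(graph, owned):
    """Return the vertex set a worker stores: its owned vertices plus a
    redundant copy of every neighbour, so one GNN aggregation step needs
    no communication (at the cost of extra memory)."""
    halo = {nb for v in owned for nb in graph[v]}
    return set(owned) | halo

part0 = redundant_partition(graph, [0, 1, 2])
part1 = redundant_partition(graph, [3, 4, 5])
print(sorted(part0), sorted(part1))  # [0, 1, 2, 3] [2, 3, 4, 5]
```

The overlap between the two partitions (vertices 2 and 3 here) is exactly the redundancy that trades memory for reduced communication; keeping replicas consistent across training steps is what the RDMA and active-message machinery handles.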
ISBN:
(digital) 9798350364606
ISBN:
(print) 9798350364613
In recent years, key-value stores (KV stores) [1]-[3] have begun to gain popularity as storage engines for large-scale data applications. KV stores are fundamentally different from traditional SQL databases; with the key-value data model, they offer various advantages, such as ease of use, flexibility, and higher performance. However, some essential features of SQL databases, most notably atomicity, consistency, isolation, and durability (ACID) transaction processing [4], are considered impractical and thus are generally not included in a KV store design. With recent advancements, these features have begun to be integrated into KV stores, giving birth to a brand new class of database management systems called NewSQL databases [5], [6]. The rationale behind NewSQL databases is that with ACID transaction support, KV stores can serve as storage engines for an upper-layer SQL query processor to handle SQL queries [7]-[9]. In this way, NewSQL databases can provide both the scalability of KV stores and the ACID guarantees required for online transaction processing (OLTP), thus providing high performance for different types of workloads in a large-scale data system.
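The layering the abstract describes (a transaction layer over a plain KV store) can be sketched minimally: a transaction buffers writes privately and applies them to the store atomically at commit. This toy illustrates only the "A" in ACID; real NewSQL engines additionally provide isolation (e.g. via locking or MVCC) and durability (via logging). All names here are hypothetical.

```python
class KVStore:
    """A plain key-value store: no transactions of its own."""
    def __init__(self):
        self.data = {}

class Transaction:
    """Buffers writes and applies them to the KV store atomically at
    commit, so readers never observe a half-applied transaction."""
    def __init__(self, store):
        self.store = store
        self.writes = {}

    def get(self, key):
        # Read-your-own-writes: check the private buffer first.
        return self.writes.get(key, self.store.data.get(key))

    def put(self, key, value):
        self.writes[key] = value  # invisible to others until commit

    def commit(self):
        self.store.data.update(self.writes)  # all writes land together

db = KVStore()
t = Transaction(db)
t.put("balance", 100)
assert db.data.get("balance") is None  # not visible before commit
assert t.get("balance") == 100         # but visible to the transaction
t.commit()
assert db.data["balance"] == 100
```

An upper-layer SQL processor would translate each SQL statement into `get`/`put` calls inside such a transaction, which is the division of labour NewSQL systems rely on.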
Multiple power modules are interfaced in parallel via laminated busbar in high-capacity active neutral-point clamped (ANPC) converters. Consequently, the complex mutual coupling among parallel branches profoundly affe...
ISBN:
(print) 9781665481069
Stochastic gradient descent (SGD) is the most prevalent algorithm for training deep neural networks (DNNs). SGD iterates over the input data set in each training epoch, processing data samples in a random-access fashion. Because this puts enormous pressure on the I/O subsystem, the most common approach to distributed SGD in HPC environments is to replicate the entire dataset to node-local SSDs. However, due to rapidly growing data set sizes, this approach has become increasingly infeasible. Surprisingly, the questions of why and to what extent random access is required have not received much attention in the literature from an empirical standpoint. In this paper, we revisit data shuffling in DL workloads to investigate the viability of partitioning the dataset among workers and performing only a partial distributed exchange of samples in each training epoch. Through extensive experiments on up to 2,048 GPUs of ABCI and 4,096 compute nodes of Fugaku, we demonstrate that in practice the validation accuracy of global shuffling can be maintained when the partial distributed exchange is carefully tuned. We provide a solution implemented in PyTorch that enables users to control the proposed data exchange scheme.
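The partial-exchange scheme can be sketched as follows: each worker keeps its partition, shuffles it locally, and each epoch contributes only a tunable fraction of its samples to a pool that is redistributed among workers. This is a single-process toy of the general idea, not the paper's PyTorch implementation; the function name and exchange policy are assumptions.

```python
import random

def partial_exchange(partitions, fraction, rng):
    """One epoch of partial distributed exchange: every worker donates
    `fraction` of its (locally shuffled) samples to a shared pool, which
    is split back evenly; the remaining samples never leave the worker."""
    pool, kept = [], []
    for part in partitions:
        rng.shuffle(part)                  # local shuffle is always cheap
        k = int(len(part) * fraction)
        pool.extend(part[:k])              # donated to the exchange
        kept.append(part[k:])              # stays node-local
    rng.shuffle(pool)
    step = len(pool) // len(partitions)
    new_parts = []
    for i, local in enumerate(kept):
        new_parts.append(local + pool[i * step:(i + 1) * step])
        rng.shuffle(new_parts[-1])
    return new_parts

rng = random.Random(0)
parts = [list(range(0, 8)), list(range(8, 16))]
parts = partial_exchange(parts, 0.25, rng)
print([len(p) for p in parts])  # [8, 8]
```

With `fraction=1.0` this degenerates to a full global shuffle, and with `fraction=0.0` to purely local shuffling; the paper's finding is that intermediate values can preserve the validation accuracy of global shuffling at a fraction of the I/O and network cost.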