ISBN:
(Print) 9789819628636
The proceedings contain 76 papers. The special focus in this conference is on Network and Parallel Computing. The topics include: AsymFB: Accelerating LLM Training Through Asymmetric Model Parallelism; DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies; Diagnosability of the Lexicographic Product of Paths and Complete Bipartite Graphs Under PMC Model; DTuner: A Construction-Based Optimization Method for Dynamic Tensor Operators Accelerating; Efficient Implementation of the LOBPCG Algorithm on a CPU-GPU Cluster; HP-CSF: A GPU Optimization Method for CP Decomposition of Incomplete Tensors; JediGAN: A Fully Decentralized Training of GAN with Adaptive Discriminator Averaging and Generator Selection; Optimizing Vo-Viso: A Modified Methodology to Parallel Computing with Isolating Data in Memristor Arrays; Parallel Computation of the Combination of Two Point Operations in Conic Curves Cryptosystem over GF(2^n) Using Tile Self-assembly; Parallel Construction of Independent Spanning Trees on 3-ary n-cube Networks; SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling; swDarknet: A Heterogeneous Parallel Deep Learning Framework Suitable for SW26010 Pro Processor; VConv: Autotiling Convolution Algorithm Based on MLIR for Multi-core Vector Accelerators; ACH-Code: An Efficient Erasure Code to Reduce Average Repair Cost in Cloud Storage Systems of Multiple Availability Zones; CMS: A Computility Resource Status Management and Storage Framework; Fast Memory Disaggregation with SwiftSwap; HASLB: Huge Page Allocation Strategy Optimized for Load-Balance in Parallel Computing Programs; LightFinder: Finding Persistent Items with Small Memory; miDedup: A Restore-Friendly Deduplication Method on Docker Image Storage Systems; SPLR: A Selective Packet Loss Recovery for Improved RDMA Performance; A Cluster-Based Platoon Formation Scheme for Realistic Automated Vehicle Platooning; AnaNET: Anatomical Network fo...
Edge computing is considered a promising architecture for handling latency-sensitive and computationally intensive tasks. The lack of consideration for the timing of jobs and their unique topology in the existing rese...
ISBN:
(Print) 9789819628292
ISBN:
(Print) 9798400705977
Quantum annealers like those from D-Wave Systems implement adiabatic quantum computing to solve optimization problems, but their analog nature and limited control functionalities present challenges to correcting or mitigating errors. As quantum computing advances towards applications, effective error suppression is an important research goal. We propose a new approach called replication-based mitigation (RBM) based on parallel quantum annealing. In RBM, physical qubits representing the same logical qubit are dispersed across different copies of the problem embedded in the hardware. This mitigates hardware biases, is compatible with limited qubit connectivity in current annealers, and is suited for available noisy intermediate-scale quantum (NISQ) annealers. Our experimental analysis shows that RBM provides solution quality on par with previous methods while being compatible with a much wider range of hardware connectivity patterns. In comparisons against standard quantum annealing without error mitigation, RBM consistently improves the energies and ground state probabilities across parameterized problem sets.
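The replication idea can be sketched in a few lines. This is a toy sketch, not D-Wave's implementation: a QUBO is given as a coefficient dictionary, a greedy local-search pass stands in for one hardware read of a replica (the real RBM embeds the copies side by side on the annealer), and the names `sample_replica` and `rbm_solve` are illustrative, not Ocean API calls.

```python
import random

def energy(qubo, state):
    """QUBO energy: sum of c_ij * x_i * x_j over the coefficient dict."""
    return sum(c * state[i] * state[j] for (i, j), c in qubo.items())

def sample_replica(qubo, n_vars, rng):
    """Stand-in for one annealer read of a replica: greedy descent from a
    random bit string, flipping any bit that lowers the energy."""
    state = [rng.choice((0, 1)) for _ in range(n_vars)]
    improved = True
    while improved:
        improved = False
        for i in range(n_vars):
            old = energy(qubo, state)
            state[i] ^= 1
            if energy(qubo, state) < old:
                improved = True
            else:
                state[i] ^= 1  # revert a flip that did not help
    return state

def rbm_solve(qubo, n_vars, n_replicas=5, seed=0):
    """Replication-based mitigation (toy): solve independent copies of the
    problem, then majority-vote each logical variable across the replicas,
    so a bias in any single copy is outvoted by the others."""
    rng = random.Random(seed)
    reads = [sample_replica(qubo, n_vars, rng) for _ in range(n_replicas)]
    return [int(sum(r[i] for r in reads) > n_replicas / 2)
            for i in range(n_vars)]

# Tiny QUBO whose minimum is x0 = x1 = 1, x2 = 0 (energy -2.5).
qubo = {(0, 0): -1.0, (1, 1): -1.0, (0, 1): -0.5, (2, 2): 1.0}
print(rbm_solve(qubo, 3))
```

On hardware, the per-replica samples would come from physically distinct embeddings, which is what disperses the hardware biases the abstract refers to.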
ISBN:
(Print) 9798400708893
The proceedings contain 17 papers. The topics discussed include: portable implementations of work stealing;sKokkos: enabling Kokkos with transparent device selection on heterogeneous systems using OpenACC;parallelized remapping algorithms for km-scale global weather and climate simulations with icosahedral grid system;approximate block diagonalization of symmetric matrices using quantum annealing;QUBO formulation using inequalities for problems with complex constraints;low-latency communication in RISC-V clusters;flexible systolic array platform on virtual 2-D multi-FPGA plane;an efficient task-parallel pipeline programming framework;and task-based low-rank hybrid parallel Cholesky factorization for distributed memory environment.
Matrix multiplication is crucial in scientific computing, but it demands substantial resources. We propose a framework for effectively utilizing heterogeneous GPUs for large matrix multiplication. By splitting matrices...
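The splitting strategy this abstract hints at can be illustrated with a minimal sketch. Plain Python stands in for the GPUs, and the relative speed figures are invented: row blocks of A are sized proportionally to each device's assumed throughput, so faster devices receive more rows, and concatenating the partial products in order reproduces the full result.

```python
def matmul(a, b):
    """Plain triple-loop multiply; stands in for one device's GEMM."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def split_rows(n_rows, speeds):
    """Partition row indices of A proportionally to per-device throughput."""
    total = sum(speeds)
    cuts, start = [], 0
    for s in speeds[:-1]:
        end = start + round(n_rows * s / total)
        cuts.append((start, end))
        start = end
    cuts.append((start, n_rows))  # last device absorbs rounding slack
    return cuts

def hetero_matmul(a, b, speeds):
    """Each 'device' multiplies its row block of A by all of B; the blocks
    are concatenated in order, so C equals the single-device product."""
    parts = [matmul(a[s:e], b) for s, e in split_rows(len(a), speeds)]
    return [row for part in parts for row in part]

a = [[1, 2], [3, 4], [5, 6], [7, 8]]
identity = [[1, 0], [0, 1]]
assert hetero_matmul(a, identity, [3, 1]) == a  # fast device takes 3 of 4 rows
```

Splitting along rows of A keeps B whole on every device, trading duplicated storage of B for communication-free partial products.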
ISBN:
(Print) 9783031488023; 9783031488030
MPI collective communications play an important role in coordinating and exchanging data among parallel processes in high performance computing. Various algorithms exist for implementing MPI collectives, each of which exhibits different characteristics, such as message overhead, latency, and scalability, which can significantly impact overall system performance. Therefore, choosing a suitable algorithm for each collective operation is crucial to achieve optimal performance. In this paper, we present our experience with MPI collectives algorithm selection on a large-scale supercomputer and highlight the impact of network traffic and system workload as well as other previously investigated parameters such as message size, communicator size, and network topology. Our analysis shows that network traffic and system workload can make the performance of MPI collectives highly variable and, accordingly, impact the algorithm selection strategy.
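A selection strategy of the kind studied here can be caricatured as a rule table. This is a hypothetical sketch: the algorithm names are well-known allreduce variants, but the thresholds and the `congestion` signal are invented for illustration and not taken from any MPI implementation; real selection logic (and the paper's point) is that such static tables break down once network traffic and system workload vary.

```python
def pick_allreduce_algorithm(msg_bytes, comm_size, congestion):
    """Toy allreduce algorithm selector. Inputs: message size in bytes,
    number of ranks, and a made-up 0..1 network congestion estimate."""
    if comm_size <= 4:
        return "linear"              # few ranks: algorithm overhead dominates
    if msg_bytes < 4096:
        return "recursive-doubling"  # small messages are latency-bound
    if congestion > 0.7:
        return "ring"                # bandwidth-friendly under heavy traffic
    return "rabenseifner"            # large messages on a quiet network

print(pick_allreduce_algorithm(1 << 20, 128, 0.1))
```

The `congestion` argument is the interesting part relative to prior work: selection keyed only on message and communicator size would return the same algorithm regardless of the runtime state of the network.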
ISBN:
(Print) 9798331529109; 9798331529093
Authentication, authorization, and access control are fundamental functionalities that are crucial for network infrastructures and software applications. These functionalities work together to create a fundamental security layer that allows administrative entities to control user actions. Implementing a security layer may be simple for basic applications, but as modern digital infrastructures become more complex, more advanced security systems are needed. Traditional perimeter-based security models, long relied upon for securing large networks, exhibit vulnerabilities and lack adaptability to modern architectures. As technology advances, there is a growing demand for new authentication and authorization systems to keep up with the changes. Zero Trust (ZT) emerges as a paradigm embodying such principles and concepts for constructing contemporary security systems. This paper introduces a ZT-based Single Sign-On (SSO) framework to demonstrate how ZT can be realized in multi-service environments using Attribute-Based Access Control (ABAC). A prototype is developed to show the feasibility and applicability of the proposed framework in a smart city context.
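How an ABAC decision might look in such a framework can be sketched as follows. The attribute names (`role`, `service`, `mfa`) and the smart-city policy are hypothetical, not taken from the paper's prototype; the default-deny behavior when an attribute is absent reflects the Zero Trust stance rather than any specific product.

```python
def abac_allow(policy, subject, resource, context):
    """Grant access only if every attribute the policy requires is
    satisfied; a missing attribute denies by default (Zero Trust)."""
    checks = {
        "role": lambda req: subject.get("role") in req,       # who is asking
        "service": lambda req: resource.get("service") == req,  # what for
        "mfa": lambda req: context.get("mfa", False) == req,  # how they authed
    }
    return all(checks[k](v) for k, v in policy.items())

# Hypothetical smart-city policy: the traffic-control service is open only
# to operators or admins who authenticated with MFA.
policy = {"role": {"operator", "admin"},
          "service": "traffic-control",
          "mfa": True}
```

In an SSO setting, the `subject` and `context` attributes would come from the identity provider's token, so every service can evaluate the same policy without re-authenticating the user.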
ISBN:
(Print) 9783031708183; 9783031708190
Due to their structure, metaheuristics such as parallel evolutionary algorithms (PEA) are well suited to run on parallel and distributed infrastructure, e.g. supercomputers. However, many issues remain under-researched in this context, e.g. the existence of delays in HPC-grade implementations of metaheuristics and how they affect the computation itself. Without this knowledge, the power of supercomputers may not be properly exploited in this setting. Our research focuses on examining such blind spots. In this paper we present evidence for the existence of these delays, show how they differ across island topologies, attempt to explain their nature, and lay the groundwork for dedicated migration operators that account for these observations.
ISBN:
(Print) 9798350383225
Modern materials science research problems present a challenge to data science and analytics, as experiments generate Petabyte-scale spatiotemporal datasets that span a number of modalities and formats. Creating computing infrastructure and frameworks that support the scale and diversity of materials science data while remaining accessible for materials scientists is a non-trivial task. We have developed the Common Research Analytics and Data Lifecycle Environment (CRADLE) to address the challenges of materials data science through a scalable research computing framework and cyberinfrastructure that can (1) handle large-scale, heterogeneous datasets, (2) provide a flexible toolbox for building machine learning pipelines that span from ingestion to model deployment, (3) be accessible to research scientists with limited to extensive computational backgrounds, and (4) utilize a myriad of low-performance to high-performance computer systems. CRADLE integrates distributed systems such as Hadoop and High-Performance Computing (HPC) infrastructure to handle materials data at scale, enabling the general materials data scientist to query Petabytes of data and train thousands of models in a parallel, distributed environment. We demonstrate three use cases that benchmark CRADLE's capability to ingest and analyze spatiotemporal materials data at scale, spanning three data modalities: transforming 2.6 billion photovoltaic time-series power measurements, training hundreds of deep learning models on Atomic Force Microscopy images, and ingesting 27 billion geospatial data points. CRADLE exemplifies an overarching framework that accelerates time to science, extends to other domains with similar challenges, and expands the horizon of data science and research.