ISBN (print): 9789819628292
The proceedings contain 76 papers. The special focus in this conference is on Network and Parallel Computing. The topics include: AsymFB: Accelerating LLM Training Through Asymmetric Model Parallelism; DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies; Diagnosability of the Lexicographic Product of Paths and Complete Bipartite Graphs Under PMC Model; DTuner: A Construction-Based Optimization Method for Dynamic Tensor Operators Accelerating; Efficient Implementation of the LOBPCG Algorithm on a CPU-GPU Cluster; HP-CSF: An GPU Optimization Method for CP Decomposition of Incomplete Tensors; JediGAN: A Fully Decentralized Training of GAN with Adaptive Discriminator Averaging and Generator Selection; Optimizing Vo-Viso: A Modified Methodology to Parallel Computing with Isolating Data in Memristor Arrays; Parallel Computation of the Combination of Two Point Operations in Conic Curves Cryptosystem over GF(2^n) Using Tile Self-assembly; Parallel Construction of Independent Spanning Trees on 3-ary n-cube Networks; SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling; swDarknet: A Heterogeneous Parallel Deep Learning Framework Suitable for SW26010 Pro Processor; VConv: Autotiling Convolution Algorithm Based on MLIR for Multi-core Vector Accelerators; ACH-Code: An Efficient Erasure Code to Reduce Average Repair Cost in Cloud Storage Systems of Multiple Availability Zones; CMS: A Computility Resource Status Management and Storage Framework; Fast Memory Disaggregation with SwiftSwap; HASLB: Huge Page Allocation Strategy Optimized for Load-Balance in Parallel Computing Programs; lightFinder: Finding Persistent Items with Small Memory; miDedup: A Restore-Friendly Deduplication Method on Docker Image Storage Systems; SPLR: A Selective Packet Loss Recovery for Improved RDMA Performance; A Cluster-Based Platoon Formation Scheme for Realistic Automated Vehicle Platooning; AnaNET: Anatomical Network fo
This article proposes a method for optimizing the routing and wire size of distributed photovoltaic access distribution networks using multiple genetic algorithms. This method can effectively integrate photovoltaic po...
Matrix multiplication is crucial in scientific computing, but it demands substantial resources. We propose a framework for effectively utilizing heterogeneous GPUs for large matrix multiplication. By splitting matrices...
ISBN (print): 9781665401623
The inherent computational complexity of validating and verifying concurrent systems implies a need to exploit parallel and distributed computing architectures. We present a new distributed algorithm for state space exploration of concurrent systems on computing clusters. Our algorithm relies on Remote Direct Memory Access (RDMA) for low-latency transfer of states between computing elements, and on state reconstruction trees for compact representation of states on the computing elements themselves. For the distribution of states between computing elements, we propose a concept of state stealing. We have implemented our proposed algorithm using the OpenSHMEM API for RDMA and experimentally evaluated it on the Grid'5000 testbed with a set of benchmark models. The experimental results show that our algorithm scales well with the number of available computing elements, and that our state stealing mechanism generally provides a balanced workload distribution.
With the advent of blockchain and the Internet of Things (IoT), the smart grid is a rapidly growing technology in decentralized energy distribution and trading. However, this advancement came with some serious cyber s...
ISBN (print): 9783031488023; 9783031488030
MPI collective communications play an important role in coordinating and exchanging data among parallel processes in high performance computing. Various algorithms exist for implementing MPI collectives, each of which exhibits different characteristics, such as message overhead, latency, and scalability, which can significantly impact overall system performance. Therefore, choosing a suitable algorithm for each collective operation is crucial to achieve optimal performance. In this paper, we present our experience with algorithm selection for MPI collectives on a large-scale supercomputer and highlight the impact of network traffic and system workload, as well as other previously investigated parameters such as message size, communicator size, and network topology. Our analysis shows that network traffic and system workload can make the performance of MPI collectives highly variable and, accordingly, impact the algorithm selection strategy.
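As a rough illustration of what such a selection strategy could look like, the sketch below builds a lookup table that picks, for each combination of message size, communicator size, and load level, the algorithm with the lowest median measured latency. The function name, the tuple layout, and the coarse "load level" label are assumptions made for this example; the abstract does not describe the authors' actual procedure.

```python
from collections import defaultdict
from statistics import median


def build_selection_table(samples):
    """Pick the algorithm with the lowest median latency per configuration.

    `samples` is an iterable of hypothetical benchmark records:
    (algorithm, msg_bytes, comm_size, load_level, latency_us).
    The record layout is an assumption for illustration only.
    """
    # Group latencies by configuration and algorithm.
    grouped = defaultdict(list)
    for algo, msg_bytes, comm_size, load_level, latency_us in samples:
        grouped[(msg_bytes, comm_size, load_level, algo)].append(latency_us)

    # For each configuration keep the algorithm with the smallest median.
    best = {}
    for (msg_bytes, comm_size, load_level, algo), latencies in grouped.items():
        key = (msg_bytes, comm_size, load_level)
        med = median(latencies)
        if key not in best or med < best[key][1]:
            best[key] = (algo, med)

    return {key: algo for key, (algo, _) in best.items()}
```

Including the load level in the key reflects the abstract's observation that network traffic and system workload can change which algorithm performs best; a table keyed only on message and communicator size would miss that effect.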
ISBN (print): 9783031708183; 9783031708190
Due to their structure, metaheuristics such as parallel evolutionary algorithms (PEA) are well suited to run on parallel and distributed infrastructure, e.g. supercomputers. However, many issues are still not well researched in this context, e.g. the existence of delays in HPC-grade implementations of metaheuristics and how they affect the computation itself. Without this knowledge, the power of supercomputers may not be properly used in this context. We want to focus our research on examining such white spots. In this paper we give evidence for the existence of delays, show how they differ across island topologies, try to explain their nature, and prepare to propose dedicated migration operators that take these observations into account.
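For readers unfamiliar with island topologies, the following minimal sketch shows one synchronous migration step on a ring: each island sends copies of its best individuals to its successor, which replaces its worst individuals with them. This is a generic textbook-style operator written for illustration, not the migration operator studied or proposed in the paper; the function name, the `(fitness, genome)` representation, and the lower-is-better convention are assumptions.

```python
def migrate_ring(islands, k=1):
    """One synchronous migration step on a ring topology.

    `islands` is a list of populations; each population is a list of
    (fitness, genome) pairs, where lower fitness is better (assumed).
    Island i receives copies of the k best individuals of island i - 1
    and drops its own k worst individuals to make room.
    """
    def by_fitness(ind):
        return ind[0]

    # Select emigrants from every island before modifying any of them,
    # so the step behaves as if all islands migrated at the same instant.
    emigrants = [sorted(pop, key=by_fitness)[:k] for pop in islands]

    new_islands = []
    for i, pop in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]
        # Keep the best residents, leaving room for the incoming ones.
        survivors = sorted(pop, key=by_fitness)[: len(pop) - len(incoming)]
        new_islands.append(survivors + list(incoming))
    return new_islands
```

The sketch assumes all islands exchange individuals at the same moment. In a real HPC deployment migrants arrive with a delay that can depend on the topology, which is the effect the paper sets out to measure.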
In order to solve the problem of optimizing distributed flexible resources, a distributed flexible resource optimization scheduling model based on regional energy self-balancing architecture is proposed. The regional ...
ISBN (print): 9798350387339
The proceedings contain 13 papers. The topics discussed include: a study on the performance of distributed storage systems in edge computing environments; RESCAPE: a resource estimation system for microservices with graph neural network and profile engine; PrometheusMigrate: efficient live migration of confidential virtual machine with software abstraction; the cost perspective of adopting large language model-as-a-service; DCSA: the deployment mechanism of chained serverless applications in JointCloud environment; parallel computation in dynamic fog computing networks: a multi-armed bandit learning-based decentralized matching approach; and IBRI: an IoT solution for building collapse risk identification in smart cities.
ISBN (print): 9798350383225
Modern materials science research problems present a challenge to data science and analytics, as experiments generate petabyte-scale spatiotemporal datasets that span a number of modalities and formats. Creating computing infrastructure and frameworks that support the scale and diversity of materials science data while remaining accessible for materials scientists to use is a non-trivial task. We have developed the Common Research Analytics and Data Lifecycle Environment (CRADLE) to solve the challenges of materials data science through a scalable research computing framework and cyberinfrastructure that can (1) handle large-scale, heterogeneous datasets, (2) provide a flexible toolbox for building machine learning pipelines that span from ingestion to model deployment, (3) be accessible to research scientists with limited to extensive computational backgrounds, and (4) utilize a range of computer systems from low performance to high performance. CRADLE is a framework that integrates distributed systems like Hadoop and High-Performance Computing (HPC) infrastructure to handle materials data at scale. Together, these capabilities enable the general materials data scientist to query petabytes of data and train thousands of models in a parallel, distributed environment. We demonstrate three use cases for CRADLE to benchmark its capability to ingest and analyze spatiotemporal materials data at scale. These tasks span three data modalities: transforming 2.6 billion photovoltaic time-series power measurements, training hundreds of deep learning models on Atomic Force Microscopy images, and ingesting 27 billion geospatial data points. CRADLE exemplifies an overarching framework that accelerates time to science, extends to other domains with similar challenges, and expands the horizon of data science and research.