ISBN: 9798350364606 (print)
The proceedings contain 165 papers. The topics discussed include: understanding multi-dimensional efficiency of fine-tuning large language models using SpeedUp, MemoryUp, and EnergyUp; shared-memory parallel Edmonds blossom algorithm for maximum cardinality matching in general graphs; a reconfigurable architecture of a scalable, ultrafast, ultrasound, delay-and-sum beamformer; scheduling and allocation of disaggregated memory resources in HPC systems; GIM (ghost in the machine): a coarse-grained reconfigurable compute-in-memory platform for exploring machine-learning architectures; further optimizations and analysis of Smith-Waterman with vector extensions; measurement-based quantum approximate optimization; optimizing forward wavefield storage leveraging high-speed storage media; teaching performance metrics in parallel computing courses; and compiler-driven SWAR parallelism for high-performance bitboard algorithms.
ISBN: 9798350337662 (print)
The proceedings contain 88 papers. The topics discussed include: HINT: designing cache-efficient MPI_Alltoall using hybrid memory copy ordering and non-temporal instructions; graph analytics on jellyfish topology; QSync: quantization-minimized synchronous distributed training across hybrid devices; two-stage block orthogonalization to improve performance of s-step GMRES; CloverLeaf on Intel multi-core CPUs: a case study in write-allocate evasion; the self-adaptive and topology-aware MPI Bcast leveraging collective offload on Tianhe express interconnect; Picasso: memory-efficient graph coloring using palettes with applications in quantum computing; exploiting long vectors with a CFD code: a co-design show case; and to store or not to store: a graph theoretical approach for dataset versioning.
ISBN: 9798350311990 (print)
The proceedings contain 114 papers. The topics discussed include: a task-based approach for co-scheduling ensemble workloads on heterogeneous nodes; power-aware computing with Optane persistent memory modules; cloud services enable efficient AI-guided simulation workflows across heterogeneous resources; enabling efficient regular expression matching at the edge through domain-specific architectures; is your FPGA transmitting secrets: covert antennas from interconnect; hardware accelerator for transformer-based end-to-end automatic speech recognition system; near-storage accelerator for bulk graph ingestion; application-specific FPGAs: cryptographic agility through customized reconfigurable architectures; parallel inference of phylogenetic stands with Gentrius; and using hyperdimensional computing to extract features for the detection of type 2 diabetes.
ISBN: 9783031226977 (print)
The proceedings contain 12 papers. The special focus in this conference is on Job Scheduling Strategies for Parallel Processing. The topics include: Optimization of Execution Parameters of Moldable Ultrasound Workflows Under Incomplete Performance Data; Scheduling of Elastic Message Passing Applications on HPC Systems; Preface; On the Feasibility of Simulation-Driven Portfolio Scheduling for Cyberinfrastructure Runtime Systems; Improving Accuracy of Walltime Estimates in PBS Professional Using Soft Walltimes; Re-making the Movie-Making Machine; Using Kubernetes in Academic Environment: Problems and Approaches; AI-Job Scheduling on Systems with Renewable Power Sources; Toward Building a Digital Twin of Job Scheduling and Power Management on an HPC System; Encoding for Reinforcement Learning Driven Scheduling.
ISBN: 9798350337662 (print)
The proceedings contain 95 papers. The topics discussed include: distributed sparse random projection trees for constructing K-nearest neighbor graphs; fast deterministic gathering with detection on arbitrary graphs: the power of many robots; accurate and efficient distributed COVID-19 spread prediction based on a large-scale time-varying people mobility graph; accelerating packet processing in container overlay networks via packet-level parallelism; efficient hardware primitives for immediate memory reclamation in optimistic data structures; accelerating distributed deep learning training with compression assisted Allgather and reduce-scatter communication; accelerating CNN inference on long vector architectures via co-design; exploiting input tensor dynamics in activation checkpointing for efficient training on GPU; Drill: log-based anomaly detection for large-scale storage systems using source code analysis; Dynasparse: accelerating GNN inference through dynamic sparsity exploitation; exploiting sparsity in pruned neural networks to optimize large model training; SRC: mitigate I/O throughput degradation in network congestion control of disaggregated storage systems; boosting multi-block repair in cloud storage systems with wide-stripe erasure coding; on doorway egress by autonomous robots; and on the arithmetic intensity of distributed-memory dense matrix multiplication involving a symmetric input matrix (SYMM).
ISBN: 9781665481069 (print)
The proceedings contain 123 papers. The topics discussed include: challenges and opportunities in designing high-performance and scalable middleware for HPC and AI: past, present, and future; HTS: a threaded multilevel sparse hybrid solver; a scalable adaptive-matrix SPMV for heterogeneous architectures; direct solution of larger coupled sparse/dense linear systems using low-rank compression on single-node multi-core machines in an industrial context; distributed-memory sparse kernels for machine learning; FAM-Graph: graph analytics on disaggregated memory; scalable multi-versioning ordered key-value stores with persistent memory support; in-memory indexed caching for distributed data processing; Landau collision operator in the CUDA programming model applied to thermal quench plasmas; exploiting reduced precision for GPU-based time series mining; and MICCO: an enhanced multi-GPU scheduling framework for many-body correlation functions.
ISBN: 9781665497473 (print)
The proceedings contain 148 papers. The topics discussed include: heterogeneous architecture for sparse data processing; combined application of approximate computing techniques in DNN hardware accelerators; highly efficient ALLTOALL and ALLTOALLV communication algorithms for GPU systems; implementing spatio-temporal graph convolutional networks on Graphcore IPUs; the best of many worlds: scheduling machine learning inference on CPU-GPU integrated architectures; online learning RTL synthesis for automated design space exploration; machine learning aided hardware resource estimation for FPGA DNN implementations; optimal schedules for high-level programming environments on FPGAs with constraint programming; on how to push efficient medical semantic segmentation to the edge: the SENECA approach; and exploiting high-bandwidth memory for FPGA-acceleration of inference on sum-product networks.
ISBN: 9781665435772 (print)
The proceedings contain 117 papers. The topics discussed include: resource elasticity at task-level; evaluation of vertex reordering for graph applications; on the predictability of quantum circuit fidelity using machine learning; improving the operational capability of automated empirical performance modeling; development of a middleware to create an efficient unified programming model for heterogeneous computing; task-level checkpointing for nested fork-join programs; verifiable coded computing: towards fast and secure distributed computing; hierarchical cost analysis for distributed deep learning; pattern-aware vectorization for sparse matrix computations; and heterogeneity-aware deep learning workload deployments on the computing continuum.
ISBN: 9781665440660 (print)
The proceedings contain 105 papers. The topics discussed include: a tale of two C’s: convergence and composability; DSXplore: optimizing convolutional neural networks via sliding-channel convolutions; an in-depth analysis of distributed training of deep neural networks; scalable epidemiological workflows to support COVID-19 planning and response; AlphaR: learning-powered resource management for irregular, dynamic microservice graph; distributed-memory multi-GPU block-sparse tensor contraction for electronic structure; correlation-wise smoothing: lightweight knowledge extraction for HPC monitoring data; designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters; Nowa: a wait-free continuation-stealing concurrency platform; noise-resilient empirical performance modeling with deep neural networks; and communication-avoiding and memory-constrained sparse matrix-matrix multiplication at extreme scale.
ISBN: 9781728168760 (print)
The proceedings contain 110 papers. The topics discussed include: SSDKeeper: self-adapting channel allocation to improve the performance of SSD devices; a study of graph analytics for massive datasets on distributed multi-GPUs; DPF-ECC: accelerating elliptic curve cryptography with floating-point computing power of GPUs; inter-job scheduling of high-throughput material screening applications; learning an effective charging scheme for mobile devices; improving transactional code generation via variable annotation and barrier elision; solving the container explosion problem for distributed high throughput computing; CycLedger: a scalable and secure parallel protocol for distributed ledger via sharding; DAG-aware joint task scheduling and cache management in Spark clusters; and understanding the interplay between hardware errors and user job characteristics on the Titan supercomputer.