ISBN: 9783031226977 (print)
The proceedings contain 12 papers. The special focus in this conference is on Job Scheduling Strategies for Parallel Processing. The topics include: Optimization of Execution Parameters of Moldable Ultrasound Workflows Under Incomplete Performance Data; Scheduling of Elastic Message Passing Applications on HPC Systems; Preface; On the Feasibility of Simulation-Driven Portfolio Scheduling for Cyberinfrastructure Runtime Systems; Improving Accuracy of Walltime Estimates in PBS Professional Using Soft Walltimes; Re-making the Movie-Making Machine; Using Kubernetes in Academic Environment: Problems and Approaches; AI-Job Scheduling on Systems with Renewable Power Sources; Toward Building a Digital Twin of Job Scheduling and Power Management on an HPC System; Encoding for Reinforcement Learning Driven Scheduling.
ISBN: 9781665481069 (print)
The proceedings contain 123 papers. The topics discussed include: challenges and opportunities in designing high-performance and scalable middleware for HPC and AI: past, present, and future; HTS: a threaded multilevel sparse hybrid solver; a scalable adaptive-matrix SpMV for heterogeneous architectures; direct solution of larger coupled sparse/dense linear systems using low-rank compression on single-node multi-core machines in an industrial context; distributed-memory sparse kernels for machine learning; FAM-Graph: graph analytics on disaggregated memory; scalable multi-versioning ordered key-value stores with persistent memory support; in-memory indexed caching for distributed data processing; Landau collision operator in the CUDA programming model applied to thermal quench plasmas; exploiting reduced precision for GPU-based time series mining; and MICCO: an enhanced multi-GPU scheduling framework for many-body correlation functions.
ISBN: 9781665497473 (print)
The proceedings contain 148 papers. The topics discussed include: heterogeneous architecture for sparse data processing; combined application of approximate computing techniques in DNN hardware accelerators; highly efficient ALLTOALL and ALLTOALLV communication algorithms for GPU systems; implementing spatio-temporal graph convolutional networks on Graphcore IPUs; the best of many worlds: scheduling machine learning inference on CPU-GPU integrated architectures; online learning RTL synthesis for automated design space exploration; machine learning aided hardware resource estimation for FPGA DNN implementations; optimal schedules for high-level programming environments on FPGAs with constraint programming; on how to push efficient medical semantic segmentation to the edge: the SENECA approach; and exploiting high-bandwidth memory for FPGA-acceleration of inference on sum-product networks.
ISBN: 9781665464970 (print)
The proceedings contain 117 papers. The topics discussed include: detection of a novel dual attack in named data networking; fair DMA scheduler for low-latency accelerator offloading; multi-attribute decision-making method based on interval intuitionistic trapezoidal fuzzy number to determine the expert weight; binary-level directed symbolic execution through pattern learning; an efficient metric-based approach for static use-after-free detection; a graph convolution neural network based method for insider threat detection; maintenance worker scheduling for charging pile fault: a multi-agent RL approach; towards secure bilateral friend query with conjunctive policy matching in social networks; structure-noise-aware anchor link prediction across social networks; file system to support secure cloud-based sharing; discovering agent models using process mining: initial approach and a case study; and towards agent-based simulation of the parallel trading market of pharmaceuticals.
ISBN: 9781665440660 (print)
The proceedings contain 105 papers. The topics discussed include: a tale of two C's: convergence and composability; DSXplore: optimizing convolutional neural networks via sliding-channel convolutions; an in-depth analysis of distributed training of deep neural networks; scalable epidemiological workflows to support COVID-19 planning and response; AlphaR: learning-powered resource management for irregular, dynamic microservice graph; distributed-memory multi-GPU block-sparse tensor contraction for electronic structure; correlation-wise smoothing: lightweight knowledge extraction for HPC monitoring data; designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters; Nowa: a wait-free continuation-stealing concurrency platform; noise-resilient empirical performance modeling with deep neural networks; and communication-avoiding and memory-constrained sparse matrix-matrix multiplication at extreme scale.
ISBN: 9781665435741 (print)
The proceedings contain 222 papers. The topics discussed include: DRL-deploy: adaptive service function chains deployment with deep reinforcement learning; accuracy vs. efficiency: achieving both through hardware-aware quantization and reconfigurable architecture with mixed precision; CMSS: collaborative modeling of safety and security requirements for network protocols; FGPA: fine-grained pipelined acceleration for depthwise separable CNN in resource constraint scenarios; Dyacon: JointCloud dynamic access control model of data security based on verifiable credentials; understanding the runtime overheads of deep learning inference on edge devices; and alleviating imbalance in synchronous distributed training of deep neural networks.
ISBN: 9781728168760 (print)
The proceedings contain 110 papers. The topics discussed include: SSDKeeper: self-adapting channel allocation to improve the performance of SSD devices; a study of graph analytics for massive datasets on distributed multi-GPUs; DPF-ECC: accelerating elliptic curve cryptography with floating-point computing power of GPUs; inter-job scheduling of high-throughput material screening applications; learning an effective charging scheme for mobile devices; improving transactional code generation via variable annotation and barrier elision; solving the container explosion problem for distributed high throughput computing; CycLedger: a scalable and secure parallel protocol for distributed ledger via sharding; DAG-aware joint task scheduling and cache management in Spark clusters; and understanding the interplay between hardware errors and user job characteristics on the Titan supercomputer.
ISBN: 9781728174457 (print)
The proceedings contain 145 papers. The topics discussed include: towards stability in the Chapel language; the GraphIt universal graph framework: achieving high-performance across algorithms, graph types, and architectures; analyzing deep learning model inferences for image classification using OpenVINO; an automated machine learning approach for data locality optimizations in Chapel; teaching modern multithreading in CS2 with actors; PHRYCTORIA: a messaging system for transprecision OpenCAPI-attached FPGA accelerators; machine learning-based prefetching for SCM main memory system; a microcode-based control unit for deep learning processors; and silent data access protocol for NVRAM+RDMA distributed storage.
ISBN: 9798350337662 (print)
The proceedings contain 88 papers. The topics discussed include: HINT: designing cache-efficient MPI_Alltoall using hybrid memory copy ordering and non-temporal instructions; graph analytics on Jellyfish topology; QSync: quantization-minimized synchronous distributed training across hybrid devices; two-stage block orthogonalization to improve performance of s-step GMRES; CloverLeaf on Intel multi-core CPUs: a case study in write-allocate evasion; the self-adaptive and topology-aware MPI Bcast leveraging collective offload on Tianhe express interconnect; Picasso: memory-efficient graph coloring using palettes with applications in quantum computing; exploiting long vectors with a CFD code: a co-design show case; and to store or not to store: a graph theoretical approach for dataset versioning.
ISBN: 9798350387117; 9798350387124 (print)
A number of production deep learning clusters have explored using inference hardware for DNN training during off-peak serving hours, when many inference GPUs sit idle. Conducting DNN training on a combination of heterogeneous training and inference GPUs, known as hybrid device training, presents considerable challenges due to disparities in compute capability and significant differences in memory capacity. We propose QSync, a training system that enables efficient synchronous data-parallel DNN training over hybrid devices by strategically exploiting quantized operators. According to each device's available resource capacity, QSync selects a quantization-minimized setting for operators in the distributed DNN training graph, minimizing model accuracy degradation while preserving the training efficiency brought by quantization. We carefully design a predictor with a bi-directional mixed-precision indicator to reflect the sensitivity of DNN layers to fixed-point and floating-point low-precision operators, a replayer with a neighborhood-aware cost mapper to accurately estimate the latency of distributed hybrid mixed-precision training, and an allocator that efficiently synchronizes workers with minimal model accuracy degradation. QSync bridges the computational graph in PyTorch to an optimized backend for quantization kernel performance and flexible support for various GPU architectures. Extensive experiments show that QSync's predictor simulates distributed mixed-precision training with < 5% error, and that QSync achieves a consistent 0.27 - 1.03% accuracy improvement on from-scratch training tasks compared to uniform precision.
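As a rough illustration of the per-device, quantization-minimized selection idea (not QSync's actual algorithm or API), the sketch below greedily lowers the precision of the least accuracy-sensitive layers on a memory-constrained device until its budget is met. The layer names, sensitivity scores, memory ratio, and budget are all invented for this example.

```python
# Hypothetical sketch: pick the fewest layers to run in low precision so the model
# fits a device's memory budget, quantizing the least-sensitive layers first.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    mem_bytes_fp32: int   # footprint at full precision (assumed known)
    sensitivity: float    # proxy for accuracy loss if run in low precision

LOW_PRECISION_RATIO = 0.5  # assumption: fp16 roughly halves the footprint

def choose_precisions(layers, device_mem_budget):
    """Return {layer name: 'fp32' | 'fp16'} with as few quantized layers as possible."""
    plan = {l.name: "fp32" for l in layers}
    usage = sum(l.mem_bytes_fp32 for l in layers)
    # Quantize least-sensitive layers first; stop once the budget is met.
    for l in sorted(layers, key=lambda l: l.sensitivity):
        if usage <= device_mem_budget:
            break
        plan[l.name] = "fp16"
        usage -= l.mem_bytes_fp32 * (1 - LOW_PRECISION_RATIO)
    if usage > device_mem_budget:
        raise RuntimeError("model does not fit even with all layers in low precision")
    return plan

if __name__ == "__main__":
    model = [
        Layer("embed", 400_000_000, sensitivity=0.9),
        Layer("block0", 300_000_000, sensitivity=0.2),
        Layer("block1", 300_000_000, sensitivity=0.3),
        Layer("head", 100_000_000, sensitivity=0.8),
    ]
    # A hypothetical inference GPU with ~0.8 GB of spare memory.
    print(choose_precisions(model, device_mem_budget=800_000_000))
```

In this toy run only the two low-sensitivity blocks drop to fp16, mirroring the paper's goal of keeping accuracy-critical operators at higher precision while still fitting the weaker device.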