ISBN: (Print) 9798350364606
The proceedings contain 165 papers. The topics discussed include: understanding multi-dimensional efficiency of fine-tuning large language models using SpeedUp, MemoryUp, and EnergyUp; shared-memory parallel Edmonds blossom algorithm for maximum cardinality matching in general graphs; a reconfigurable architecture of a scalable, ultrafast, ultrasound, delay-and-sum beamformer; scheduling and allocation of disaggregated memory resources in HPC systems; GIM (ghost in the machine): a coarse-grained reconfigurable compute-in-memory platform for exploring machine-learning architectures; further optimizations and analysis of Smith-Waterman with vector extensions; measurement-based quantum approximate optimization; optimizing forward wavefield storage leveraging high-speed storage media; teaching performance metrics in parallel computing courses; and compiler-driven SWAR parallelism for high-performance bitboard algorithms.
ISBN: (Print) 9798350311990
The proceedings contain 114 papers. The topics discussed include: a task-based approach for co-scheduling ensemble workloads on heterogeneous nodes; power-aware computing with Optane persistent memory modules; cloud services enable efficient AI-guided simulation workflows across heterogeneous resources; enabling efficient regular expression matching at the edge through domain-specific architectures; is your FPGA transmitting secrets: covert antennas from interconnect; hardware accelerator for transformer-based end-to-end automatic speech recognition system; near-storage accelerator for bulk graph ingestion; application-specific FPGAs: cryptographic agility through customized reconfigurable architectures; parallel inference of phylogenetic stands with Gentrius; and using hyperdimensional computing to extract features for the detection of type 2 diabetes.
ISBN: (Print) 9781665435772
The proceedings contain 117 papers. The topics discussed include: resource elasticity at task-level; evaluation of vertex reordering for graph applications; on the predictability of quantum circuit fidelity using machine learning; improving the operational capability of automated empirical performance modeling; development of a middleware to create an efficient unified programming model for heterogeneous computing; task-level checkpointing for nested fork-join programs; verifiable coded computing: towards fast and secure distributed computing; hierarchical cost analysis for distributed deep learning; pattern-aware vectorization for sparse matrix computations; and heterogeneity-aware deep learning workload deployments on the computing continuum.
ISBN: (Print) 9781665497473
The proceedings contain 148 papers. The topics discussed include: heterogeneous architecture for sparse data processing; combined application of approximate computing techniques in DNN hardware accelerators; highly efficient ALLTOALL and ALLTOALLV communication algorithms for GPU systems; implementing spatio-temporal graph convolutional networks on Graphcore IPUs; the best of many worlds: scheduling machine learning inference on CPU-GPU integrated architectures; online learning RTL synthesis for automated design space exploration; machine learning aided hardware resource estimation for FPGA DNN implementations; optimal schedules for high-level programming environments on FPGAs with constraint programming; on how to push efficient medical semantic segmentation to the edge: the SENECA approach; and exploiting high-bandwidth memory for FPGA-acceleration of inference on sum-product networks.
ISBN: (Print) 9781728174457
The proceedings contain 145 papers. The topics discussed include: towards stability in the Chapel language; the GraphIt universal graph framework: achieving high-performance across algorithms, graph types, and architectures; analyzing deep learning model inferences for image classification using OpenVINO; an automated machine learning approach for data locality optimizations in Chapel; teaching modern multithreading in CS2 with actors; PHRYCTORIA: a messaging system for transprecision OpenCAPI-attached FPGA accelerators; machine learning-based prefetching for SCM main memory system; a microcode-based control unit for deep learning processors; and silent data access protocol for NVRAM+RDMA distributed storage.
ISBN: (Print) 9781728135106
The proceedings contain 112 papers. The topics discussed include: an accurate tool for modeling, fingerprinting, comparison, and clustering of parallel applications based on performance counters; SmarTmem: intelligent management of transcendent memory in a virtualized server; data reliability and redundancy optimization of a secure multi-cloud storage under uncertainty of errors and falsifications; a portable GPU framework for SNP comparisons; towards a methodology for benchmarking edge processing frameworks; a fast local algorithm for track reconstruction on parallel architectures; towards native execution of deep learning on a leadership-class HPC system; improving robustness of heterogeneous serverless computing systems via probabilistic task pruning; and influence of tasks duration variability on task-based runtime schedulers.
ISBN: (Print) 9781538655559
The proceedings contain 145 papers. The topics discussed include: user-transparent translation of machine instructions to programmable hardware; approximation algorithm for scheduling applications on hybrid multi-core machines with communications delays; large scale data centers simulation based on baseline test model; application performance on a cluster-booster system; transport-triggered soft cores; robustness of surface EMG classifiers with fixed-point decomposition on reconfigurable architecture; streaming architecture for large-scale quantized neural networks on an FPGA-based dataflow platform; high-level reliability evaluation of reconfiguration-based fault tolerance techniques; dynamic reconfiguration for real-time automotive embedded systems in fail-operational context; and rerooting trees increases opportunities for concurrent computation and results in markedly improved performance for phylogenetic inference.
ISBN: (Print) 9798331509712
The proceedings contain 317 papers. The topics discussed include: HSDP: accelerating large-scale model training via efficient sharded data parallelism; multilevel load balancing algorithm for domestic heterogeneous manycore architecture; fast distributed polynomial multiplication algorithm for lattice-based cryptographic decryption in blockchain systems; carbon-aware distributed energy management for data center microgrids based on blockchain; distributed energy management for carbon neutral data centers; MTRP: a high-performance cost-efficient buffering scheme for multi-tenant cloud services; a diffusion-Mamba approach with boundary sampling for imbalanced classification; a two-phase encrypted traffic classification scheme in programmable data plane; and a sorted and dynamic graph storage system of the hybrid memory architecture.
ISBN: (Print) 9798350337662
The proceedings contain 88 papers. The topics discussed include: HINT: designing cache-efficient MPI_Alltoall using hybrid memory copy ordering and non-temporal instructions; graph analytics on jellyfish topology; QSync: quantization-minimized synchronous distributed training across hybrid devices; two-stage block orthogonalization to improve performance of s-step GMRES; CloverLeaf on Intel multi-core CPUs: a case study in write-allocate evasion; the self-adaptive and topology-aware MPI Bcast leveraging collective offload on Tianhe express interconnect; Picasso: memory-efficient graph coloring using palettes with applications in quantum computing; exploiting long vectors with a CFD code: a co-design show case; and to store or not to store: a graph theoretical approach for dataset versioning.
ISBN: (Print) 9798350337662
The proceedings contain 95 papers. The topics discussed include: distributed sparse random projection trees for constructing K-nearest neighbor graphs; fast deterministic gathering with detection on arbitrary graphs: the power of many robots; accurate and efficient distributed COVID-19 spread prediction based on a large-scale time-varying people mobility graph; accelerating packet processing in container overlay networks via packet-level parallelism; efficient hardware primitives for immediate memory reclamation in optimistic data structures; accelerating distributed deep learning training with compression assisted Allgather and reduce-scatter communication; accelerating CNN inference on long vector architectures via co-design; exploiting input tensor dynamics in activation checkpointing for efficient training on GPU; drill: log-based anomaly detection for large-scale storage systems using source code analysis; Dynasparse: accelerating GNN inference through dynamic sparsity exploitation; exploiting sparsity in pruned neural networks to optimize large model training; SRC: mitigate I/O throughput degradation in network congestion control of disaggregated storage systems; boosting multi-block repair in cloud storage systems with wide-stripe erasure coding; on doorway egress by autonomous robots; and on the arithmetic intensity of distributed-memory dense matrix multiplication involving a symmetric input matrix (SYMM).