ISBN (print): 9781509048441
The proceedings contain 16 papers. The topics discussed include: outline of a thick control flow architecture; a dynamic load balance algorithm for the S4 parallel stream processing engine; a processor workload distribution algorithm for massively parallel applications; parallelism and scalability: a solution focused on the cloud computing processing service billing; task scheduling in the Sucuri dataflow library; synchronization-free automatic parallelization for arbitrarily nested affine loops; thread footprint analysis for the design of multithreaded applications and multicore systems; a hybrid parallel algorithm for the auction algorithm in multicore systems; and dataflow to hardware synthesis framework on FPGAs.
ISBN (print): 9798350305487
The proceedings contain 23 papers. The topics discussed include: performance modeling and estimation of a configurable output stationary neural network accelerator; NeurOPar, a neural network-driven EDP optimization strategy for parallel workloads; exploiting the potential of flexible processing units; reverse time migration with lossy and lossless wavefield compression; performance tuning for GPU-embedded systems: machine-learning-based and analytical model-driven tuning methodologies; WCSim: a cloud computing simulator with support for bag of tasks workflows; performance modeling of MARE2DEM's adaptive mesh refinement for makespan estimation; and comparing performance and portability between CUDA and SYCL for protein database search on NVIDIA, AMD, and Intel GPUs.
ISBN (print): 9798350381603
The proceedings contain 18 papers. The topics discussed include: exploring federated learning to trace depression in social media with language models; computing seismic attributes with deep-learning models; DASS: dynamic adaptive sub-target specialization; optimizing microservices performance and resource utilization through containerized grouping: an experimental study; assessing the performance of an architecture-aware optimization tool for neural networks; an exploratory study of deep learning for predicting computational tasks behavior in HPC systems; and energy consumption analysis of instruction cache prefetching methods.
ISBN (print): 9781665451574
The proceedings contain 9 papers. The topics discussed include: compiling files in parallel: a study with GCC; I/O performance of multiscale finite element simulations on HPC environments; an OpenMP-only linear algebra library for distributed architectures; implementing the broadcast operation in a distributed task-based runtime; homomorphic evaluation of large look-up tables for inference on human genome data in the cloud; towards a federated learning framework on a multi-cloud environment; energy-efficient online resource provisioning for cloud-edge platforms via multi-armed bandits; edge computing versus cloud computing: impact on retinal image pre-processing; and standalone data-center sizing combating the over-provisioning of the IT and electrical parts.
ISBN (print): 9798350356168
The proceedings contain 21 papers. The topics discussed include: S-Clflush: securing against flush-based cache timing side-channel attacks; TangramFP: energy-efficient, bit-parallel, multiply-accumulate for deep neural networks; analyzing HPC monitoring data with a view towards efficient resource utilization; DYAD: locality-aware data management for accelerating deep learning training; IDS-DEEP: a strategy for selecting the best IDS for drones with heterogeneous embedded platforms; memory sandbox: a versatile tool for analyzing and optimizing HBM performance in FPGA; DeVAS: decoupled virtual address spaces; towards performance portability of an oil and gas application on heterogeneous architectures; and JANUS: a simple and efficient speculative defense using reinforcement learning.
ISBN (print): 9781665451550
The proceedings contain 34 papers. The topics discussed include: TCUDA: a QoS-based GPU sharing framework for autonomous navigation systems; Seriema: RDMA-based remote invocation with a case-study on Monte-Carlo tree search; exploring the effects of silent data corruption in distributed deep learning training; gem5-ndp: near-data processing architecture simulation from low level caches to DRAM; approximate memory with protected static allocation; dynamic set stealing to improve cache performance; avoiding unnecessary caching with history-based preemptive bypassing; memory-side acceleration and sparse compression for quantized packed convolutions; NUMA-aware dense matrix factorizations and inversion with look-ahead on multicore processors; and a predictive approach for dynamic replication of operators in distributed stream processing systems.
ISBN (print): 9780738123370
The proceedings contain 69 papers. The topics discussed include: mix and match: a novel FPGA-centric deep neural network quantization framework; BRIM: bistable resistively-coupled Ising machine; systematic approaches for precise and approximate quantum state runtime assertion; DepGraph: a dependency-driven accelerator for efficient iterative graph processing; Chopin: scalable graphics rendering in multi-GPU systems via parallel image composition; new models for understanding and reasoning about speculative execution attacks; ultra-elastic CGRAs for irregular loop specialization; analyzing and leveraging decoupled L1 caches in GPUs; and operating liquid-cooled large-scale systems: long-term monitoring, reliability analysis, and efficiency measures.
ISBN (digital): 9798331509422
ISBN (print): 9798331509439
We present a runtime reconfigurable Field Programmable Crossbar Array (FPCA) architecture designed to enable high-accuracy multiplication and division alongside high-throughput ML operations. Our in-memory compute-based multiplier and divider designs employ novel partial product reduction techniques to support expandable multi-precision floating-point operations. Leveraging advanced interface and peripheral design techniques, our error-resilient FPCA achieves 0% error for in-memory digital operations, demonstrating robustness and reliability despite device and crossbar irregularities. Additionally, our architecture enhances neural network training throughput by 63.8× and improves power efficiency by 5.18× compared to state-of-the-art memristor-based accelerators. This work paves the way for future developments in high-performance, error-resistant in-memory computing solutions.