ISBN: (print) 9781509048441
The proceedings contain 16 papers. The topics discussed include: outline of a thick control flow architecture; a dynamic load balance algorithm for the S4 parallel stream processing engine; a processor workload distribution algorithm for massively parallel applications; parallelism and scalability: a solution focused on the cloud computing processing service billing; task scheduling in Sucuri dataflow library; synchronization-free automatic parallelization for arbitrarily nested affine loops; thread footprint analysis for the design of multithreaded applications and multicore systems; a hybrid parallel algorithm for the auction algorithm in multicore systems; and dataflow to hardware synthesis framework on FPGAs.
ISBN: (print) 9798350305487
The proceedings contain 23 papers. The topics discussed include: performance modeling and estimation of a configurable output stationary neural network accelerator; NeurOPar, a neural network-driven EDP optimization strategy for parallel workloads; exploiting the potential of flexible processing units; reverse time migration with lossy and lossless wavefield compression; performance tuning for GPU-embedded systems: machine-learning-based and analytical model-driven tuning methodologies; WCSim: a cloud computing simulator with support for bag of tasks workflows; performance modeling of MARE2DEM’s adaptive mesh refinement for makespan estimation; and comparing performance and portability between CUDA and SYCL for protein database search on NVIDIA, AMD, and Intel GPUs.
ISBN: (print) 9781728199245
The proceedings contain 40 papers. The topics discussed include: energy-efficient time series analysis using transprecision computing; performance analysis and optimization of the Vector-Kronecker product multiplication; an optimal model for optimizing the placement and parallelism of data stream processing applications on cloud-edge computing; reliable and energy-aware mapping of streaming series-parallel applications onto hierarchical platforms; selective protection for sparse iterative solvers to reduce the resilience overhead; evaluating computation and data placements in edge infrastructures through a common simulator; exploiting non-conventional DVFS on GPUs: application to deep learning; and MASA-StarPU: parallel sequence comparison with multiple scheduling policies and pruning.
ISBN: (print) 9798350381603
The proceedings contain 18 papers. The topics discussed include: exploring federated learning to trace depression in social media with language models; computing seismic attributes with deep-learning models; DASS: dynamic adaptive sub-target specialization; optimizing microservices performance and resource utilization through containerized grouping: an experimental study; assessing the performance of an architecture-aware optimization tool for neural networks; an exploratory study of deep learning for predicting computational tasks behavior in HPC systems; and energy consumption analysis of instruction cache prefetching methods.
ISBN: (print) 9781665417303
The proceedings contain 8 papers. The topics discussed include: a memory affinity analysis of scientific applications on NUMA platforms; an evaluation of Cassandra NoSQL database on a low-power cluster; offloading the training of an I/O access pattern detector to the cloud; selecting efficient VM types to train deep learning models on Amazon SageMaker; CLAP-BOT: a framework for automatic optimization of high-performance elastic applications on the clouds; towards optimizing computational costs of federated learning in clouds; quantifying and detecting HPC resource wastage in cloud environments; and a cloud-based batch processing system for loosely-coupled applications.
ISBN: (print) 9781665451574
The proceedings contain 9 papers. The topics discussed include: compiling files in parallel: a study with GCC; I/O performance of multiscale finite element simulations on HPC environments; an OpenMP-only linear algebra library for distributed architectures; implementing the broadcast operation in a distributed task-based runtime; homomorphic evaluation of large look-up tables for inference on human genome data in the cloud; towards a federated learning framework on a multi-cloud environment; energy-efficient online resource provisioning for cloud-edge platforms via multi-armed bandits; edge computing versus cloud computing: impact on retinal image pre-processing; and standalone data-center sizing combating the over-provisioning of the IT and electrical parts.
ISBN: (print) 9781665443012
The proceedings contain 21 papers. The topics discussed include: a low-power hardware accelerator for ORB feature extraction in self-driving cars; improving phased transactional memory via commit throughput and capacity estimation; design and evaluation of associative processing kernels; a task-based execution engine for distributed operating systems tailored to lightweight manycores with limited on-chip memory; sparsity-aware power gating for tensor cores; employing simulation to facilitate the design of dynamic binary translators; register flush-free runahead execution for modern vector processors; and shelf schedules for independent moldable tasks to minimize the energy consumption.
ISBN: (print) 9798350356168
The proceedings contain 21 papers. The topics discussed include: S-Clflush: securing against flush-based cache timing side-channel attacks; TangramFP: energy-efficient, bit-parallel, multiply-accumulate for deep neural networks; analyzing HPC monitoring data with a view towards efficient resource utilization; DYAD: locality-aware data management for accelerating deep learning training; IDS-DEEP: a strategy for selecting the best IDS for drones with heterogeneous embedded platforms; memory sandbox: a versatile tool for analyzing and optimizing HBM performance in FPGA; DeVAS: decoupled virtual address spaces; towards performance portability of an oil and gas application on heterogeneous architectures; and JANUS: a simple and efficient speculative defense using reinforcement learning.
ISBN: (print) 9781665451550
The proceedings contain 34 papers. The topics discussed include: TCUDA: a QoS-based GPU sharing framework for autonomous navigation systems; Seriema: RDMA-based remote invocation with a case-study on Monte-Carlo tree search; exploring the effects of silent data corruption in distributed deep learning training; gem5-ndp: near-data processing architecture simulation from low level caches to DRAM; approximate memory with protected static allocation; dynamic set stealing to improve cache performance; avoiding unnecessary caching with history-based preemptive bypassing; memory-side acceleration and sparse compression for quantized packed convolutions; NUMA-aware dense matrix factorizations and inversion with look-ahead on multicore processors; and a predictive approach for dynamic replication of operators in distributed stream processing systems.