ISBN: 9781665476522 (print)
The proceedings contain 94 papers. The topics discussed include: SGCN: exploiting compressed-sparse features in deep graph convolutional network accelerators; INCA: input-stationary dataflow at outside-the-box thinking about deep learning accelerators; logical/physical topology-aware collective communication in deep learning training; Sibia: signed bit-slice architecture for dense DNN acceleration with slice-level sparsity exploitation; Baryon: efficient hybrid memory management with compression and sub-blocking; root crash consistency of SGX-style integrity trees in secure non-volatile memory systems; are randomized caches truly random? formal analysis of randomized-partitioned caches; leveraging domain information for the efficient automated design of deep learning accelerators; efficient distributed secure memory with migratable Merkle tree; scalable and secure row-swap: efficient and safe row hammer mitigation in memory systems; and CTA: hardware-software co-design for compressed token attention mechanism.
ISBN: 9798350305487 (print)
The proceedings contain 23 papers. The topics discussed include: performance modeling and estimation of a configurable output stationary neural network accelerator; NeurOPar, a neural network-driven EDP optimization strategy for parallel workloads; exploiting the potential of flexible processing units; reverse time migration with lossy and lossless wavefield compression; performance tuning for GPU-embedded systems: machine-learning-based and analytical model-driven tuning methodologies; WCSim: a cloud computing simulator with support for bag of tasks workflows; performance modeling of MARE2DEM’s adaptive mesh refinement for makespan estimation; and comparing performance and portability between CUDA and SYCL for protein database search on NVIDIA, AMD, and Intel GPUs.
ISBN: 9781665417303 (print)
The proceedings contain 8 papers. The topics discussed include: a memory affinity analysis of scientific applications on NUMA platforms; an evaluation of Cassandra NoSQL database on a low-power cluster; offloading the training of an I/O access pattern detector to the cloud; selecting efficient VM types to train deep learning models on Amazon SageMaker; CLAP-BOT: a framework for automatic optimization of high-performance elastic applications on the clouds; towards optimizing computational costs of federated learning in clouds; quantifying and detecting HPC resource wastage in cloud environments; and a cloud-based batch processing system for loosely-coupled applications.
ISBN: 9781728199245 (print)
The proceedings contain 40 papers. The topics discussed include: energy-efficient time series analysis using transprecision computing; performance analysis and optimization of the Vector-Kronecker product multiplication; an optimal model for optimizing the placement and parallelism of data stream processing applications on cloud-edge computing; reliable and energy-aware mapping of streaming series-parallel applications onto hierarchical platforms; selective protection for sparse iterative solvers to reduce the resilience overhead; evaluating computation and data placements in edge infrastructures through a common simulator; exploiting non-conventional DVFS on GPUs: application to deep learning; and MASA-StarPU: parallel sequence comparison with multiple scheduling policies and pruning.
ISBN: 1424408059 (print)
The proceedings contain 32 papers. The topics discussed include: an adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors; evaluating MapReduce for multi-core and multiprocessor systems; extending multicore architectures to exploit hybrid parallelism in single-thread applications; implications of device timing variability on full chip timing; optical interconnect opportunities in future server memory systems; feedback directed prefetching: improving the performance and bandwidth-efficiency of hardware prefetchers; improving branch prediction and predicated execution in out-of-order processors; accelerating and adapting precomputation threads for efficient prefetching; a scalable, non-blocking approach to transactional memory; and fully-buffered DIMM memory architectures: understanding mechanisms, overheads, and scaling.
ISBN: 9781509048441 (print)
The proceedings contain 16 papers. The topics discussed include: outline of a thick control flow architecture; a dynamic load balance algorithm for the S4 parallel stream processing engine; a processor workload distribution algorithm for massively parallel applications; parallelism and scalability: a solution focused on the cloud computing processing service billing; task scheduling in Sucuri dataflow library; synchronization-free automatic parallelization for arbitrarily nested affine loops; thread footprint analysis for the design of multithreaded applications and multicore systems; a hybrid parallel algorithm for the auction algorithm in multicore systems; and dataflow to hardware synthesis framework on FPGAs.
ISBN: 9798350381603 (print)
The proceedings contain 18 papers. The topics discussed include: exploring federated learning to trace depression in social media with language models; computing seismic attributes with deep-learning models; DASS: dynamic adaptive sub-target specialization; optimizing microservices performance and resource utilization through containerized grouping: an experimental study; assessing the performance of an architecture-aware optimization tool for neural networks; an exploratory study of deep learning for predicting computational tasks behavior in HPC systems; and energy consumption analysis of instruction cache prefetching methods.
ISBN: 9781479989300 (print)
The proceedings contain 55 papers. The topics discussed include: exploring architectural heterogeneity in intelligent vision systems; increasing multicore system efficiency through intelligent bandwidth shifting; exploiting compressed block size as an indicator of future reuse; Talus: a simple way to remove cliffs in cache performance; priority-based cache allocation for throughput processors; Bamboo ECC: strong, safe, and flexible codes for reliable computer memory; heterogeneous memory architectures: a HW/SW approach for mixing die-stacked and off-package memories; domain knowledge based energy management in handhelds; GPU voltage noise: characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures; and hierarchical private/shared classification: the key to simple and efficient coherence for clustered cache hierarchies.