ISBN: 9798350305487 (print)
The proceedings contain 23 papers. The topics discussed include: performance modeling and estimation of a configurable output stationary neural network accelerator; NeurOPar, a neural network-driven EDP optimization strategy for parallel workloads; exploiting the potential of flexible processing units; reverse time migration with lossy and lossless wavefield compression; performance tuning for GPU-embedded systems: machine-learning-based and analytical model-driven tuning methodologies; WCSim: a cloud computing simulator with support for bag of tasks workflows; performance modeling of MARE2DEM’s adaptive mesh refinement for makespan estimation; and comparing performance and portability between CUDA and SYCL for protein database search on NVIDIA, AMD, and Intel GPUs.
ISBN: 9780769530147 (print)
The proceedings contain 32 papers. The topics discussed include: multi-level parallelism in the computational modeling of the heart; computational characteristics of production seismic migration and its performance on novel processor architectures; exploring novel parallelization technologies for 3-D imaging applications; low-cost techniques for reducing branch context pollution in a soft realtime embedded multithreaded processor; predicting loop termination to boost speculative thread-level parallelism in embedded applications; performance improvement of the parallel lattice Boltzmann method through blocked data distributions; a scalable parallel deduplication algorithm; impacts of multiprocessor configurations on workloads in bioinformatics; efficient hardware for modular exponentiation using the sliding-window method with variable-length partitioning; and optimized math functions for a fixed-point DSP architecture.
The proceedings contain 22 papers. The topics discussed include: accurate and low-overhead dynamic detection and prediction of program phases using branch signatures; aggressive scheduling and speculation in multithreaded architectures: is it worth its salt?; an optimization mechanism intended for two-level cache hierarchy to improve energy and performance using the NSGAII algorithm; on simulated annealing for the scheduling of parallel applications; controlling processes reassignment in BSP applications; a high-performance massively parallel approach for real time deformable body physics simulation; a methodology for developing high fidelity communication models for large-scale applications targeted on multicore systems; and ORBIT: effective issue queue soft-error vulnerability mitigation on simultaneous multithreaded architectures using operand readiness-based instruction dispatch.
ISBN: 9781665451574 (print)
The proceedings contain 9 papers. The topics discussed include: compiling files in parallel: a study with GCC; I/O performance of multiscale finite element simulations on HPC environments; an OpenMP-only linear algebra library for distributed architectures; implementing the broadcast operation in a distributed task-based runtime; homomorphic evaluation of large look-up tables for inference on human genome data in the cloud; towards a federated learning framework on a multi-cloud environment; energy-efficient online resource provisioning for cloud-edge platforms via multi-armed bandits; edge computing versus cloud computing: impact on retinal image pre-processing; and standalone data-center sizing combating the over-provisioning of the IT and electrical parts.
ISBN: 9798350356168 (print)
The proceedings contain 21 papers. The topics discussed include: S-Clflush: securing against flush-based cache timing side-channel attacks; TangramFP: energy-efficient, bit-parallel, multiply-accumulate for deep neural networks; analyzing HPC monitoring data with a view towards efficient resource utilization; DYAD: locality-aware data management for accelerating deep learning training; IDS-DEEP: a strategy for selecting the best IDS for drones with heterogeneous embedded platforms; memory sandbox: a versatile tool for analyzing and optimizing HBM performance in FPGA; DeVAS: decoupled virtual address spaces; towards performance portability of an oil and gas application on heterogeneous architectures; and JANUS: a simple and efficient speculative defense using reinforcement learning.
ISBN: 9781665451550 (print)
The proceedings contain 34 papers. The topics discussed include: TCUDA: a QoS-based GPU sharing framework for autonomous navigation systems; Seriema: RDMA-based remote invocation with a case-study on Monte-Carlo tree search; exploring the effects of silent data corruption in distributed deep learning training; gem5-ndp: near-data processing architecture simulation from low level caches to DRAM; approximate memory with protected static allocation; dynamic set stealing to improve cache performance; avoiding unnecessary caching with history-based preemptive bypassing; memory-side acceleration and sparse compression for quantized packed convolutions; NUMA-aware dense matrix factorizations and inversion with look-ahead on multicore processors; and a predictive approach for dynamic replication of operators in distributed stream processing systems.
ISBN: 9781538648193 (print)
The proceedings contain 19 papers. The topics discussed include: energy consumption improvement of shared-cache multicore clusters based on explicit simultaneous multithreading; performance and energy analysis of OpenMP runtime systems with dense linear algebra algorithms; a case study of performance optimization in a heterogeneous environment; tuning up TVD HOPMOC method on Intel MIC Xeon Phi architectures with Intel Parallel Studio tools; comparing performance of C compilers optimizations on different multicore architectures; HPSM: a programming framework for multi-CPU and multi-GPU systems; assessing sparse triangular linear system solvers on GPUs; automatic partitioning of stencil computations on heterogeneous systems; strategies to improve the performance of a geophysics model for different manycore systems; parallel algorithm for dynamic community detection; efficient in-situ quantum computing simulation of Shor's and Grover's algorithms; a parallel algorithm for minimum spanning tree on GPU; acceleration of cellular automata through parallel computing with OpenCL; a dataflow implementation of region growing method for cracks segmentation; automatic scan parallelization in OpenMP; impact of version management for transactional memories on phase-change memories; efficient pathfinding co-processors for FPGAs; and a communication protocol for fog computing based on network coding applied to wireless sensors.
The proceedings contain 36 papers. The topics discussed include: the network adapter: the missing link between MPI applications and network performance; on the efficiency of register file versus broadcast interconnect for collective communications in data-parallel hardware accelerators; network endpoints for clusters of SMPs; assessing energy efficiency of fault tolerance protocols for HPC systems; using heterogeneous networks to improve energy efficiency in direct coherence protocols for many-core CMPs; energy savings via dead sub-block prediction; scalable thread scheduling in asymmetric multicores for power efficiency; divergence analysis with affine constraints; exploiting concurrent GPU operations for efficient work stealing on multi-GPUs; sparse fast Fourier transform on GPUs and multi-core CPUs; cloud workload analysis with SWAT; and scalable algorithms for distributed-memory adaptive mesh refinement.
ISBN: 9781479929276 (print)
The proceedings contain 28 papers. The topics discussed include: experiences with disjoint data structures in a new hardware transactional memory system; HotStream: efficient data streaming of complex patterns to multiple accelerating kernels; large payload streaming database sort and projection on FPGAs; extending summation precision for network reduction operations; attaining strictly increasing and precise time count in energy-efficient computer systems; dynamic selective devectorization for efficient power gating of SIMD units in a HW/SW co-designed environment; optimizing a 3D-FWT code in a heterogeneous cluster of multicore CPUs and manycore GPUs; a CPU, GPU, FPGA system for X-ray image processing using high-speed scientific cameras; a parallel IRAM algorithm to compute PageRank for modeling epidemic spread; invasive compute balancing for applications with hybrid parallelization; and on the performance of code block segmentation for LTE-Advanced: an in-depth analysis.
ISBN: 9781509012336 (print)
The proceedings contain 23 papers. The topics discussed include: extending OmpSs for OpenCL kernel co-execution in heterogeneous systems; data coherence analysis and optimization for heterogeneous computing; exploring heterogeneous mobile architectures with a high-level programming model; scalability of CPU and GPU solutions of the prime elliptic curve discrete logarithm problem; overcoming memory-capacity constraints in the use of ILUPACK on graphics processors; exploiting data compression to mitigate aging in GPU register files; SEDEA: a sensible approach to account DRAM energy in multicore systems; a user-level scheduling framework for BoT applications on private clouds; GC-CR: a decentralized garbage collector component for checkpointing in clouds; towards a deterministic fine-grained task ordering using multi-versioned memory; FGSCM: a fine-grained approach to transactional lock elision; a machine learning approach for performance prediction and scheduling on heterogeneous CPUs; object placement for high bandwidth memory augmented with high capacity memory; accelerating graph analytics on CPU-FPGA heterogeneous platform; online multimedia similarity search with response time-aware parallelism and task granularity auto-tuning; a publish/subscribe system using causal broadcast over dynamically built spanning trees; global snapshot of a distributed system running on virtual machines; and resource-management study in HPC runtime-stacking context.