ISBN: 9781665451574 (print)
The proceedings contain 9 papers. The topics discussed include: compiling files in parallel: a study with GCC; I/O performance of multiscale finite element simulations on HPC environments; an OpenMP-only linear algebra library for distributed architectures; implementing the broadcast operation in a distributed task-based runtime; homomorphic evaluation of large look-up tables for inference on human genome data in the cloud; towards a federated learning framework on a multi-cloud environment; energy-efficient online resource provisioning for cloud-edge platforms via multi-armed bandits; edge computing versus cloud computing: impact on retinal image pre-processing; and standalone data-center sizing combating the over-provisioning of the IT and electrical parts.
ISBN: 9798350356168 (print)
The proceedings contain 21 papers. The topics discussed include: S-Clflush: securing against flush-based cache timing side-channel attacks; TangramFP: energy-efficient, bit-parallel, multiply-accumulate for deep neural networks; analyzing HPC monitoring data with a view towards efficient resource utilization; DYAD: locality-aware data management for accelerating deep learning training; IDS-DEEP: a strategy for selecting the best IDS for drones with heterogeneous embedded platforms; memory sandbox: a versatile tool for analyzing and optimizing HBM performance in FPGA; DeVAS: decoupled virtual address spaces; towards performance portability of an oil and gas application on heterogeneous architectures; and JANUS: a simple and efficient speculative defense using reinforcement learning.
ISBN: 9781665451550 (print)
The proceedings contain 34 papers. The topics discussed include: TCUDA: a QoS-based GPU sharing framework for autonomous navigation systems; Seriema: RDMA-based remote invocation with a case-study on Monte-Carlo tree search; exploring the effects of silent data corruption in distributed deep learning training; gem5-ndp: near-data processing architecture simulation from low level caches to DRAM; approximate memory with protected static allocation; dynamic set stealing to improve cache performance; avoiding unnecessary caching with history-based preemptive bypassing; memory-side acceleration and sparse compression for quantized packed convolutions; NUMA-aware dense matrix factorizations and inversion with look-ahead on multicore processors; and a predictive approach for dynamic replication of operators in distributed stream processing systems.
ISBN: 9781479929276 (print)
The proceedings contain 28 papers. The topics discussed include: experiences with disjoint data structures in a new hardware transactional memory system; HotStream: efficient data streaming of complex patterns to multiple accelerating kernels; large payload streaming database sort and projection on FPGAs; extending summation precision for network reduction operations; attaining strictly increasing and precise time count in energy-efficient computer systems; dynamic selective devectorization for efficient power gating of SIMD units in a HW/SW co-designed environment; optimizing a 3D-FWT code in a heterogeneous cluster of multicore CPUs and manycore GPUs; a CPU, GPU, FPGA system for X-ray image processing using high-speed scientific cameras; a parallel IRAM algorithm to compute PageRank for modeling epidemic spread; invasive compute balancing for applications with hybrid parallelization; and on the performance of code block segmentation for LTE-advanced: an in-depth analysis.
The proceedings contain 36 papers. The topics discussed include: the network adapter: the missing link between MPI applications and network performance; on the efficiency of register file versus broadcast interconnect for collective communications in data-parallel hardware accelerators; network endpoints for clusters of SMPs; assessing energy efficiency of fault tolerance protocols for HPC systems; using heterogeneous networks to improve energy efficiency in direct coherence protocols for many-core CMPs; energy savings via dead sub-block prediction; scalable thread scheduling in asymmetric multicores for power efficiency; divergence analysis with affine constraints; exploiting concurrent GPU operations for efficient work stealing on multi-GPUs; sparse fast Fourier transform on GPUs and multi-core CPUs; cloud workload analysis with SWAT; and scalable algorithms for distributed-memory adaptive mesh refinement.
ISBN: 9781538648193 (print)
The proceedings contain 19 papers. The topics discussed include: energy consumption improvement of shared-cache multicore clusters based on explicit simultaneous multithreading; performance and energy analysis of OpenMP runtime systems with dense linear algebra algorithms; a case study of performance optimization in a heterogeneous environment; tuning up TVD HOPMOC method on Intel MIC Xeon Phi architectures with Intel Parallel Studio tools; comparing performance of C compilers optimizations on different multicore architectures; HPSM: a programming framework for multi-CPU and multi-GPU systems; assessing sparse triangular linear system solvers on GPUs; automatic partitioning of stencil computations on heterogeneous systems; strategies to improve the performance of a geophysics model for different manycore systems; parallel algorithm for dynamic community detection; efficient in-situ quantum computing simulation of Shor's and Grover's algorithms; a parallel algorithm for minimum spanning tree on GPU; acceleration of cellular automata through parallel computing with OpenCL; a dataflow implementation of region growing method for cracks segmentation; automatic scan parallelization in OpenMP; impact of version management for transactional memories on phase-change memories; efficient pathfinding co-processors for FPGAs; and a communication protocol for fog computing based on network coding applied to wireless sensors.
ISBN: 9781509012336 (print)
The proceedings contain 23 papers. The topics discussed include: extending OmpSs for OpenCL kernel co-execution in heterogeneous systems; data coherence analysis and optimization for heterogeneous computing; exploring heterogeneous mobile architectures with a high-level programming model; scalability of CPU and GPU solutions of the prime elliptic curve discrete logarithm problem; overcoming memory-capacity constraints in the use of ILUPACK on graphics processors; exploiting data compression to mitigate aging in GPU register files; SEDEA: a sensible approach to account DRAM energy in multicore systems; a user-level scheduling framework for BoT applications on private clouds; GC-CR: a decentralized garbage collector component for checkpointing in clouds; towards a deterministic fine-grained task ordering using multi-versioned memory; FGSCM: a fine-grained approach to transactional lock elision; a machine learning approach for performance prediction and scheduling on heterogeneous CPUs; object placement for high bandwidth memory augmented with high capacity memory; accelerating graph analytics on CPU-FPGA heterogeneous platform; online multimedia similarity search with response time-aware parallelism and task granularity auto-tuning; a publish/subscribe system using causal broadcast over dynamically built spanning trees; global snapshot of a distributed system running on virtual machines; and resource-management study in HPC runtime-stacking context.
The following topics are dealt with: computer architecture; high performance computing; parallel and distributed algorithms, routing, and communication; application-specific architectures and reconfigurable systems; grid, cluster, pervasive, and heterogeneous computing; languages, compilers, and tools; processor microarchitectures; operating systems; processor and cache memory architectures, benchmarking, and performance analysis; fault tolerant systems; and load balancing.