ISBN: 9781509012336 (print)
The proceedings contain 23 papers. The topics discussed include: extending OmpSs for OpenCL kernel co-execution in heterogeneous systems; data coherence analysis and optimization for heterogeneous computing; exploring heterogeneous mobile architectures with a high-level programming model; scalability of CPU and GPU solutions of the prime elliptic curve discrete logarithm problem; overcoming memory-capacity constraints in the use of ILUPACK on graphics processors; exploiting data compression to mitigate aging in GPU register files; SEDEA: a sensible approach to account DRAM energy in multicore systems; a user-level scheduling framework for BoT applications on private clouds; GC-CR: a decentralized garbage collector component for checkpointing in clouds; towards a deterministic fine-grained task ordering using multi-versioned memory; FGSCM: a fine-grained approach to transactional lock elision; a machine learning approach for performance prediction and scheduling on heterogeneous CPUs; object placement for high bandwidth memory augmented with high capacity memory; accelerating graph analytics on CPU-FPGA heterogeneous platform; online multimedia similarity search with response time-aware parallelism and task granularity auto-tuning; a publish/subscribe system using causal broadcast over dynamically built spanning trees; global snapshot of a distributed system running on virtual machines; and resource-management study in HPC runtime-stacking context.
ISBN: 9798331522124 (digital)
ISBN: 9798331522131 (print)
In this paper, we discuss an IEEE 754 compliant normalized floating-point divide and square root unit that utilizes iterative approximation. We provide a robust architecture that allows multiple formats and all IEEE 754 rounding modes while still exhibiting high performance. Moreover, we also adhere to the IEEE 754-2019 standard and demonstrate methods for rounding results to all five rounding modes using iterative approximation. Performance, power, and area estimates are determined from physical synthesis using ARM-based standard cells in a TSMC 28nm process. This paper also presents comparisons versus other implementations and demonstrates the efficiency of the approach presented here.
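The abstract does not say which iterative scheme the unit uses. As a rough illustration of division by iterative approximation in general, the sketch below uses the textbook Newton-Raphson reciprocal iteration x_{k+1} = x_k (2 - m x_k) on a normalized mantissa m in [0.5, 1). The function names `nr_reciprocal` and `nr_divide`, the linear initial estimate, and the iteration count are assumptions made for this example, not details from the paper. It handles only finite, nonzero divisors and does not implement the IEEE 754 rounding modes the paper addresses.

```python
import math


def nr_reciprocal(d, iterations=4):
    """Approximate 1/d with Newton-Raphson (illustrative, not the paper's design)."""
    if d == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    sign = -1.0 if d < 0 else 1.0
    # Normalize: abs(d) = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(d))
    # Standard linear initial estimate of 1/m on [0.5, 1); relative error <= 1/17.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    # Each step roughly squares the relative error (quadratic convergence).
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    # Undo the normalization: 1/abs(d) = (1/m) * 2**(-e).
    return sign * math.ldexp(x, -e)


def nr_divide(a, d, iterations=4):
    """Approximate a/d as a * (1/d); the final rounding is NOT IEEE 754 exact."""
    return a * nr_reciprocal(d, iterations)
```

With the initial error bounded by 1/17, four iterations bring the error of the reciprocal below double-precision resolution, although the result may still differ from a correctly rounded quotient in the last bit. Closing that gap for every rounding mode is the kind of problem the paper says it addresses.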
ISBN: 9798331522124 (digital)
ISBN: 9798331522131 (print)
In memory-bound problems, Field Programmable Gate Arrays (FPGAs) have traditionally underperformed compared to Graphics Processing Units (GPUs) due to their lower memory bandwidth. However, high bandwidth memory (HBM) in FPGAs has significantly enhanced their performance, achieving bandwidths up to 425 GB/s. Additionally, FPGAs offer the advantage of customizable accelerators for domain-specific tasks, potentially outperforming general-purpose GPU architectures. This work focuses on accelerating random forest algorithms on FPGAs, leveraging their customization capabilities to manage control flow structures such as decision branches efficiently. Despite these advancements, FPGAs remain challenging to program, requiring a deep understanding of hardware design. To address this, we propose a new hardware generator that integrates the necessary tools into a cohesive workflow, simplifying FPGA development. We validated the design on a Xilinx Alveo FPGA, using 32 HBM channels and reaching a performance of 8 billion samples per second. This work offers a practical solution for memory-bound machine learning tasks in high-performance computing environments.
ISBN: 9781424456581 (print)
The proceedings contain 33 papers. The topics discussed include: value based BTB indexing for indirect jump prediction; operating system support for overlapping-ISA heterogeneous multicore architectures; scalable architectural support for trusted software; ATLAS: a scalable and high-performance scheduling algorithm for multiple memory controllers; understanding how off-chip memory bandwidth partitioning in chip multiprocessors affects system performance; leadout: composing low-overhead frequency-enhancing techniques for single-thread performance in configurable multicores; a bandwidth-aware memory-subsystem resource management using non-invasive resource profilers for large CMP systems; designing a processor from the ground up to allow voltage/reliability tradeoffs; a hybrid solid-state storage architecture for the performance, energy consumption, and lifetime improvement; and extreme scale computing: challenges and opportunities.
Link prediction (LP) is a fundamental problem in graph machine learning that aims to predict the existence of links between nodes. Most current research on LP adopts Graph Neural Networks (GNNs) to learn the representa...
Frequency estimation, a.k.a. histograms, is a workhorse of data analysis, and as such has been thoroughly studied under differential privacy. In particular, computing histograms in the local model of privacy has bee...