ISBN: 9798331506476 (print)
Fully homomorphic encryption (FHE) is a powerful cryptographic technique that enables computation on encrypted data without needing to decrypt it. It has broad applications in scenarios where sensitive data needs to be processed in the cloud or in other untrusted environments. FHE applications are both compute- and memory-intensive, owing to expensive operations on large data. While prior works address the challenges of efficient compute using dedicated hardware, expensive memory transfers still remain a major limiting factor. In this work, we propose a hierarchical near-DRAM processing (NDP) solution for FHE applications, called FHENDI, that harnesses the massive DRAM bank bandwidth. We observe various data access patterns in FHE that reveal distinct levels of parallelism: element-wise, limb-wise, coefficient-wise, and ciphertext-wise. FHENDI exploits these levels of parallelism to map FHE operations and data onto different hierarchies of our design, while addressing three major challenges with NDP for FHE: (i) the lack of bank-to-bank communication support, (ii) limited die-to-die bandwidth, and (iii) large memory access latencies. We resolve the first problem through a novel, conflict-free mapping algorithm built atop localized permutation networks that enables efficient element-wise and butterfly operations in FHE. The second problem is addressed by pipelining the execution of parallel bootstrap operations observed in compiled FHE workloads. Finally, we hide the memory access latency behind computation latency by exploiting a dual-banking scheme and subarray-level parallelism (SLP) of the DRAM banks. We evaluate FHENDI using representative workloads in the domains of privacy-preserving machine learning inference on CNNs and Transformers, database range query, and sorting, that are obtained using a compiler framework called HElayers. We compare FHENDI with a server-class CPU and GPU running the state-of-the-art HEaaN library, and an FHE accelerator ASIC, and repo
ISBN: 9798331502812 (digital)
ISBN: 9798331502829 (print)
Modern FPGAs integrate High Bandwidth Memory (HBM), offering up to 12× the DDR bandwidth distributed across multiple memory interfaces. To make the most of HBM's theoretical bandwidth, accelerators typically issue long bursts and exploit data locality. However, applications such as sparse matrix-vector multiplication (SpMV) and graph analytics often exhibit irregular, non-bursting memory access patterns, which hinder performance. Additionally, the HBM interconnect, essential for accessing multiple interfaces, may stall requests under certain conditions. This work introduces HBMex, a novel module designed to enhance HBM throughput for accelerators with irregular access patterns. Positioned between the accelerator and the HBM, HBMex improves parallelism by distributing memory requests across interfaces and mitigates stalls caused by the interconnect. We evaluate HBMex using memory access microbenchmarks and an SpMV accelerator, demonstrating throughput gains of up to 37% across real-world workloads compared to vendor-provided solutions. HBMex is distributed as a highly configurable, open-source RTL generator.
ISBN: 9781424416943 (print)
The proceedings contain 461 papers. The topics discussed include: how to make discretionary access control secure against Trojan horses; random number generation for serial, parallel, distributed, and grid-based financial computations; mobility control schemes with quick convergence in wireless sensor networks; design and implementation of a tool for modeling and programming deadlock-free meta-pipeline applications; analytic performance models for bounded queuing systems; on the construction of paired many-to-many disjoint path covers in hypercube-like interconnection networks with faulty elements; a scalable configurable architecture for the massively parallel GCA model; state management for distributed Python applications; a fault-tolerant system for Java/CORBA objects; and improving data availability for a cluster file system through replication.
ISBN: 9781424437504 (print)
The proceedings contain 362 papers. The topics discussed include: uniform scattering of autonomous mobile robots in a grid; performance study of interference on sharing GPU and CPU resources with multiple applications; resource allocation strategies for constructive in-network stream processing; deciding model of population size in time-constrained task scheduling; improving accuracy of host load predictions on computational grids by artificial neural networks; combining multiple heuristics on discrete resources; predictive analysis and optimization of pipelined wavefront computations; RSA encryption and decryption using the redundant number system on the FPGA; computation with a constant number of steps in membrane computing; analytical model of inter-node communication under multi-versioned coherence mechanisms; and a distributed approach for the problem of routing and wavelength assignment in WDM networks.
ISBN: 9781665435772 (print)
The proceedings contain 117 papers. The topics discussed include: resource elasticity at task-level; evaluation of vertex reordering for graph applications; on the predictability of quantum circuit fidelity using machine learning; improving the operational capability of automated empirical performance modeling; development of a middleware to create an efficient unified programming model for heterogeneous computing; task-level checkpointing for nested fork-join programs; verifiable coded computing: towards fast and secure distributed computing; hierarchical cost analysis for distributed deep learning; pattern-aware vectorization for sparse matrix computations; and heterogeneity-aware deep learning workload deployments on the computing continuum.
ISBN: 9781509021406 (print)
The proceedings contain 210 papers. The topics discussed include: towards a green, QoS-enabled heterogeneous cloud infrastructure; predicting job completion time in heterogeneous MapReduce environments; minimizing rental cost for multiple recipe applications in the cloud; providing fairness in heterogeneous multicores with a predictive, adaptive scheduler; dynamic resource management for parallel tasks in an oversubscribed energy-constrained heterogeneous environment; evaluation of emerging energy-efficient heterogeneous computing platforms for biomolecular and cellular simulation workloads; latency, power, and security optimization in distributed reconfigurable embedded systems; and a reconfigurable fixed-point architecture for adaptive beamforming.
ISBN: 9781538643686 (print)
The proceedings contain 113 papers. The topics discussed include: optimizing parallel graph connectivity computation via subgraph sampling; a parallel algorithm for Bayesian network inference using arithmetic circuits; cataloging the visible universe through Bayesian inference at petascale; efficient, parallel at-scale correlation analysis for atom probe tomography on hybrid architectures; a fast and massively-parallel inverse solver for multiple-scattering tomographic image reconstruction; real-time massively distributed multi-object adaptive optics simulations for the European Extremely Large Telescope; performance isolation of data-intensive scale-out applications in a multi-tenant cloud; scalable data resilience for in-memory data staging; and performance and scalability of lightweight multi-kernel based operating systems.
ISBN: 9781479986484 (print)
The proceedings contain 109 papers. The topics discussed include: balanced coloring for parallel computing applications; high-performance graph analytics on manycore processors; scalable community detection with the Louvain algorithm; cooperative computing for autonomous data centers; divide and conquer symmetric tridiagonal eigensolver for multicore architectures; contention-based nonminimal adaptive routing in high-radix networks; identifying the culprits behind network congestion; embedding nonblocking multicast virtual networks in fat-tree data centers; Cashmere: heterogeneous many-core computing; a scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators; hierarchical DAG scheduling for hybrid distributed systems; pushing the performance envelope of modular exponentiation across multiple generations of GPUs; and addressing fairness in SMT multicores with a progress-aware scheduler.
ISBN: 9781728135106 (print)
The proceedings contain 112 papers. The topics discussed include: an accurate tool for modeling, fingerprinting, comparison, and clustering of parallel applications based on performance counters; SmarTmem: intelligent management of transcendent memory in a virtualized server; data reliability and redundancy optimization of a secure multi-cloud storage under uncertainty of errors and falsifications; a portable GPU framework for SNP comparisons; towards a methodology for benchmarking edge processing frameworks; a fast local algorithm for track reconstruction on parallel architectures; towards native execution of deep learning on a leadership-class HPC system; improving robustness of heterogeneous serverless computing systems via probabilistic task pruning; and influence of tasks duration variability on task-based runtime schedulers.
ISBN: 9781728168760 (print)
The proceedings contain 110 papers. The topics discussed include: SSDKeeper: self-adapting channel allocation to improve the performance of SSD devices; a study of graph analytics for massive datasets on distributed multi-GPUs; DPF-ECC: accelerating elliptic curve cryptography with floating-point computing power of GPUs; inter-job scheduling of high-throughput material screening applications; learning an effective charging scheme for mobile devices; improving transactional code generation via variable annotation and barrier elision; solving the container explosion problem for distributed high throughput computing; CycLedger: a scalable and secure parallel protocol for distributed ledger via sharding; DAG-aware joint task scheduling and cache management in Spark clusters; and understanding the interplay between hardware errors and user job characteristics on the Titan supercomputer.