ISBN: (Print) 9781665435741
The proceedings contain 222 papers. The topics discussed include: DRL-deploy: adaptive service function chains deployment with deep reinforcement learning; accuracy vs. efficiency: achieving both through hardware-aware quantization and reconfigurable architecture with mixed precision; cmss: collaborative modeling of safety and security requirements for network protocols; FGPA: fine-grained pipelined acceleration for depthwise separable CNN in resource constraint scenarios; Dyacon: JointCloud dynamic access control model of data security based on verifiable credentials; understanding the runtime overheads of deep learning inference on edge devices; and alleviating imbalance in synchronous distributed training of deep neural networks.
ISBN: (Print) 9781665464970
The proceedings contain 117 papers. The topics discussed include: detection of a novel dual attack in named data networking; fair DMA scheduler for low-latency accelerator offloading; multi-attribute decision-making method based on interval intuitionistic trapezoidal fuzzy number to determine the expert weight; binary-level directed symbolic execution through pattern learning; an efficient metric-based approach for static use-after-free detection; a graph convolution neural network based method for insider threat detection; maintenance worker scheduling for charging pile fault: a multi-agent RL approach; towards secure bilateral friend query with conjunctive policy matching in social networks; structure-noise-aware anchor link prediction across social networks; file system to support secure cloud-based sharing; discovering agent models using process mining: initial approach and a case study; and towards agent-based simulation of the parallel trading market of pharmaceuticals.
Sparse general matrix-matrix multiplication (SpGEMM) is a core primitive for numerous scientific applications. Traditional hash-based approaches fail to strike a balance between reducing hash collisions and efficiently...
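To make the hash-based accumulation step concrete, the following minimal sketch (illustrative only, not this paper's implementation; the function and variable names are assumed) shows how a hash table accumulates the partial products for one output row of C = A x B with both matrices in CSR form, and where the collision cost mentioned above arises.

# Minimal sketch of hash-based SpGEMM accumulation for a single output row.
# A and B are given in CSR form as (indptr, indices, data) tuples; names are illustrative.
def spgemm_row_hash(row, A, B):
    a_indptr, a_indices, a_data = A
    b_indptr, b_indices, b_data = B
    acc = {}  # hash table: output column index -> accumulated value
    for k in range(a_indptr[row], a_indptr[row + 1]):
        col_a, val_a = a_indices[k], a_data[k]
        # Multiply the A(row, col_a) entry against the corresponding row of B
        for j in range(b_indptr[col_a], b_indptr[col_a + 1]):
            col_b = b_indices[j]
            # Each insertion/update may collide with earlier keys; resolving
            # those collisions is the cost the abstract refers to.
            acc[col_b] = acc.get(col_b, 0.0) + val_a * b_data[j]
    # Return the nonzeros of row 'row' of C, sorted by column index
    return sorted(acc.items())

# Example: the 2x2 identity matrix times itself, both operands in CSR form
I = ([0, 1, 2], [0, 1], [1.0, 1.0])
print(spgemm_row_hash(0, I, I))  # [(0, 1.0)]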
In the era of big data, efficiently processing and retrieving insights from unstructured data presents a critical challenge. This paper introduces a scalable leader-worker distributed data pipeline designed to handle ...
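As a rough illustration of the leader-worker pattern described above (a sketch under assumed details, not the paper's distributed system; all names are hypothetical), a leader process can partition incoming documents and hand them to worker processes over a queue.

# Illustrative leader-worker pipeline using Python's multiprocessing;
# the real system runs across machines, this only shows the control pattern.
from multiprocessing import Process, Queue

def worker(task_queue, result_queue):
    while True:
        doc = task_queue.get()
        if doc is None:                      # sentinel: leader signals shutdown
            break
        result_queue.put(len(doc.split()))   # stand-in for real unstructured-data processing

def leader(documents, num_workers=4):
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for doc in documents:                    # leader partitions and dispatches the workload
        tasks.put(doc)
    for _ in workers:                        # one shutdown sentinel per worker
        tasks.put(None)
    outputs = [results.get() for _ in documents]   # drain results before joining
    for w in workers:
        w.join()
    return outputs

if __name__ == "__main__":
    print(leader(["some unstructured text", "another record"]))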
ISBN: (Print) 9798331506476
Near DRAM processing (NDP) architectures have emerged as a promising solution for commercializing in-memory computing and addressing the 'memory wall' problem, especially for memory-intensive machine learning (ML) workloads. In NDP architectures, the processing units (PUs) are distributed next to different memory units to exploit the high internal bandwidth. Therefore, in order to fully utilize the bandwidth advantage of NDP architectures for ML applications, meticulous evaluation and optimization of data placement in DRAM and workload scheduling among different PUs are required. However, existing simulation and compilation tools face two insuperable obstacles to achieving these targets. On the one hand, tools for traditional von Neumann architectures focus only on the data access behavior between the host and DRAM and treat DRAM as a single monolithic unit, so they cannot support NDP architectures with multiple independent processing and memory units working simultaneously. On the other hand, existing NDP simulators and compilers are designed for a specific DRAM technology and NDP architecture, lacking compatibility with various NDP architectures. In order to overcome these challenges and optimize data mapping and workload scheduling for different NDP architectures, we propose UniNDP, a unified NDP compilation and simulation tool for ML applications. Firstly, we propose a unified tree-based NDP hardware abstraction and the corresponding instruction set, enabling support for various NDP architectures based on different DRAM technologies. Secondly, we design a cycle-accurate, instruction-driven NDP simulator that evaluates hardware performance by accurately tracking the working status of memory elements and PUs; this accurate simulation provides effective guidance for compilation. Thirdly, we design an NDP compiler that optimizes data partitioning, mapping, and workload scheduling across the DRAM hierarchy. Furthermore, to enhance compilation efficiency, we propose...
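The tree-based hardware abstraction can be pictured as nested hardware levels (e.g. channel, rank, bank) with PUs attached at the leaves. The sketch below is an assumed illustration of such a hierarchy, not UniNDP's actual data structures or instruction set; all class and field names are hypothetical.

# Assumed illustration of a tree-based NDP hardware abstraction:
# each node is one hardware level; leaves hold the attached processing units (PUs).
class HWNode:
    def __init__(self, level, index, children=None, pu=None):
        self.level = level        # e.g. "channel", "rank", "bank"
        self.index = index
        self.children = children or []
        self.pu = pu              # a PU placed next to this memory unit, if any

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

def build_tree(channels=2, ranks=2, banks=4):
    # One PU per bank, mirroring PUs distributed next to the memory units.
    return HWNode("device", 0, [
        HWNode("channel", c, [
            HWNode("rank", r, [
                HWNode("bank", b, pu=f"pu_{c}_{r}_{b}")
                for b in range(banks)])
            for r in range(ranks)])
        for c in range(channels)])

tree = build_tree()
print(len(tree.leaves()))  # 16 banks, each with an attached PU

A compiler walking such a tree can enumerate the PUs per level and decide, per level, how to partition tensors and schedule work, which is the kind of mapping/scheduling decision the abstract describes.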
ISBN: (Print) 9798331506476
Fully homomorphic encryption (FHE) is a powerful cryptographic technique that enables computation on encrypted data without needing to decrypt it. It has broad applications in scenarios where sensitive data must be processed in the cloud or in other untrusted environments. FHE applications are both compute- and memory-intensive, owing to expensive operations on large data. While prior works address the challenge of efficient computation using dedicated hardware, expensive memory transfers remain a major limiting factor. In this work, we propose a hierarchical near-DRAM processing (NDP) solution for FHE applications, called FHENDI, that harnesses the massive DRAM bank bandwidth. We observe various data access patterns in FHE that reveal distinct levels of parallelism: element-wise, limb-wise, coefficient-wise, and ciphertext-wise. FHENDI exploits these levels of parallelism to map FHE operations and data onto different hierarchies of our design, while addressing three major challenges with NDP for FHE: (i) the lack of bank-to-bank communication support, (ii) limited die-to-die bandwidth, and (iii) large memory access latencies. We resolve the first problem through a novel, conflict-free mapping algorithm built atop localized permutation networks that enables efficient element-wise and butterfly operations in FHE. The second problem is addressed by pipelining the execution of parallel bootstrap operations observed in compiled FHE workloads. Finally, we hide the memory access latency behind computation latency by exploiting a dual-banking scheme and subarray-level parallelism (SLP) of the DRAM banks. We evaluate FHENDI using representative workloads in the domains of privacy-preserving machine learning inference on CNNs and Transformers, database range queries, and sorting, obtained using a compiler framework called HElayers. We compare FHENDI with a server-class CPU and GPU running the state-of-the-art HEaaN library, and with an FHE accelerator ASIC, and report...
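The conflict-free mapping idea can be sketched as placing the two operands of every butterfly in different banks via a skewed (permuted) coefficient-to-bank layout. The code below is only an assumed illustration of bank-conflict-free placement using a simple XOR-skew, not FHENDI's actual permutation networks or mapping algorithm; all names and parameters are hypothetical.

# Assumed sketch: a skewed coefficient-to-bank layout so that the two operands
# of every butterfly in a stage land in different banks (no bank conflict).
NUM_BANKS = 8

def bank_of(coeff_index):
    # Skewed placement: XOR-fold higher bits into the low bits before taking
    # the bank number, a common trick to spread strided accesses over banks.
    return (coeff_index ^ (coeff_index >> 3)) % NUM_BANKS

def butterfly_pairs(n, stride):
    # Index pairs touched by one NTT/FFT-style butterfly stage with the given stride.
    return [(i, i + stride) for i in range(n) if (i // stride) % 2 == 0]

# For a stride equal to the bank count, a naive (i % NUM_BANKS) layout puts both
# operands of every pair in the same bank, while the skewed layout separates them.
pairs = butterfly_pairs(64, NUM_BANKS)
naive_conflicts = sum(a % NUM_BANKS == b % NUM_BANKS for a, b in pairs)
skewed_conflicts = sum(bank_of(a) == bank_of(b) for a, b in pairs)
print(naive_conflicts, skewed_conflicts)  # 32 conflicting pairs vs. 0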
ISBN: (Print) 9781424416943
The proceedings contain 461 papers. The topics discussed include: how to make discretionary access control secure against trojan horses; random number generation for serial, parallel, distributed, and grid-based financial computations; mobility control schemes with quick convergence in wireless sensor networks; design and implementation of a tool for modeling and programming deadlock free meta-pipeline applications; analytic performance models for bounded queuing systems; on the construction of paired many-to-many disjoint path covers in hypercube-like interconnection networks with faulty elements; a scalable configurable architecture for the massively parallel GCA model; state management for distributed python applications; a fault-tolerant system for Java/CORBA objects; and improving data availability for a cluster file system through replication.
ISBN: (Print) 9781424437504
The proceedings contain 362 papers. The topics discussed include: uniform scattering of autonomous mobile robots in a grid; performance study of interference on sharing GPU and CPU resources with multiple applications; resource allocation strategies for constructive in-network stream processing; deciding model of population size in time-constrained task scheduling; improving accuracy of host load predictions on computational grids by artificial neural networks; combining multiple heuristics on discrete resources; predictive analysis and optimization of pipelined wavefront computations; RSA encryption and decryption using the redundant number system on the FPGA; computation with a constant number of steps in membrane computing; analytical model of inter-node communication under multi-versioned coherence mechanisms; and a distributed approach for the problem of routing and wavelength assignment in WDM networks.
ISBN: (Print) 9781665435772
The proceedings contain 117 papers. The topics discussed include: resource elasticity at task-level; evaluation of vertex reordering for graph applications; on the predictability of quantum circuit fidelity using machine learning; improving the operational capability of automated empirical performance modeling; development of a middleware to create an efficient unified programming model for heterogeneous computing; task-level checkpointing for nested fork-join programs; verifiable coded computing: towards fast and secure distributed computing; hierarchical cost analysis for distributed deep learning; pattern-aware vectorization for sparse matrix computations; and heterogeneity-aware deep learning workload deployments on the computing continuum.