ISBN: 9798350356168 (print)
The proceedings contain 21 papers. The topics discussed include: S-Clflush: securing against flush-based cache timing side-channel attacks; TangramFP: energy-efficient, bit-parallel, multiply-accumulate for deep neural networks; analyzing HPC monitoring data with a view towards efficient resource utilization; DYAD: locality-aware data management for accelerating deep learning training; IDS-DEEP: a strategy for selecting the best IDS for drones with heterogeneous embedded platforms; memory sandbox: a versatile tool for analyzing and optimizing HBM performance in FPGA; DeVAS: decoupled virtual address spaces; towards performance portability of an oil and gas application on heterogeneous architectures; and JANUS: a simple and efficient speculative defense using reinforcement learning.
ISBN: 9781728141947 (print)
The proceedings contain 32 papers. The topics discussed include: accelerating solution of generalized linear models by solving normal equation using GPGPU on a large real-world tall-skinny data set; performance analysis and optimization of automotive GPUs; non-uniform partitioning for collaborative execution on heterogeneous architectures; efficiency and scalability of multi-lane capsule networks (MLCN); towards a transprecision polymorphic floating-point unit for mixed-precision computing; and Monte-Carlo tree search and reinforcement learning for reconfiguring data stream processing on edge computing.
ISBN: 9781665443012 (print)
The proceedings contain 21 papers. The topics discussed include: a low-power hardware accelerator for ORB feature extraction in self-driving cars; improving phased transactional memory via commit throughput and capacity estimation; design and evaluation of associative processing kernels; a task-based execution engine for distributed operating systems tailored to lightweight manycores with limited on-chip memory; sparsity-aware power gating for tensor cores; employing simulation to facilitate the design of dynamic binary translators; register flush-free runahead execution for modern vector processors; and shelf schedules for independent moldable tasks to minimize the energy consumption.
ISBN: 9798400705977 (print)
The proceedings contain 45 papers. The topics discussed include: Denseflex: a low rank factorization methodology for adaptable dense layers in DNNs; a lightweight architecture for real-time neuronal-spike classification; PEARL: enabling portable, productive, and high-performance deep reinforcement learning using heterogeneous platforms; mini-batching with fused training and testing for data streams processing on the edge; energy-aware IoT deployment planning; register blocking: an analytical modelling approach for affine loop kernels; hardware support for balanced co-execution in heterogeneous processors; HLS taking flight: toward using high-level synthesis techniques in a space-borne instrument; an ANN-guided multi-objective framework for power-performance balancing in HPC systems; and clustering and allocation of spiking neural networks on crossbar-based neuromorphic architecture.
The IARPA AGILE program is designed to create an entirely new high-performance computing architecture for data-intensive computing applications. More specifically, the objective of the AGILE initiative is to...
ISBN: 9798350326598; 9798350326581 (print)
Bosonic quantum computing, based on infinite-dimensional qumodes, has shown promise for various practical applications that are classically hard. However, the lack of compiler optimizations has kept it from reaching its full potential. This paper introduces Bosehedral, an efficient compiler optimization framework for (Gaussian) Boson sampling on Bosonic quantum hardware. Bosehedral overcomes the challenge of handling infinite-dimensional qumode gate matrices by performing all of its program analysis and optimizations at a higher algorithmic level, using a compact unitary matrix representation. It optimizes qumode gate decomposition and logical-to-physical qumode mapping, and introduces a tunable probabilistic gate dropout method. Overall, Bosehedral improves performance by accurately approximating the original program with far fewer gates. The evaluation shows that Bosehedral can substantially reduce program size while maintaining high approximation fidelity, which can translate into significant end-to-end application performance improvement.
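The abstract does not say how the tunable probabilistic gate dropout works, so the following is only a minimal sketch of the general idea under stated assumptions, not Bosehedral's method. It assumes each gate is a two-mode gate with a rotation angle, that gates with smaller angles contribute less and are therefore more likely to be dropped, and that a single `threshold` acts as the tuning knob. The names `dropout_probability` and `apply_gate_dropout` and the linear drop rule are hypothetical.

```python
import math
import random


def dropout_probability(angle, threshold):
    """Hypothetical rule: the smaller the angle, the likelier the gate is dropped."""
    if threshold <= 0:
        return 0.0  # dropout disabled: every gate is kept
    return max(0.0, 1.0 - abs(angle) / threshold)


def apply_gate_dropout(gates, threshold, rng):
    """Return the gates that survive dropout.

    gates: list of (mode_a, mode_b, angle) tuples, an assumed stand-in
           for two-mode qumode gates.
    threshold: tunable knob; 0 keeps everything, larger values drop more.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    kept = []
    for gate in gates:
        angle = gate[2]
        # Keep the gate unless the random draw falls below its drop probability.
        if rng.random() >= dropout_probability(angle, threshold):
            kept.append(gate)
    return kept


# Made-up example circuit: 20 gates on neighbouring modes with random angles.
angle_rng = random.Random(0)
gates = [(i, i + 1, angle_rng.uniform(0.0, math.pi / 2)) for i in range(20)]
kept = apply_gate_dropout(gates, threshold=0.5, rng=random.Random(1))
print(f"kept {len(kept)} of {len(gates)} gates")
```

In this sketch any gate whose angle is at least `threshold` is always kept, so raising the threshold trades approximation fidelity for a smaller program.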
ISBN: 9798350326581 (print)
The proceedings contain 87 papers. The topics discussed include: a SAT scalpel for lattice surgery: representation and synthesis of subroutines for surface-code fault-tolerant quantum computing; circular reconfigurable parallel processor for edge computing; MegIS: high-performance, energy-efficient, and low-cost metagenomic analysis with in-storage processing; TCP: a tensor contraction processor for AI workloads; DyLeCT: achieving huge-page-like translation performance for hardware-compressed memory; ElasticRec: a microservice-based model serving architecture enabling elastic resource scaling for recommendation models; waferscale network switches; Triangel: a high-performance, accurate, timely on-chip temporal prefetcher; compiler-directed whole-system persistence; and the dataflow abstract machine simulator framework.
ISBN: 9783031814037; 9783031814044 (print)
In high-performance computing (HPC), multi-threaded applications using OpenMP can suffer from hidden performance issues that are hard to identify, often caused by resource conflicts, software inefficiencies, and hardware anomalies. These subtle issues can significantly degrade performance and reduce system reliability. This paper introduces an approach designed to address these concealed issues in OpenMP multi-threaded applications. The proposed method integrates a Random Forest classifier with anthropomorphic diagnosis to identify and diagnose performance-affecting problems. The approach is reported to detect 90% of performance-affecting issues that are often obscured within complex HPC environments.
ISBN: 9798331506476 (print)
The proceedings contain 118 papers. The topics discussed include: ChameleonEC: exploiting tunability of erasure coding for low-interference repair; DPUaudit: DPU-assisted pull-based architecture for near-zero cost system auditing; delinquent loop pre-execution using predicated helper threads; architecting value prediction around in-order execution; efficient optimization with encoded Ising models; LegoZK: a dynamically reconfigurable accelerator for zero-knowledge proof; reuse-aware compilation for zoned quantum architectures based on neutral atoms; HATT: Hamiltonian adaptive ternary tree for optimizing fermion-to-qubit mapping; QuCLEAR: Clifford extraction and absorption for quantum circuit optimization; and gaze into the pattern: characterizing spatial patterns with internal temporal correlations for hardware prefetching.
Authors: Cabello, Julia Garcia; Carbo-Garcia, S.
Univ Granada, Andalusian Res Inst Data Sci & Computat Intellige, Dept Appl Math, Granada, Spain
Univ Granada, Andalusian Res Inst Data Sci & Computat Intellige, Dept Comp Sci & Artificial Intelligence, Granada, Spain
ISBN: 9783031820724; 9783031820731 (print)
In recent years, the architecture and structure of Deep Neural Networks (DNNs) have become progressively more complex in response to the increasing complexity of real problems. One strategy for dealing with this complexity when it affects training is to partition DNN training in some way, for example by distributing it among different components of a computer network. For this, training (which is in essence the minimization of the loss function) should be performed through separate "smaller pieces". This paper offers an alternative to gradient-based DNN training from a Dynamic Programming (DP) point of view (DP is an optimisation methodology based on dividing a complex problem into many problems of lower complexity). To this end, the paper studies conditions under which the DNN minimization problem can be solved from a DP perspective. In particular, it proves that any artificial neural network (ANN), and thus any DNN, with monotonic activation is separable. Furthermore, when ANNs are considered as dynamical systems in the form of a network (known as coupled cell networks, CCNs), the paper shows that the transmission function is separable provided that the activation is non-decreasing.
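The abstract does not define separability, so the sketch below assumes one classical DP reading of it: a function is written as a nested composition whose outer stage is non-decreasing in the value passed up from the inner stage, in which case minimizing stage by stage gives the same result as minimizing over all variables jointly. The two-stage toy function, the grid, and the names `inner`, `outer`, `joint_minimum`, and `staged_minimum` are illustrative assumptions, not the paper's construction.

```python
import itertools
import math


def sigmoid(z):
    """A non-decreasing activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def inner(x2):
    """Inner stage: depends on x2 only."""
    return (x2 - 0.3) ** 2


def outer(x1, y):
    """Outer stage: non-decreasing in y for every fixed x1,
    because sigmoid is non-decreasing and the weight 2.0 is positive."""
    return sigmoid(2.0 * y + x1 * x1)


# Candidate values for each variable: -1.0, -0.9, ..., 1.0
GRID = [i / 10 for i in range(-10, 11)]


def joint_minimum():
    """Brute force over every (x1, x2) pair."""
    return min(outer(x1, inner(x2)) for x1, x2 in itertools.product(GRID, GRID))


def staged_minimum():
    """DP-style: minimize the inner stage first, then the outer stage."""
    best_inner = min(inner(x2) for x2 in GRID)
    return min(outer(x1, best_inner) for x1 in GRID)


print(joint_minimum(), staged_minimum())
```

The staged version evaluates far fewer combinations than the joint search (two passes over the grid instead of one pass over all pairs), which is the kind of saving a DP decomposition is meant to provide. If the outer stage were not monotone in `y`, the two minima could differ.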