ISBN (print): 9798400713965
The proceedings contain 25 papers. The topics discussed include: FlightVGM: efficient video generation model inference with online sparsification and hybrid precision on FPGAs; TreeLUT: an efficient alternative to deep neural networks for inference acceleration using gradient-boosted decision trees; greater than the sum of its LUTs: scaling up LUT-based neural networks with AmigoLUT; wa-hls4ml and lui-gnn: a benchmark and GNN-based surrogate model for hls4ml resource and latency estimation; InTRRA: inter-task resource-repurposing accelerator for efficient transformer inference on FPGAs; DPUV4E: high-throughput DPU architecture design for CNN on Versal ACAP; and performance analysis of GEMM workloads on the AMD Versal platform.
ISBN (print): 9798400704185
The proceedings contain 23 papers. The topics discussed include: CompressedLUT: an open-source tool for lossless compression of lookup tables for function evaluation and beyond; MiCache: an MSHR-inclusive non-blocking cache design for FPGAs; Hardcaml MSM: a high-performance split CPU-FPGA multi-scalar multiplication engine; DynaRapid: from C to FPGA in a few seconds; design and implementation of a primary visual cortex pathway model based on opponent-process theory; Hardcaml: an OCaml hardware domain-specific language for efficient and robust design; XUNI: virtual machine abstraction for self-contained and multi-tenant cloud FPGAs; ISO-TENANT: rethinking FPGA power distribution network (PDN): a hardware-based solution for remote power side-channel attacks in FPGA; and accelerating autonomous path planning on FPGAs with sparsity-aware HW/SW co-optimizations.
ISBN (print): 9781450394178
The proceedings contain 23 papers. The topics discussed include: eliminating excessive dynamism of dataflow circuits using model checking; straight to the queue: fast load-store queue allocation in dataflow circuits; OMT: a demand-adaptive, hardware-targeted Bonsai Merkle tree framework for embedded heterogeneous memory platform; fault detection on multi COTS FPGA systems for physics experiments on the International Space Station; Nimblock: scheduling for fine-grained FPGA sharing through virtualization; weave: abstraction for accelerator integration of generated modules; a novel FPGA simulator accelerating reinforcement learning-based design of power converters; and power side-channel countermeasures for ARX ciphers using high-level synthesis.
ISBN (print): 9781450391498
The proceedings contain 18 papers. The topics discussed include: multi-input serial adders for FPGA-like computational fabric; logic scaling options for the next 10 years: from FinFET to CFET, from dual damascene to semi damascene; a high-throughput multi-bit-width 3D systolic accelerator for NAS-optimized deep neural networks on FPGA; automated accelerator optimization aided by graph neural networks; hardware acceleration of nonparametric belief propagation for efficient robot manipulation; HMT: a hardware-centric hybrid Bonsai Merkle tree algorithm for high-performance authentication; synthesized garbage collection for FPGA accelerators; and SEXTANS: a streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication.
ISBN (print): 9781450382182
The proceedings contain 26 papers. The topics discussed include: are we alone? searching for ET with FPGAs; tensor slices to the rescue: supercharging ML acceleration on FPGAs; global is the new local: FPGA architecture at 5nm and beyond; Stratix 10 NX architecture and applications; ThunderGP: HLS-based graph processing framework on FPGAs; AutoBridge: coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs; AutoSA: a polyhedral compiler for high-performance systolic arrays on FPGA; demystifying the memory system of modern datacenter FPGAs for software programmers through microbenchmarking; and PRGA: an open-source FPGA research and prototyping framework.
ISBN (print): 9781450370998
The proceedings contain 33 papers. The topics discussed include: flexible communication-avoiding matrix multiplication on FPGA with high-level synthesis; maximizing the serviceability of partially reconfigurable FPGA systems in multi-tenant environment; fingerprinting cloud FPGA infrastructures; massively simulating adiabatic bifurcations with FPGA to solve combinatorial optimization; high-performance FPGA network switch architecture; using OpenCL to enable software-like development of an FPGA-accelerated biophotonic cancer treatment simulator; energy-efficient 360-degree video rendering on FPGA via algorithm-architecture co-design; real-time spatial 3D audio synthesis on FPGAs for blind sailing; when massive GPU parallelism ain't enough: a novel hardware architecture of 2D-LSTM neural network; and Light-OPU: an FPGA-based overlay processor for lightweight convolutional neural networks.
ISBN (print): 9781450361378
The proceedings contain 35 papers. The topics discussed include: visual system integrator; build your own domain-specific solutions with RapidWright; reconfigurable convolutional kernels for neural networks on FPGAs; efficient and effective sparse LSTM on FPGA with bank-balanced sparsity; math doesn't have to be hard: logic block architectures to enhance low-precision multiply-accumulate on FPGAs; on-chip FPGA debug instrumentation for machine learning applications; scheduling data in neural network applications; fault testing a synthesizable embedded processor at gate level using UltraScale FPGA emulation; a deep-reinforcement-learning-based scheduler for high-level synthesis; accelerating 3D CNN-based lung nodule segmentation on a multi-FPGA system; SparseBNN: joint algorithm/hardware optimization to exploit structured sparsity in binary neural network; a deep learning inference accelerator based on model compression on FPGA; and sparse Winograd convolutional neural networks on small-scale systolic arrays.
ISBN (print): 9781450356145
The proceedings contain 31 papers. The topics discussed include: memory-efficient fast Fourier transform on streaming data by fusing permutations; DeltaRNN: a power-efficient recurrent neural network accelerator; degree-aware hybrid graph traversal on FPGA-HMC platform; architecture exploration for HLS-oriented FPGA debug overlays; graph-theoretically optimal memory banking for stencil-based computing kernels; ADAM: automated design analysis and merging for speeding up FPGA development; high-performance QR decomposition for FPGAs; a HOG-based real-time and multi-scale pedestrian detector demonstration system on FPGA; combined spatial and temporal blocking for high-performance stencil computation on FPGAs using OpenCL; P4-compatible high-level synthesis of low latency 100 Gb/s streaming packet parsers in FPGAs; a scalable approach to exact resource-constrained scheduling based on a joint SDC and SAT formulation; dynamically scheduled high-level synthesis; and a customizable matrix multiplication framework for the Intel HARPv2 Xeon+FPGA platform.
This paper discusses the VHDL implementation of a 1D convolutional neural network (CNN) intrapulse modulation classifier. The solution is designed for electronic warfare (EW) applications, whose main purpose is...
This paper presents the implementation of Manticore: a manycore accelerator for parallel RTL simulation. Manticore packs up to 225 custom soft processors running at 475 MHz on a large FPGA. Implementing manycore accel...