ISBN: (print) 9798400713965
The proceedings contain 25 papers. The topics discussed include: FlightVGM: efficient video generation model inference with online sparsification and hybrid precision on FPGAs; TreeLUT: an efficient alternative to deep neural networks for inference acceleration using gradient boosted decision trees; greater than the sum of its LUTs: scaling up LUT-based neural networks with AmigoLUT; wa-hls4ml and lui-gnn: a benchmark and GNN based surrogate model for hls4ml resource and latency estimation; InTRRA: inter-task resource-repurposing accelerator for efficient transformer inference on FPGAs; DPUV4E: high-throughput DPU architecture design for CNN on Versal ACAP; and performance analysis of GEMM workloads on the AMD Versal platform.
ISBN: (print) 9798400704185
The proceedings contain 23 papers. The topics discussed include: CompressedLUT: an open-source tool for lossless compression of lookup tables for function evaluation and beyond; MiCache: an MSHR-inclusive non-blocking cache design for FPGAs; Hardcaml MSM: a high-performance split CPU-FPGA multi-scalar multiplication engine; DynaRapid: from C to FPGA in a few seconds; design and implementation of a primary visual cortex pathway model based on opponent-process theory; Hardcaml: an OCaml hardware domain-specific language for efficient and robust design; XUNI: virtual machine abstraction for self-contained and multi-tenant cloud FPGAs; ISO-TENANT: rethinking FPGA power distribution network (PDN): a hardware based solution for remote power side channel attacks in FPGA; and accelerating autonomous path planning on FPGAs with sparsity-aware HW/SW co-optimizations.
ISBN: (print) 9781450394178
The proceedings contain 23 papers. The topics discussed include: eliminating excessive dynamism of dataflow circuits using model checking; straight to the queue: fast load-store queue allocation in dataflow circuits; OMT: a demand-adaptive, hardware-targeted Bonsai Merkle tree framework for embedded heterogeneous memory platform; fault detection on multi COTS FPGA systems for physics experiments on the International Space Station; Nimblock: scheduling for fine-grained FPGA sharing through virtualization; weave: abstraction for accelerator integration of generated modules; a novel FPGA simulator accelerating reinforcement learning-based design of power converters; and power side-channel countermeasures for ARX ciphers using high-level synthesis.
ISBN: (print) 9781450391498
The proceedings contain 18 papers. The topics discussed include: multi-input serial adders for FPGA-like computational fabric; logic scaling options for the next 10 years: from FinFET to CFET, from dual damascene to semi damascene; a high throughput multi-bit-width 3D systolic accelerator for NAS optimized deep neural networks on FPGA; automated accelerator optimization aided by graph neural networks; hardware acceleration of nonparametric belief propagation for efficient robot manipulation; HMT: a hardware-centric hybrid Bonsai Merkle tree algorithm for high-performance authentication; synthesized garbage collection for FPGA accelerators; and SEXTANS: a streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication.
ISBN: (print) 9781450382182
The proceedings contain 26 papers. The topics discussed include: are we alone? searching for ET with FPGAs; tensor slices to the rescue: supercharging ML acceleration on FPGAs; global is the new local: FPGA architecture at 5nm and beyond; Stratix 10 NX architecture and applications; ThunderGP: HLS-based graph processing framework on FPGAs; AutoBridge: coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs; AutoSA: a polyhedral compiler for high-performance systolic arrays on FPGA; demystifying the memory system of modern datacenter FPGAs for software programmers through microbenchmarking; and PRGA: an open-source FPGA research and prototyping framework.
ISBN: (print) 9781450370998
The proceedings contain 33 papers. The topics discussed include: flexible communication avoiding matrix multiplication on FPGA with high-level synthesis; maximizing the serviceability of partially reconfigurable FPGA systems in multi-tenant environment; fingerprinting cloud FPGA infrastructures; massively simulating adiabatic bifurcations with FPGA to solve combinatorial optimization; high-performance FPGA network switch architecture; using OpenCL to enable software-like development of an FPGA-accelerated biophotonic cancer treatment simulator; energy-efficient 360-degree video rendering on FPGA via algorithm-architecture co-design; real-time spatial 3D audio synthesis on FPGAs for blind sailing; when massive GPU parallelism ain't enough: a novel hardware architecture of 2D-LSTM neural network; and light-OPU: an FPGA-based overlay processor for lightweight convolutional neural networks.
ISBN: (print) 9781450361378
The proceedings contain 35 papers. The topics discussed include: visual system integrator; build your own domain-specific solutions with RapidWright; reconfigurable convolutional kernels for neural networks on FPGAs; efficient and effective sparse LSTM on FPGA with bank-balanced sparsity; math doesn't have to be hard: logic block architectures to enhance low-precision multiply-accumulate on FPGAs; on-chip FPGA debug instrumentation for machine learning applications; scheduling data in neural network applications; fault testing a synthesizable embedded processor at gate level using UltraScale FPGA emulation; a deep-reinforcement-learning-based scheduler for high-level synthesis; accelerating 3D CNN-based lung nodule segmentation on a multi-FPGA system; SparseBNN: joint algorithm/hardware optimization to exploit structured sparsity in binary neural network; a deep learning inference accelerator based on model compression on FPGA; and sparse Winograd convolutional neural networks on small-scale systolic arrays.
ISBN: (print) 9781450356145
The proceedings contain 31 papers. The topics discussed include: memory-efficient fast Fourier transform on streaming data by fusing permutations; DeltaRNN: a power-efficient recurrent neural network accelerator; degree-aware hybrid graph traversal on FPGA-HMC platform; architecture exploration for HLS-oriented FPGA debug overlays; graph-theoretically optimal memory banking for stencil-based computing kernels; ADAM: automated design analysis and merging for speeding up FPGA development; high-performance QR decomposition for FPGAs; a HOG-based real-time and multi-scale pedestrian detector demonstration system on FPGA; combined spatial and temporal blocking for high-performance stencil computation on FPGAs using OpenCL; P4-compatible high-level synthesis of low latency 100 Gb/s streaming packet parsers in FPGAs; a scalable approach to exact resource-constrained scheduling based on a joint SDC and SAT formulation; dynamically scheduled high-level synthesis; and a customizable matrix multiplication framework for the Intel HARPv2 Xeon+FPGA platform.
This paper presents the implementation of Manticore: a manycore accelerator for parallel RTL simulation. Manticore packs up to 225 custom soft processors running at 475 MHz on a large FPGA. Implementing manycore accel...
Versatile Place and Route (VPR) enabled the exploration of diverse FPGA architectures. OpenFPGA extended VPR through bitstream generation and added silicon compilation. Our work is one of several efforts to use this c...