The increasing complexity of large-scale FPGA accelerators poses significant challenges in achieving high performance while maintaining design productivity. High-level synthesis (HLS) has been adopted as a solution, b...
ISBN (digital): 9798331502812
ISBN (print): 9798331502829
In FPGAs, high communication latency in multi-die chips has driven the integration of hardened networks-on-chip (NoCs) into commercial devices. However, for programming FPGAs with high-level synthesis (HLS), existing tools provide only cumbersome low-level abstractions that work solely for offloading memory accesses. Furthermore, these abstractions remain inaccessible to programmers because they rely on placement knowledge. While automatically leveraging the NoC without manual intervention is ideal, it poses several challenges: (1) managing the trade-off in resource utilization between the hard NoC and the programmable logic (PL); (2) allocating limited hard NoC resources among the different communication flows in a design; and (3) aligning hard NoC and PL placement even though the actual PL placement cannot be determined beforehand. We address these challenges by developing NoH, the first HLS flow that automates hard NoC offloading. First, we develop a formal NoC-aware placement algorithm that leverages integer linear programming (ILP) and addresses the first two challenges when offloading external memory accesses and latency-insensitive communication between modules. Then, we arrange the ports synergistically with PL modules via a port-affinity model that approximates the PL placement. Finally, NoH is integrated into an end-to-end HLS flow and evaluated on 4 workloads with diverse communication patterns. NoH gains 20% FPGA frequency over AMD tools by leveraging the hard NoC. Compared to AutoBridge [1], a recent high-level physical synthesis technique that optimizes frequency but does not consider the hard NoC, NoH never fails place-and-route by offloading inter-die crossings (AutoBridge fails in 31% of the workload configurations tested), and it is 6% faster on the rest.
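To make the second challenge concrete, the following is a minimal sketch of a 0/1 selection problem of the kind an ILP formulation could encode: choose which communication edges to offload to the hard NoC so that the number of inter-die crossings removed from the PL is maximized without exceeding a port budget. It is not NoH's formulation, which the abstract does not spell out. The edge names, the crossing and port counts, the budget, and the function `choose_offloads` are all hypothetical, and exhaustive enumeration stands in for an ILP solver so that the example needs only the standard library.

```python
from itertools import combinations

# Hypothetical toy data: (edge name, die crossings saved, NoC ports needed).
EDGES = [
    ("mem_read_a", 3, 2),
    ("mem_write_b", 2, 1),
    ("stream_c", 5, 3),
    ("stream_d", 1, 1),
]


def choose_offloads(edges, port_budget):
    """Return (crossings saved, sorted edge names) for the best offload set.

    Enumerates every subset of edges (a stand-in for solving the
    equivalent 0/1 integer program) and keeps the feasible subset
    that saves the most die crossings.
    """
    best_saved, best_subset = 0, ()
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            ports = sum(e[2] for e in subset)   # NoC ports consumed
            saved = sum(e[1] for e in subset)   # crossings moved off the PL
            if ports <= port_budget and saved > best_saved:
                best_saved, best_subset = saved, subset
    return best_saved, sorted(e[0] for e in best_subset)


if __name__ == "__main__":
    saved, chosen = choose_offloads(EDGES, port_budget=4)
    print(f"offload {chosen}, saving {saved} die crossings")
```

With a budget of 4 ports, this toy instance selects `mem_write_b` and `stream_c`. A real flow would also have to account for the PL resources that each offloading decision frees or consumes, which is the first challenge listed above and is left out of this sketch.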