Dataflow architectures are considered a promising class of architectures, offering a good balance of performance, efficiency, and flexibility. Numerous prior works have sought to improve the performance of dataflow architectures. Nevertheless, these solutions leave room for improvement because they lack efficient data prefetching and flexible task scheduling. In this paper, we propose a novel dataflow architecture with adaptive prefetching and decentralized scheduling (PANDA). First, we present an application-adaptive data prefetching method and an on-chip memory microarchitecture designed to overlap memory access latency. Second, we introduce a decentralized dataflow scheduling approach and a processing element (PE) microarchitecture aimed at improving hardware utilization. Experimental results show that, across a wide range of real-world applications, PANDA attains up to 2.53× performance improvement and 1.79× energy efficiency improvement over state-of-the-art dataflow architectures.