Memory partitioning has been widely adopted to increase the memory bandwidth. data reuse is a hardware-efficient way to improve dataaccess throughput by exploiting locality in memory access patterns. We found that fi...
详细信息
ISBN:
(纸本)9781450359504
Memory partitioning has been widely adopted to increase the memory bandwidth. data reuse is a hardware-efficient way to improve dataaccess throughput by exploiting locality in memory access patterns. We found that fitr many applications in image and video processing, a global data reuse scheme can be shared by multiple patterns. In this paper, we propose an efficient data reuse strategy for multi-pattern dataaccess. Firstly, a heuristic algorithm is proposed to extract the reuse information as well as find the non-reusable data elements of each pattern. Then the non-reusable elements are partitioned into several memory banks by an efficient memory partitioning algorithm. Moreover, the reuse information is utilized to generate the global data reuse logic shared by the multi-pattern. We design a novel algorithm to minimize the number of registers required by the data reuse logic. Experimental results show that compared with the state-of-the-art approach, our proposed method can reduce the number of required BRAMs by 62.2% on average, with the average reduction of 82.1% in SLICE, 87.1% in LUTs, 71.6% in Flip-Flops, 73.170 in DSP48Es, 83.8% in SRLs, 46.77 in storage overhead, 79.1% in dynamic power consumption, and 82.6% in execution time of memory partitioning. Besides, the performance is improved by 14.4%.
Multimedia applications commonly require high computation power mostly in conjunction with high data throughput. As an additional challenge, such applications are increasingly used in handheld devices, where also smal...
详细信息
ISBN:
(纸本)0819429872
Multimedia applications commonly require high computation power mostly in conjunction with high data throughput. As an additional challenge, such applications are increasingly used in handheld devices, where also small package outlines and low power aspects are important. Many research approaches have shown, that accelerators based on reconfigurable hardware can satisfy those performance demands. Most of these approaches use commercial fine-grained FPGAs to implement reconfigurable accelerators. However, it has shown, that these devices are not always well suited for reconfigurable computing. The drawbacks here are the area-inefficiency and the insufficiency of the available design-tools. Besides the fine-grained FPGAs, coarse-grained reconfigurable architectures have been developed, which are more area efficient and better suited for computational purposes. In this paper, an implementation of such an architecture, the KressArray, is introduced and its use in the Map-oriented Machine with parallel data access (MoM-PDA) is shown. The MoM-PDA is an FPGA-based custom computing machine, which is able to perform concurrent memory accesses by means of a dedicated memory organization scheme. The benefits of this architecture are illustrated by an application example.
暂无评论