ISBN: 9781450330510 (print)
For loop accelerators such as coarse-grained reconfigurable architectures (CGRAs) and GP-GPUs, nested loops represent an important source of parallelism. Existing solutions for mapping nested loops onto CGRAs, however, are either designed for perfectly nested loops only or are expensive and inflexible. Efficient CGRA mapping of imperfect loops with arbitrary nesting depth remains a challenge. In this paper we propose a compiler-hardware cooperative approach that is flexible yet able to generate efficient mappings for imperfect nested loops. It is based on loop flattening, but to mitigate the negative impact of flattening we combine loop fission with a lightweight architecture extension designed to accelerate operation patterns that appear frequently in flattened loops. Our experimental results, using imperfect loops from the multimedia and DSP domains, demonstrate that our special operations can cover a large portion of nested-loop operations, improve the performance of nested loops by nearly 30% over loop flattening alone, and achieve near-ideal execution on CGRAs for imperfect loops.
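
To make the terminology concrete, the following is a minimal sketch of loop flattening applied to an imperfect nested loop. It is not taken from the paper: the row-sum kernel, the function names, and the `n_cols` parameter are all hypothetical, and the input is assumed to be rectangular with at least one column. The guards and manual index updates in the flattened version illustrate the kind of extra operations that flattening can introduce; the abstract does not say which specific patterns the proposed architecture extension targets.

```python
def row_sums_nested(a):
    """Imperfect nest: the outer body has statements outside the inner loop."""
    out = []
    for i in range(len(a)):
        acc = 0                      # outer-only statement before the inner loop
        for j in range(len(a[i])):
            acc += a[i][j]           # inner loop body
        out.append(acc)              # outer-only statement after the inner loop
    return out


def row_sums_flattened(a, n_cols):
    """Flattened form (illustrative): a single loop over all (i, j) pairs.

    Assumes every row of `a` has exactly `n_cols` elements and n_cols >= 1.
    """
    n_rows = len(a)
    out = []
    i, j, acc = 0, 0, 0
    for _ in range(n_rows * n_cols):
        if j == 0:                   # guard: first inner iteration of a row
            acc = 0
        acc += a[i][j]               # original inner loop body
        if j == n_cols - 1:          # guard: last inner iteration of a row
            out.append(acc)
            i, j = i + 1, 0          # manual index update: next row
        else:
            j += 1                   # manual index update: next column
    return out
```

Both functions compute the same result on a rectangular input. The flattened version has a single loop that a perfectly-nested-loop mapper could handle, at the cost of executing the guards and index arithmetic on every iteration.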