ISBN: 9781479989379 (print)
Tasking is a prominent parallel programming model. In this paper we conduct a first study into the feasibility of task-parallel execution at the level of the CUDA grid, rather than at the stream/kernel level. We target regular task graphs with fixed in/out dependencies, similar to those found in wavefront computational patterns, which makes the findings broadly applicable. We propose and evaluate three CUDA task progression algorithms in which threadblocks cooperatively process the task graph, and we analyze their performance in terms of tasking throughput, atomics overhead, and memory I/O overhead. Our initial results demonstrate a throughput of 38 million tasks/second on a Kepler K20X GPU.
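To make the idea of cooperative task progression concrete, the following is a minimal CPU-side sketch in C++, not one of the three algorithms evaluated in the paper. It assumes a 2D wavefront graph in which task (i, j) depends on its upper and left neighbours, lets each `std::thread` stand in for one persistent threadblock, and uses one atomic ticket counter plus one atomic completion flag per task. The function name `run_wavefront`, the row-major ticket order, and the toy task body (counting lattice paths) are illustrative assumptions.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical illustration: each worker thread plays the role of one
// persistent CUDA threadblock. Task (i, j) of a rows x cols grid depends
// on tasks (i-1, j) and (i, j-1), as in a wavefront pattern.
// Returns the value computed by the last task (bottom-right corner).
std::uint64_t run_wavefront(int rows, int cols, int workers) {
    const int total = rows * cols;
    std::vector<std::uint64_t> value(total, 0);   // per-task result
    std::vector<std::atomic<int>> done(total);    // per-task completion flag
    for (auto& d : done) d.store(0, std::memory_order_relaxed);
    std::atomic<int> next_ticket{0};              // shared task dispenser

    auto worker = [&]() {
        for (;;) {
            // One atomic per task: claim the next task in row-major order.
            // Row-major order is a topological order of this graph, so the
            // lowest unfinished ticket never waits on an unclaimed task.
            const int t = next_ticket.fetch_add(1, std::memory_order_relaxed);
            if (t >= total) return;
            const int i = t / cols;
            const int j = t % cols;

            // Wait until both predecessors have published their results.
            if (i > 0) {
                while (done[t - cols].load(std::memory_order_acquire) == 0)
                    std::this_thread::yield();
            }
            if (j > 0) {
                while (done[t - 1].load(std::memory_order_acquire) == 0)
                    std::this_thread::yield();
            }

            // Toy task body: number of monotone paths from (0, 0) to (i, j).
            const std::uint64_t up = (i > 0) ? value[t - cols] : 0;
            const std::uint64_t left = (j > 0) ? value[t - 1] : 0;
            value[t] = (i == 0 && j == 0) ? 1 : up + left;

            // Publish the result before signalling completion.
            done[t].store(1, std::memory_order_release);
        }
    };

    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return value[total - 1];
}
```

Calling `run_wavefront(8, 8, 4)` processes 64 tasks with four workers and returns 3432, the number of monotone lattice paths across an 8 x 8 grid. In this sketch every task costs one atomic increment to claim, up to two flag reads to wait, and one flag write to publish, which is the kind of atomics and memory I/O overhead the abstract refers to. A GPU version would need device-scope atomics and memory fences in place of the C++ memory orders used here.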