Author affiliations: National University of Defense Technology, College of Computer Science and Technology, Changsha 410073, China; Shaanxi Normal University, Xi’an 710069, China
Publication: IEEE Transactions on Computers (IEEE Trans Comput)
Year/Volume/Issue: 2025, Vol. 74, No. 6
Pages: 1829-1843
Core indexing:
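Subject classification: 0810 [Engineering - Information and Communication Engineering]; 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (may confer degrees in engineering or science)]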
Funding: This work was supported in part by the National Key Research and Development Program of China under Grant 2022YFB4501702 and in part by the National Natural Science Foundation of China (NSFC) under Grant U24B20151, Grant 61972408, and Grant 61872294.
Subject: Convolution
Abstract: Convolution kernels are widely seen in high-performance computing (HPC) and deep learning (DL) workloads and are often responsible for performance bottlenecks. Prior works have demonstrated that the direct convolution approach can outperform the conventional convolution implementation. Although well-studied, the existing approaches for direct convolution are either incompatible with the mainstream DL data layouts or lead to suboptimal performance. We design nDirect2, a novel direct convolution approach that targets multi-core CPUs commonly found in smartphones and HPC systems. nDirect2 is compatible with the data layout formats used by mainstream DL frameworks and offers new optimizations for the computational kernel, data packing, advanced operator fusion, and parallelization. We evaluate nDirect2 by applying it to representative convolution kernels and demonstrating how well it performs on four distinct ARM-based CPUs and an X86-based CPU. Experimental results show that nDirect2 outperforms four state-of-the-art convolution approaches across most evaluation cases and hardware architectures. © 1968-2012 IEEE.
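Note: For readers unfamiliar with the term, "direct convolution" generally means computing the output straight from the input and filter tensors, without first rearranging the input into an intermediate matrix. The sketch below is only the textbook loop nest, not nDirect2 itself; it includes none of the kernel, packing, fusion, or parallelization optimizations the abstract mentions. The function name `direct_conv2d_nchw`, the NCHW layout, and the absence of padding are assumptions made for illustration.

```python
def direct_conv2d_nchw(x, w, stride=1):
    """Naive direct 2-D convolution on nested lists (illustrative only).

    x: input tensor,  indexed as x[n][c][h][w]  (NCHW layout, assumed)
    w: filter tensor, indexed as w[k][c][r][s]
    Returns the output indexed as out[n][k][p][q]; no padding is applied.
    """
    N, C, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P = (H - R) // stride + 1  # output height
    Q = (W - S) // stride + 1  # output width

    out = [[[[0.0] * Q for _ in range(P)] for _ in range(K)] for _ in range(N)]
    for n in range(N):                      # batch
        for k in range(K):                  # output channels
            for p in range(P):              # output rows
                for q in range(Q):          # output columns
                    acc = 0.0
                    for c in range(C):      # input channels
                        for r in range(R):  # filter rows
                            for s in range(S):  # filter columns
                                acc += (x[n][c][p * stride + r][q * stride + s]
                                        * w[k][c][r][s])
                    out[n][k][p][q] = acc
    return out


# Usage: one 3x3 single-channel image convolved with one 2x2 filter of ones.
image = [[[[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0]]]]
ones_filter = [[[[1.0, 1.0],
                 [1.0, 1.0]]]]
result = direct_conv2d_nchw(image, ones_filter)
```

Each output element here is the sum of a 2x2 window of the input, which makes the result easy to check by hand.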