nDirect2: A High-Performance Library for Direct Convolutions on Multicore CPUs

Authors: Yang, Weiling; Wang, Pengyu; Fang, Jianbin; Dong, Dezun; Pang, Zhengbin; He, Runxi; Zhang, Peng; Tang, Tao; Huang, Chun; Che, Yonggang; Ren, Jie

Affiliations: National University of Defense Technology, College of Computer Science and Technology, Changsha 410073, China; Shaanxi Normal University, Xi’an 710069, China

Publication: IEEE Transactions on Computers (IEEE Trans Comput)

Year, volume, issue: 2025, Vol. 74, No. 6

Pages: 1829-1843

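Subject classification: 0810 [Engineering - Information and Communication Engineering]; 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees awardable in Engineering or Science)]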

Funding: This work was supported in part by the National Key Research and Development Program of China under Grant 2022YFB4501702 and in part by the National Natural Science Foundation of China (NSFC) under Grant U24B20151, Grant 61972408, and Grant 61872294.

Subject: Convolution

Abstract: Convolution kernels are widely seen in high-performance computing (HPC) and deep learning (DL) workloads and are often responsible for performance bottlenecks. Prior works have demonstrated that the direct convolution approach can outperform the conventional convolution implementation. Although well-studied, the existing approaches for direct convolution are either incompatible with the mainstream DL data layouts or lead to suboptimal performance. We design nDirect2, a novel direct convolution approach that targets multi-core CPUs commonly found in smartphones and HPC systems. nDirect2 is compatible with the data layout formats used by mainstream DL frameworks and offers new optimizations for the computational kernel, data packing, advanced operator fusion, and parallelization. We evaluate nDirect2 by applying it to representative convolution kernels and demonstrating how well it performs on four distinct ARM-based CPUs and an X86-based CPU. Experimental results show that nDirect2 outperforms four state-of-the-art convolution approaches across most evaluation cases and hardware architectures. © 1968-2012 IEEE.
