Author affiliations: Jilin Univ, Coll Comp Sci & Technol, Changchun 13002, Jilin, Peoples R China; Zhuhai Coll Jilin Univ, Dept Comp Sci & Technol, Zhuhai Lab Key Lab Symbol Computat & Knowledge En, Minist Educ, Zhuhai 519041, Peoples R China; Univ Aberdeen, Dept Comp Sci, Aberdeen AB24 3UE, Scotland
Publication: IEEE Access
Year/Volume: 2018, Vol. 6
Pages: 72327-72344
Funding: National Natural Science Foundation of China [61472159, 61572227, 61772227]; Development Project of Jilin Province of China [20160204022GX, 20170101006JC, 20170203002GX, 2017C030-1, 2017C033]; Premier-Discipline Enhancement Scheme through Zhuhai Government; Premier Key-Discipline Enhancement Scheme through the Guangdong Government Funds; Jilin Provincial Key Laboratory of Big Data Intelligent Computing [20180622002JC]
Subjects: Field programmable gate arrays; multicore processing; parallel programming; particle swarm optimization; pipeline processing
Abstract: Swarm intelligence algorithms (SIAs) have demonstrated excellent performance on optimization problems, including many real-world problems. However, because they are computationally expensive on some complex problems, SIAs need to be accelerated effectively. This paper presents FASI, a high-performance general framework for accelerating SIAs. Unlike previous work, which accelerates SIAs only by increasing parallelization, FASI considers both the memory architectures of hardware platforms and the dataflow of SIAs, and it reschedules the framework of SIAs as a converged dataflow to improve memory access efficiency. FASI achieves higher acceleration by matching the algorithm framework to the hardware architecture. We also design deeply optimized structures for the parallelization and convergence of FASI based on the characteristics of specific hardware platforms. We take the quantum-behaved particle swarm optimization algorithm as a case study to evaluate FASI. The results show that FASI improves the throughput of SIAs and provides better performance by optimizing the hardware implementations. In our experiments, FASI achieves a maximum throughput of 290.7 Mb/s, which is higher than that of several existing systems, and FASI on FPGAs achieves a better speedup than FASI on GPUs and multi-core CPUs. In terms of optimization time, FASI on a Xilinx Kintex UltraScale xcku040 is at least 1.45 times and up to 123 times faster than on an Intel Core i7-6700 CPU and an NVIDIA GTX1080 GPU. Finally, we compare the differences among deployments of FASI on these hardware platforms and provide guidelines for improving acceleration performance according to the hardware architecture.
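For readers unfamiliar with the case-study algorithm named in the abstract, the following is a minimal sequential sketch of a commonly described form of quantum-behaved particle swarm optimization. It is not FASI and does not reproduce the paper's dataflow rescheduling or any hardware implementation. The function names (`qpso`, `sphere`), the toy objective, the parameter values, and the linearly decreasing coefficient `beta` are all assumptions chosen for illustration.

```python
import math
import random


def sphere(x):
    """Toy objective (an assumption for illustration): sum of squares."""
    return sum(v * v for v in x)


def qpso(objective, dim=5, n_particles=20, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Hypothetical textbook-style QPSO loop; returns (best_position, best_value)."""
    rng = random.Random(seed)
    # Initialize positions uniformly inside the search box.
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [objective(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(iters):
        # Contraction-expansion coefficient, decreased linearly from 1.0 toward 0.5.
        beta = 1.0 - 0.5 * t / iters
        # Mean of all personal bests, one value per dimension.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                # Local attractor between the personal best and the global best.
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()  # lies in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - xs[i][d]) * math.log(1.0 / u)
                x_new = attractor + step if rng.random() < 0.5 else attractor - step
                xs[i][d] = min(hi, max(lo, x_new))  # clamp to the search box
            val = objective(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    best, value = qpso(sphere)
    print(best, value)
```

In this sketch, every particle reads the shared mean of personal bests and the global best on each iteration. The abstract names the dataflow and memory access of SIAs as what FASI reorganizes; how it does so is described in the paper itself, not here.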