Search Results - Inner Mongolia University Library

Dynamic FPGA reconfiguration for scalable embedded artificial intelligence (AI): A co-design methodology for convolutional neural networks (CNN) acceleration

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2025, vol. 169

Authors: Boudjadar, Jalil; Ul Islam, Saif; Buyya, Rajkumar. Affiliations: Aarhus Univ, Dept Elect & Comp Engn, Software Engn & Comp Syst, DK-8200 Aarhus, Denmark; Univ Warwick, WMG, Coventry CV4 7AL, England; Univ Melbourne, Sch Comp & Informat Syst, Quantum Cloud Comp & Distributed Syst qCLOUDS Lab, Melbourne, Vic 3125, Australia

In recent years, FPGA platforms have shown significant potential for accelerating artificial intelligence (AI) applications, particularly in Embedded AI. While various studies have explored adaptive AI deployment on FPGAs, there remains a gap in methodologies that fully integrate software adaptability with FPGA hardware reconfigurability. This article presents a novel end-to-end co-design methodology for deploying adaptable and scalable Convolutional Neural Networks (CNNs) on FPGA platforms. By combining CNN architecture adaptability with dynamic partial reconfiguration of FPGA hardware, the framework dynamically modifies hardware acceleration units, which enhances computational performance and reduces latency. The proposed methodology enables automated synthesis and runtime customization of both hardware accelerators and CNN architectures, eliminating the need for iterative synthesis. This approach has been implemented and tested on a Xilinx XC7020 FPGA board for a CNN-based image classifier, achieving superior computation performance (0.68 s/image) and accuracy (97%) compared to state-of-the-art alternatives.

Keywords: Adaptive CNNs; FPGA dynamic reconfiguration; Hardware acceleration; Co-design framework; Embedded AI; computation performance; Scalable AI deployment

Analysis and Evaluation of the GAS Model for Distributed Graph computation

37th IEEE International Conference on Distributed Computing Systems (ICDCS)

Authors: Wang, Jinyan; Zhang, Chengfei. Affiliation: Natl Univ Def Technol, Changsha, Hunan, Peoples R China

ISBN: (print) 9781538632932

Compared with distributed graph computation, traditional single-node computation is ill-suited to processing large-scale graph data. The GAS (Gather, Apply, and Scatter) model is a universal vertex-cut graph computation programming model based on edge-centric programs to support graph algorithms, which perform distributed graph computation after graph partitioning. In this paper, we introduce the three minor steps of GAS. We then analyze a more complete process of GAS, considering the intra-node computation and inter-node communication of distributed graph computation. Based on our analysis, we evaluate the performance of a graph analysis algorithm applying the GAS model on different numbers of nodes. The evaluation shows that the bottleneck is either computation performance or communication bandwidth, depending on the number of nodes, which suggests directions for optimizing the GAS model.

Keywords: distributed graph computation; vertex-cut; GAS model; computation performance; inter-node communication

Studying OpenMP thread mapping for parallel linear algebra kernels on multicore system

BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2018, vol. 66, no. 6, pp. 981-990

Authors: Bylina, B.; Bylina, J. Affiliation: Marie Curie Sklodowska Univ, Inst Math, Pl M Curie Sklodowskiej 5, PL-20031 Lublin, Poland

Thread mapping is one of the techniques that allow efficient exploitation of the potential of modern multicore architectures. The aim of this paper is to study the impact of thread mapping on the computing performance, the scalability, and the energy consumption of parallel dense linear algebra kernels on hierarchical shared memory multicore systems. We consider a basic application, namely a matrix-matrix product (GEMM), and two parallel matrix decompositions (LU and WZ). Both factorizations exploit parallel BLAS (basic linear algebra subprograms) operations, GEMM among others. We compare various thread mapping strategies for these applications. Our results show that the choice of thread mapping has a measurable impact on the performance, the scalability, and the energy consumption of GEMM and the two matrix factorizations.

Keywords: computation performance; OpenMP standard; nonnegative matrix factorization; thread mapping; energy consumption

Improving the performance of spatial raster analysis in GIS using GPU

15th International Conference on Geoinformatics

Authors: Wu, Ye; Ge, Ying; Yan, Weibao; Li, Xinyu. Affiliations: Hohai Univ, Coll Civil Engn, Xikang Rd, Nanjing 210098, Jiangsu, Peoples R China; Univ Technol, Coll Civil Engn, Nanjing 210093, Jiangsu, Peoples R China

ISBN: (print) 9780819469144

GIS spatial raster analysis has become a powerful tool for studying geographical phenomena. Unfortunately, computation-intensive raster operations are likely to create performance bottlenecks when running on CPUs. Over the last few years, GPU performance has improved much more than CPU performance. For this reason, many researchers have applied GPUs to scientific, geometric, and database computations beyond graphics. This paper demonstrates a general framework for the GPU-based implementation of GIS raster operations and conducts experiments to compare the computation performance of GPU-based and CPU-based algorithms. The test results indicate that using the GPU for spatial raster operations can significantly improve their computation performance. This means that realizing GIS spatial analysis on the GPU creates new opportunities by drastically lowering the cost of raster operations at the same level of hardware performance.

Keywords: graphics processing unit (GPU); spatial raster analysis of GIS; computation performance

Fast and accurate RCS evaluation via high-performance parallel FDTD simulation

JOURNAL OF ENGINEERING-JOE, 2019, vol. 2019, no. 21, pp. 7322-7325

Authors: Zhou, Xiao Long; Wang, Xin Yu; Zhang, Jian Feng; You, Jian Wei. Affiliations: China Ship Dev & Design Ctr, Wuhan 430064, Hubei, Peoples R China; Southeast Univ, Sch Informat & Sci Engn, Nanjing 210096, Jiangsu, Peoples R China

In this study, a fast and accurate method to predict the radar cross-section (RCS) of large-scale targets with complicated shapes is proposed based on a high-performance parallel finite difference time-domain (FDTD) numerical method. To this end, several of the most popular parallel computation methods [including OpenMP, graphics processing unit (GPU), and message-passing interface (MPI)] are discussed first. Based on this discussion, a novel MPI-OpenMP-GPU hybrid parallel computation scheme for FDTD is developed. Moreover, the corresponding load-balance parallel configuration is discussed as well. Since this hybrid parallel scheme combines the merits of existing parallel technologies, the computation performance is remarkably improved. The results show that the computation time of the RCS simulation of a large-scale target can be reduced from 3 days to 0.8 h, that is, a time saving of approximately 98.9%.

Keywords: application program interfaces; radar cross-sections; parallel algorithms; finite difference time-domain analysis; message passing; parallel processing; radar computing; MPI; high-performance parallel FDTD simulation; parallel computation methods; large-scale target; RCS simulation; computation time; computation performance; parallel technologies; hybrid parallel scheme; corresponding load-balance parallel configuration; novel MPI-OpenMP-GPU hybrid parallel computation scheme; message-passing interface; high-performance parallel finite difference time-domain numerical method; complicated shape targets; radar cross-section; time 0.8 hour to 3.0 d
