As an essential mathematical operation, GEneral Matrix Multiplication (GEMM) plays a vital role in many applications, such as high-performance computing and machine learning. In practice, GEMM performance is limited by the matrix dimensions and the diversity of GPU hardware architectures; when dealing with batched, irregular, and small matrices, GEMM efficiency is usually poor. A common approach is to segment the matrix into multiple tiles and exploit parallelism between workgroups on the GPU to compute the results. However, previous works consider only tile size and inter-workgroup parallelism, ignoring the low computational efficiency and poor hardware resource utilization caused by workload differences between wavefronts. To address these issues, we propose a load-balanced batch GEMM acceleration method consisting of a multi-thread kernel design and an efficient tiling algorithm. The multi-thread kernel design addresses workload imbalance between wavefronts in different workgroups, and the efficient tiling algorithm chooses the optimal tiling scheme with a new thread-level parallelism calculation method to achieve load-balanced task allocation. Finally, comparative experiments were conducted on two GPU platforms, AMD and NVIDIA. Experimental results indicate that the proposed method outperforms previous methods.
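The tile-based decomposition the abstract describes can be sketched on the CPU: each output tile of each matrix in the batch is an independent unit of work, mirroring how a GPU kernel maps tiles to workgroups. This is a minimal NumPy illustration, not the paper's kernel; the tile size of 4 is a hypothetical choice.

```python
import numpy as np

def tiled_batch_gemm(As, Bs, tile=4):
    """CPU sketch of tile-based batched GEMM.

    Each (i0, j0) output tile of each batch entry is an independent
    unit of work; on a GPU these units would be distributed across
    workgroups. NumPy slicing clips tiles at matrix edges, which is
    how irregular (non-multiple-of-tile) sizes are handled here.
    """
    batch, m, k = As.shape
    _, _, n = Bs.shape
    Cs = np.zeros((batch, m, n), dtype=As.dtype)
    for b in range(batch):                    # one small GEMM per batch entry
        for i0 in range(0, m, tile):          # tile rows of C
            for j0 in range(0, n, tile):      # tile columns of C
                for k0 in range(0, k, tile):  # accumulate over K tiles
                    Cs[b, i0:i0+tile, j0:j0+tile] += (
                        As[b, i0:i0+tile, k0:k0+tile]
                        @ Bs[b, k0:k0+tile, j0:j0+tile]
                    )
    return Cs
```

The imbalance the paper targets is visible even in this sketch: edge tiles of irregular matrices carry less work than interior tiles, so a naive one-tile-per-workgroup mapping leaves some wavefronts underutilized.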
Due to the complex braiding process and long development cycle of the hexagonal three-dimensional braided stent, a MATLAB-based computer-aided braiding method for the stent is proposed to speed up the development process. First, an oblique coordinate system for the chassis and a polar coordinate system for the chassis unit are constructed to precisely coordinate the carrier's movements on the chassis. Subsequently, an iterative formula delineating the trajectory of the carrier is introduced; it translates the entire braiding process into the positional coordinates of the carrier on the chassis and the yarn heights on the mandrel at each stage. Based on the characteristics of the braiding process and the stent's structure, the stent is divided into pressing and twisting sections. The interwoven pattern for both sections is determined by establishing the basic tiling form, solving the yarn interweaving sequence, judging the interweaving type, and computing the number of kinks. Finally, considering the stent's dimensional parameters and the yarn's interwoven pattern, the spatial curve equations for the yarns in both the pressing and twisting sections are formulated. By concatenating these equations for each section, the three-dimensional trajectory equations and a complete solid model of the stent are derived. A comparative analysis of dimensions and braiding pattern between the three-dimensional solid model and the physical stent preform verifies the accuracy and fidelity of the model generated by the computer-aided braiding method.
This paper describes a new neural network for structures particularly useful for Quantitative Structure-Activity Relationship (QSAR) and Quantitative Structure-Property Relationship (QSPR) applications. The algorithm was conceived to improve the performance of ANN-based software when relatively small datasets have to be processed. Encouraging results were achieved in the analysis of a relatively small class of inhibitors of the angiotensin-converting enzyme (ACE), taken as a probe for our purposes. A huge amount of data is available for ACE inhibition, but only 45 molecules were found to be of interest for the design of triple ligands capable of simultaneously inhibiting ACE, neutral endopeptidase (NEP), and endothelin-converting enzyme (ECE), which may be of interest for therapeutic applications. The implementation of this algorithm proved to be a valuable solution to one of the major problems encountered in applying QSAR/QSPR methods to molecular design, and drug design in particular: datasets of known structures with the relevant biological properties often contain fewer than a hundred, or at most a few hundred, elements, whereas ANN-based approaches typically work well only when trained on datasets at least an order of magnitude larger. For comparison with the approach described here, other commonly used QSAR models were developed using different algorithms available within the WEKA package, some of which are based on neural networks. The comparison clearly shows a better performance of the models obtained with neural networks for structures in general, and with the algorithm proposed here in particular. (C) 2014 Published by Elsevier B.V.
Many practical applications include matrix operations as essential procedures, and recent studies of matrix operations rely on parallel processing to reduce calculation delays. Because these operations are highly data intensive, many studies have investigated work-distribution techniques and data access latency to accelerate algorithms. However, previous studies have not adequately considered hardware architectural features, although these greatly affect the performance of matrix operations. Thus, the present study considers the architectural characteristics that affect the performance of matrix operations on real multicore processors. We use matrix multiplication, LU decomposition, and Cholesky factorization as the test applications, which are well-known data-intensive mathematical algorithms in various fields. We argue that applications access matrices only in a particular direction, and we propose that the canonical data layout is the optimal matrix data layout compared with the block data layout. In addition, a tiling algorithm is utilized to increase temporal data locality in multilevel caches and to balance the workload as evenly as possible in multicore environments. Our experimental results show that applications using the canonical data layout with tiling achieve an 8.23% faster execution time and a 3.91% lower last-level cache miss rate compared with applications using the block data layout. (C) 2013 Elsevier B.V. All rights reserved.
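The layout distinction at the heart of this abstract can be made concrete. In the canonical (row-major) layout a matrix is stored row by row; in the block data layout each tile is stored contiguously. The following NumPy sketch converts between the two so the difference is visible; the helper name and tile size are illustrative, not from the paper.

```python
import numpy as np

def to_block_layout(A, tile=4):
    """Reorder a canonical (row-major) matrix into block data layout.

    The result has shape (m//tile, n//tile, tile, tile): indexing
    [i0, j0] yields one contiguous tile, which is exactly what the
    block data layout stores back to back in memory. Assumes `tile`
    divides both dimensions, as block layouts typically require.
    """
    m, n = A.shape
    return (A.reshape(m // tile, tile, n // tile, tile)
             .transpose(0, 2, 1, 3)   # gather the tile axes together
             .copy())                 # make each tile contiguous
```

The paper's finding is that this reordering is unnecessary overhead when the access direction is fixed: tiling the loops over the canonical layout already captures the cache reuse, without paying for the layout transformation.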
Network coding improves the communication rate and saves bandwidth by performing special coding at the sending or intermediate nodes. However, encoding/decoding at the nodes creates computational overhead on large input data, which causes coding delays. Previous works therefore proposed progressive methods that hide the decoding delay within the waiting time. However, network speeds have since increased greatly, and progressive schemes are no longer the most efficient decoding method. Thus, we present a non-progressive decoding algorithm that can be parallelized more aggressively than progressive network coding; by utilizing multi-core processors, it outweighs the hidden-decoding-time advantage of progressive methods. Moreover, the block algorithm used in the non-progressive decoder helps reduce cache misses. In our experiments, the scheme, which relies on matrix inversion and multiplication, shows a 46.0% improvement in execution time and an 89.2% reduction in last-level cache misses compared to the progressive method on multi-core systems. (C) 2012 Elsevier Ltd. All rights reserved.
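The non-progressive idea can be sketched as follows: instead of eliminating coefficients packet by packet as data arrives (progressive Gaussian elimination), wait for all coded packets, invert the coefficient matrix once, and recover the payload with one blocked matrix multiplication, whose blocks parallelize naturally. This is a hedged illustration over the reals; practical network coding works over a finite field such as GF(2^8), and the function name and block size are assumptions.

```python
import numpy as np

def nonprogressive_decode(G, Y, block=64):
    """Sketch of non-progressive network-coding decoding.

    Given coded packets Y = G @ X with a full-rank coefficient matrix
    G (all packets received), recover X by inverting G once and then
    applying a blocked multiplication. Real decoders operate over
    GF(2^8); floating point stands in here for illustration only.
    """
    Ginv = np.linalg.inv(G)        # one-shot inversion, no per-packet elimination
    n, m = Y.shape
    X = np.zeros_like(Y)
    for j0 in range(0, m, block):  # each column block is independent work,
        X[:, j0:j0+block] = Ginv @ Y[:, j0:j0+block]  # good for cores and caches
    return X
```

Each column block touches a cache-sized slice of Y and reuses Ginv, which is the cache-miss reduction the block algorithm is after; the blocks can also be handed to separate cores with no synchronization.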