In the current landscape of advanced natural language processing (NLP), managing GPU memory effectively is crucial. This paper delves into new tokenization methods and data handling to enhance NLP model efficiency, focusing on avoiding "CUDA out of memory" errors. It examines how sophisticated tokenization and managing text lengths in large datasets can boost model performance. These insights are vital for optimizing resources and scaling NLP models, especially with limited GPU memory. The paper also contextualizes NLP challenges, underlining the significance of memory optimization amid growing language model complexity. It reviews key NLP technologies, including transformer models, and addresses their memory-optimization challenges. Moreover, it underscores the paper's role in developing innovative techniques for more effective memory optimization, linking it to ongoing research and trends in NLP. This work aims to advance natural language processing methods and make AI technologies more accessible.
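The idea of managing text lengths to avoid out-of-memory errors can be sketched with length-aware batching: truncate each sequence to a token budget, then group similarly sized sequences so no batch exceeds a fixed token count. This is a generic illustration, not the paper's actual method; the whitespace tokenizer and all parameter names here are stand-ins.

```python
# Hypothetical sketch of length-aware batching, one common way to keep
# peak memory bounded when token counts vary widely across a corpus.
# The tokenizer is a stand-in (whitespace split), not the paper's method.

def tokenize(text, max_len=128):
    """Tokenize and truncate to a fixed budget of tokens."""
    return text.split()[:max_len]

def bucket_by_length(texts, max_tokens_per_batch=256, max_len=128):
    """Group tokenized texts so no batch exceeds a token budget.

    Sorting by length first keeps padding waste low, since each
    batch then contains sequences of similar size.
    """
    tokenized = sorted((tokenize(t, max_len) for t in texts), key=len)
    batches, current, current_tokens = [], [], 0
    for toks in tokenized:
        if current and current_tokens + len(toks) > max_tokens_per_batch:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(toks)
        current_tokens += len(toks)
    if current:
        batches.append(current)
    return batches
```

Because every batch respects the same token budget, peak activation memory stays roughly constant regardless of how long individual documents are.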
The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support for irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of memory accesses in a serial loop nest to underlying data-parallel architectures, based on a comprehensive static memory access pattern analysis. To that end we present a simple, yet powerful, mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work-group size from a large design space. To evaluate the effectiveness of our methodology, we report execution speedups on selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry-standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.
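The core of such a static analysis can be illustrated with stride classification of affine accesses: for an access like `A[c1*i + c2*j]`, the coefficient of whichever loop index is mapped to adjacent threads determines whether the access is coalesced. The categories and thresholds below are illustrative assumptions, not the paper's exact model.

```python
# Illustrative stride analysis for affine array accesses of the form
# A[c0 + c1*i + c2*j], loosely in the spirit of a static
# access-pattern model. Classification rules are assumptions.

def classify_access(coeffs, thread_dim):
    """Classify an affine access by its stride along the loop
    dimension mapped to consecutive threads.

    coeffs:     loop-index coefficients, e.g. {'i': 64, 'j': 1}
    thread_dim: loop index assigned to adjacent threads
    """
    stride = coeffs.get(thread_dim, 0)
    if stride == 0:
        return "uniform"      # all threads read the same address
    if abs(stride) == 1:
        return "coalesced"    # adjacent threads touch adjacent words
    return "strided"          # candidate for local-memory staging
```

For a row-major access `A[i*64 + j]`, mapping threads along `j` yields a coalesced pattern, while mapping them along `i` yields a strided one, which is exactly the kind of distinction that drives memory-space and thread-mapping choices.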
ISBN:
(Print) 9781665481069
As the depth of neural networks and the scale of data grow, the difficulty of network training also increases. When GPU memory is insufficient, it is challenging to train deeper models. Recent research combines tensor swapping and recomputation techniques to optimize memory usage. However, the complex dependencies of the DNN graph limit the improvement achievable with single-GPU memory optimization. Improper swap decisions can even have negative effects, because the source of a recomputation may already have been swapped out. In this paper, we propose a novel swap-dominated tensor re-generation strategy, called STR, which combines swap and recomputation techniques to find the optimal execution plan for DNN training when memory is limited. We formalize our memory optimization problem with constraints that describe the dependencies of operator calculations and the bandwidth usage of swapping. A host checkpoint mechanism is designed to make full use of swapped tensors, which reduces the cost of recomputation. We also present an approximation method based on a recursive source-tracing procedure to improve optimization efficiency. We implement a prototype of STR as a plugin on TensorFlow. Experimental results show that STR improves throughput by up to 21.3% compared with the state-of-the-art hybrid optimization strategy.
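The swap-versus-recompute trade-off at the heart of such hybrid strategies can be sketched with a toy per-tensor cost model: swapping pays a host-device transfer cost, recomputation pays a re-execution cost. The formulas and default bandwidth/throughput figures below are illustrative assumptions; STR itself solves a constrained optimization over the whole DNN graph rather than deciding tensors independently.

```python
# Toy cost model for choosing, per evicted tensor, between swapping
# to host memory and recomputing from saved inputs. The constants
# (PCIe bandwidth, device throughput) are illustrative assumptions.

def plan_tensor(size_bytes, recompute_flops,
                bandwidth=16e9, device_flops=1e13):
    """Pick the cheaper re-generation strategy for one tensor.

    swap cost:      time to copy the tensor back over the host link
    recompute cost: time to re-run the producing operators
    """
    swap_time = size_bytes / bandwidth
    recompute_time = recompute_flops / device_flops
    return "swap" if swap_time <= recompute_time else "recompute"
```

A small, expensive-to-produce tensor favors swapping, while a large, cheap-to-produce tensor favors recomputation; the graph-level constraints in STR additionally prevent recomputing from sources that have themselves been swapped out.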
ISBN:
(Print) 9798400711381
This paper presents a novel approach to optimizing realistic fur rendering for CG animation using Unreal Engine (UE). We introduce a progressive method combining three key techniques to enhance rendering efficiency while maintaining high quality. These strategies significantly optimize GPU memory use, enabling more complex and realistic results. Our approach streamlines production workflows, reducing both time and costs, and offers broader potential applications in the film industry.
The interaction between buildings and wind significantly impacts the comfort and safety of pedestrians, thereby influencing the sustainability of cities. Computational fluid dynamics (CFD) simulation of wind velocity in urban environments provides valuable insights into building aerodynamics. Traditional CFD solvers are limited by high computational costs, hindering practical engineering applications. Graph neural networks (GNNs) have emerged as a promising approach to accelerate CFD simulations on unstructured meshes. However, their inability to handle large-scale urban wind prediction due to high GPU memory requirements poses a challenge, as GNNs rely on GPUs for fast training and inference. To overcome this limitation, we propose SGMS-GNN, a novel GNN model that accurately and efficiently predicts wind velocity fields in urban environments while maintaining consistent GPU memory usage as the simulation domain increases. We employed a validated CFD model to generate a dataset of wind velocity fields in various urban topologies by simulating wind flow through randomly generated building layouts. Our well-generalized SGMS-GNN demonstrates accurate urban wind field predictions at city scale, achieving a 70% reduction in GPU memory usage compared with other GNN models. Furthermore, the proposed model runs one to two orders of magnitude faster than the CFD model it was trained on.
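The general idea behind keeping GPU memory flat as the domain grows can be sketched as fixed-size chunked processing: split the mesh into bounded-size node ranges with a small halo overlap, so peak memory depends on the chunk size rather than the total node count. This chunking scheme is a generic assumption for illustration, not SGMS-GNN's actual multiscale architecture.

```python
# Minimal sketch of fixed-size subgraph batching: process an n-node
# mesh in chunks of bounded size, with a halo overlap so that edges
# crossing a chunk boundary still see both endpoints. A generic
# device-memory-bounding idea, not SGMS-GNN's actual design.

def subgraph_batches(num_nodes, max_nodes_per_batch, halo=1):
    """Yield (start, end) node ranges of bounded size covering the
    whole mesh, each extended by `halo` nodes on either side."""
    step = max_nodes_per_batch - 2 * halo
    batches = []
    start = 0
    while start < num_nodes:
        lo = max(0, start - halo)
        hi = min(num_nodes, start + step + halo)
        batches.append((lo, hi))
        start += step
    return batches
```

Doubling the domain doubles the number of chunks processed, not the size of any single chunk, so peak GPU memory stays constant while total runtime scales linearly.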
Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields like IoT, autonomous vehicles, and exascale computing. Despite these advancements, efficiently programming GPUs remains a daunting challenge, often relying on trial-and-error optimization methods. This paper introduces an optimization technique for CUDA programs through a novel data layout strategy, aimed at restructuring memory data arrangement to significantly enhance data access locality. Focusing on the dynamic programming algorithm for chained matrix multiplication, a critical operation across various domains including artificial intelligence (AI), high-performance computing (HPC), and the Internet of Things (IoT), this technique facilitates more localized access. We specifically illustrate the importance of efficient matrix multiplication in these areas, underscoring the technique's broader applicability and its potential to address some of the most pressing computational challenges in GPU-accelerated applications. Our findings reveal a remarkable reduction in memory consumption and a substantial 50% decrease in execution time for CUDA programs utilizing this technique, thereby setting a new benchmark for optimization in GPU computing.
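One concrete layout idea for the matrix-chain DP is to store the upper-triangular cost table diagonal by diagonal, since the algorithm fills one diagonal at a time and consecutive cells on a diagonal are computed by consecutive threads. The index arithmetic below is an illustrative choice showing the principle in plain Python, not the paper's exact CUDA layout.

```python
# Sketch of a diagonal-major layout for the matrix-chain DP table.
# The classic algorithm fills the table one diagonal at a time, so
# storing each diagonal contiguously makes consecutive iterations
# (or threads) touch adjacent memory. Illustrative, not the paper's
# exact layout.

def diag_index(i, j, n):
    """Map cell (i, j), i <= j, of an n x n upper-triangular table
    to a flat offset, grouping cells by diagonal d = j - i."""
    d = j - i
    # Cells on diagonals 0..d-1 come first: sum of (n - k) for k < d.
    before = d * n - d * (d - 1) // 2
    return before + i

def matrix_chain_cost(dims):
    """Minimum scalar multiplications for chaining matrices of
    dimensions dims[k] x dims[k+1], using the diagonal layout."""
    n = len(dims) - 1
    m = [0] * (n * (n + 1) // 2)     # flat diagonal-major table
    for d in range(1, n):            # fill diagonal by diagonal
        for i in range(n - d):
            j = i + d
            m[diag_index(i, j, n)] = min(
                m[diag_index(i, k, n)] + m[diag_index(k + 1, j, n)]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return m[diag_index(0, n - 1, n)]
```

In a row-major triangular layout, cells of one diagonal are `n + 1` elements apart; the diagonal-major layout places them one element apart, which is the locality improvement the data layout strategy targets.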