An important inherent limitation of dynamic multiphase reactor flow simulations is their computational time requirement, which makes long-time statistics intractable. A parallel CFD model has therefore been developed, intended for the simulation of multiphase reactors. The present version of the model simulates 2D reactive flows in a fixed-bed reactor. The simulations are performed on two grids of different resolutions. The predicted profiles are in accordance with results reported in the literature. Parallelization and performance optimization of the model have been performed to reduce the computational time. Further reductions have been achieved by applying compiler optimization. The most expensive part of the numerical solution algorithm is the implicit solution of the Poisson equation for the pressure. To solve the Poisson equation, a TDMA algorithm with and without a global block correction procedure, several variations of the conjugate gradient algorithm, and a bi-orthogonal conjugate gradient algorithm were tested. The optimization work has shown that, compared to the serial non-optimized version of the code, the computational time spent solving the model is reduced by more than an order of magnitude when an optimized algorithm is combined with optimal compiler options. Further reductions in computational time have been achieved by parallelizing the program. With this type of performance optimization, multiphase reactive flow systems in chemical reactors are expected to be simulated within feasible time limits. (C) 2004 Elsevier Ltd. All rights reserved.
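As a minimal illustration of the TDMA the abstract names (the Thomas algorithm for tridiagonal systems, the kernel inside line-by-line Poisson solvers), the following Python sketch solves a 1D discretized Poisson problem. The problem setup and parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def tdma(a, b, c, d):
    """Thomas algorithm: solve the tridiagonal system A x = d, where A has
    sub-diagonal a, diagonal b and super-diagonal c (a[0], c[-1] unused)."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 1D model Poisson problem -p'' = f with Dirichlet boundaries,
# discretized on n interior points with spacing h.
n = 50
h = 1.0 / (n + 1)
a = np.full(n, -1.0 / h**2)
b = np.full(n, 2.0 / h**2)
c = np.full(n, -1.0 / h**2)
p = tdma(a, b, c, np.ones(n))
```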
Recently, Deep Neural Networks (DNNs) have recorded significant success in handling medical and other complex classification tasks. However, as the sizes of DNN models and the available datasets increase, the training process becomes more complex and computationally intensive, usually taking longer to complete. In this work, we have proposed a generic, full end-to-end hybrid parallelization approach combining model and data parallelism for efficient, distributed and scalable training of DNN models. We have also proposed a Genetic Algorithm Based Heuristic Resources Allocation (GABRA) mechanism for optimal distribution of partitions over the available GPUs to optimize computing performance. We have applied our proposed approach to a real use case based on the 3D Residual Attention Deep Neural Network (3D-ResAttNet) for efficient Alzheimer's Disease (AD) diagnosis on multiple GPUs, and compared it with existing state-of-the-art parallel methods. The experimental evaluation shows that our proposed approach is on average 20% better than existing parallel methods in terms of training time, and achieves almost linear speedup with little or no difference in accuracy compared with existing non-parallel DNN models.
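The abstract does not spell out GABRA's operators, but a genetic algorithm for partition-to-GPU allocation can be sketched as follows: chromosomes map model partitions to GPUs, and fitness is the synchronous step time, i.e., the load of the slowest GPU. All costs and parameters below are hypothetical:

```python
import random

# Hypothetical per-partition costs (e.g., FLOPs) and relative GPU speeds;
# the real GABRA objective and operators are not given in the abstract.
partition_cost = [4.0, 2.5, 3.0, 1.5, 2.0, 3.5]
gpu_speed = [1.0, 1.0, 0.8]

def makespan(assign):
    # The slowest GPU dominates a synchronous training step.
    load = [0.0] * len(gpu_speed)
    for part, gpu in enumerate(assign):
        load[gpu] += partition_cost[part] / gpu_speed[gpu]
    return max(load)

def evolve(pop_size=40, generations=200, mut_rate=0.1):
    n, g = len(partition_cost), len(gpu_speed)
    pop = [[random.randrange(g) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)              # fitness = low makespan
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, n)    # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n):              # random mutation
                if random.random() < mut_rate:
                    child[i] = random.randrange(g)
            children.append(child)
        pop = survivors + children
    return min(pop, key=makespan)

best = evolve()
print(best, makespan(best))
```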
ISBN:
(print) 9783031160929; 9783031160912
Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speeding up training is model parallelization, where large neural networks are split across multiple devices. However, different device placements of the same neural network lead to different training times. Most existing device placement solutions treat the problem as sequential decision-making, traversing neural network graphs and assigning their neurons to different devices. This work studies the impact of neural network graph traversal orders on device placement. In particular, we empirically study how different graph traversal orders of neural networks lead to different device placements, which in turn affect the training time of the neural network. Our experimental results show that the best graph traversal order depends on the type of neural network and the features of its computation graph. In this work, we also provide recommendations on choosing effective graph traversal orders in device placement for various neural network families to improve the training time in model parallelization.
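To make the studied effect concrete, this hedged sketch places a toy computation graph on two devices with a greedy least-loaded heuristic, once per traversal order; BFS and DFS orders yield different placements even on this small DAG. The graph, costs, and placement rule are illustrative, not the paper's:

```python
from collections import deque

# Toy computation graph: node -> successors (a small diamond DAG),
# with a hypothetical compute cost per node.
graph = {"in": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["out"], "out": []}
cost = {"in": 1.0, "a": 3.0, "b": 3.0, "c": 2.0, "out": 1.0}

def bfs_order(g, root):
    order, seen, q = [], {root}, deque([root])
    while q:
        u = q.popleft()
        order.append(u)
        for v in g[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return order

def dfs_order(g, root):
    order, seen = [], set()
    def visit(u):
        if u in seen:
            return
        seen.add(u)
        order.append(u)
        for v in g[u]:
            visit(v)
    visit(root)
    return order

def greedy_place(order, n_devices=2):
    # Assign each node, in traversal order, to the least-loaded device.
    load = [0.0] * n_devices
    placement = {}
    for node in order:
        d = min(range(n_devices), key=lambda i: load[i])
        placement[node] = d
        load[d] += cost[node]
    return placement

print(greedy_place(bfs_order(graph, "in")))  # e.g. b lands on device 0
print(greedy_place(dfs_order(graph, "in")))  # e.g. b lands on device 1
```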
ISBN:
(print) 9781479927289
Seismic wave models provide a way to study the consequences of future earthquakes. When modeling a restricted region, these models require a boundary condition to absorb the energy that leaves the simulated domain. To parallelize these models, the domain is decomposed into a grid of smaller subdomains which are mapped to different tasks. Due to the boundary condition, this division gives rise to load imbalance between the tasks that simulate border regions and those assigned center subdomains. To deal with this imbalance, and thereby improve the simulation's performance, we propose the use of dynamic load balancing. To evaluate our solution, we ported a seismic wave simulator to Adaptive MPI to benefit from its load balancing framework. By using dynamic load balancers, we improved the performance of the application by 23.85% compared to the original MPI implementation. We also show that load balancers are able to adapt to the variation of load imbalance during the application's execution.
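The paper relies on Adaptive MPI's migration-based balancers; as a stand-in for the underlying idea, the sketch below applies a greedy longest-processing-time rebalancing over measured subdomain costs, where border subdomains carry the extra absorbing-boundary work. All numbers and names are illustrative:

```python
# Hypothetical measured cost per subdomain for one timestep: border
# subdomains pay extra for the absorbing boundary condition.
costs = {"sd0": 1.4, "sd1": 1.0, "sd2": 1.0, "sd3": 1.4,
         "sd4": 1.4, "sd5": 1.0, "sd6": 1.0, "sd7": 1.4}

def rebalance(costs, n_tasks):
    """Greedy longest-processing-time heuristic: repeatedly assign the
    heaviest remaining subdomain to the currently least-loaded task."""
    tasks = [{"load": 0.0, "subdomains": []} for _ in range(n_tasks)]
    for name, c in sorted(costs.items(), key=lambda kv: -kv[1]):
        t = min(tasks, key=lambda t: t["load"])
        t["subdomains"].append(name)
        t["load"] += c
    return tasks

for t in rebalance(costs, 4):
    print(t["subdomains"], round(t["load"], 2))
```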
ISBN:
(print) 9781450367356
As the number of layers and the amount of training data increase, the trend is to train deep neural networks in parallel across devices. In such scenarios, neural network training is increasingly bottlenecked by the high memory requirements posed by intermediate results, or feature maps, that are produced during the forward pass and consumed during the backward pass. We recognize that the best-performing device parallelization configurations should consider memory usage in addition to the canonical metric of computation time. Towards this end, we introduce MemFlow, an optimization framework for distributed deep learning that performs joint optimization over memory usage and computation time when searching for a parallelization strategy. MemFlow consists of: (i) a task graph with memory usage estimates; (ii) a memory-aware execution simulator; and (iii) a Markov Chain Monte Carlo search algorithm that considers various degrees of recomputation, i.e., discarding feature maps during the forward pass and recomputing them during the backward pass. Our experiments demonstrate that under memory constraints, MemFlow can readily locate valid and superior parallelization strategies unattainable with previous frameworks.
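A minimal sketch of the MCMC search the abstract describes, assuming a stand-in cost simulator: each layer gets a device and a recompute flag, and a Metropolis chain trades computation time against a memory budget. The simulator formulas are placeholders for MemFlow's task-graph estimates, not its actual model:

```python
import math
import random

# Hypothetical per-layer choices: device assignment and whether to
# recompute the layer's feature map during the backward pass.
N_LAYERS, N_DEVICES = 8, 4

def simulate(strategy):
    """Stand-in for a memory-aware execution simulator: returns an
    estimated (time, peak_memory) pair for a strategy."""
    time = mem = 0.0
    for dev, recompute in strategy:
        time += 1.0 + (1.0 if recompute else 0.0)  # recompute costs time
        mem += 0.5 if recompute else 2.0           # ...but saves memory
    time += len({d for d, _ in strategy})          # crude comm penalty
    return time, mem

def mcmc_search(steps=5000, mem_budget=10.0, temp=0.5):
    state = [(random.randrange(N_DEVICES), False) for _ in range(N_LAYERS)]
    cur_t, cur_m = simulate(state)
    best, best_t = None, float("inf")
    for _ in range(steps):
        cand = list(state)
        i = random.randrange(N_LAYERS)            # mutate one layer
        cand[i] = (random.randrange(N_DEVICES), random.random() < 0.5)
        t, m = simulate(cand)
        # Penalize memory-budget violations instead of rejecting outright,
        # so the chain can cross infeasible regions of the search space.
        cost_cur = cur_t + max(0.0, cur_m - mem_budget) * 100
        cost_new = t + max(0.0, m - mem_budget) * 100
        if (cost_new < cost_cur
                or random.random() < math.exp((cost_cur - cost_new) / temp)):
            state, cur_t, cur_m = cand, t, m
        if cur_m <= mem_budget and cur_t < best_t:
            best, best_t = list(state), cur_t
    return best, best_t

strategy, est_time = mcmc_search()
print(strategy, est_time)
```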
Deep Neural Networks (DNNs) have established themselves as a fundamental tool in numerous computational modeling applications, overcoming the challenge of defining use-case-specific feature extraction processing by incorporating this stage into unified end-to-end trainable models. Despite their modeling capabilities, training large-scale DNN models is a very computation-intensive task that most single machines are often incapable of accomplishing. To address this issue, different parallelization schemes have been proposed. Nevertheless, network overheads as well as optimal resource allocation pose major challenges, since network communication is generally slower than intra-machine communication, while some layers are more computationally expensive than others. In this work, we consider a novel multimodal DNN based on the Convolutional Neural Network architecture and explore several ways to optimize its performance when training is executed on an Apache Spark cluster. We evaluate the performance of different architectures via the metrics of network traffic and processing power, considering the case of land cover classification from remote sensing observations. Furthermore, we compare our architectures with an identical DNN architecture modeled after a data parallelization approach, using the metrics of classification accuracy and inference execution time. The experiments show that the way a model is parallelized has a tremendous effect on resource allocation, and that hyperparameter tuning can reduce network overheads. Experimental results also demonstrate that the proposed model parallelization schemes achieve more efficient resource use and more accurate predictions compared to data parallelization approaches.
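The network-traffic trade-off the abstract evaluates can be approximated with back-of-the-envelope arithmetic: data parallelism all-reduces the full gradient every step, while model parallelism ships only the activations crossing each split point. The sizes below are assumed for illustration:

```python
# Rough per-step network traffic comparison; all numbers are illustrative.
PARAMS = 25_000_000        # model parameters
BATCH = 64                 # samples per step
ACTIVATION_DIM = 4096      # width of the tensor crossing a model split
N_WORKERS = 4
BYTES = 4                  # float32

# Data parallelism: a ring all-reduce of the full gradient moves about
# 2 * (N - 1) / N * params per worker per step.
data_parallel = 2 * (N_WORKERS - 1) / N_WORKERS * PARAMS * BYTES

# Model parallelism: each step ships activations (forward) and their
# gradients (backward) across each of the N - 1 split points.
model_parallel = 2 * (N_WORKERS - 1) * BATCH * ACTIVATION_DIM * BYTES

print(f"data-parallel traffic per worker/step: {data_parallel / 1e6:.1f} MB")
print(f"model-parallel traffic per step:       {model_parallel / 1e6:.1f} MB")
```

Under these assumptions the gradient all-reduce moves roughly 150 MB per worker per step, versus about 6 MB of activations for the model-parallel splits, which is consistent with the abstract's observation that how a model is parallelized strongly affects network overheads.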