ISBN:
(print) 9780738110707
A common simplification made when modeling the performance of a parallel program is the assumption that the performance behavior of all processes or threads is largely uniform. Empirical performance-modeling tools such as Extra-P exploit this common pattern to make their modeling process more noise resilient, mitigating the effect of outliers by summarizing performance measurements of individual functions across all processes. While the underlying assumption does not hold equally for all applications, knowing the qualitative differences in how the performance of individual processes changes as execution parameters are varied can reveal important performance bottlenecks such as malicious patterns of load imbalance. A challenge for empirical modeling tools, however, arises from the fact that the behavioral class of a process may depend on the process configuration, letting process ranks migrate between classes as the number of processes grows. In this paper, we introduce a novel approach to the problem of modeling spatially diverging performance based on a certain type of process clustering. We apply our technique to identify a previously unknown performance bottleneck in the BoSSS fluid-dynamics code. Removing it made the code regions in question run up to 20 times faster and the application as a whole up to 4.5 times faster.
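The idea of grouping ranks into behavioral classes can be sketched as follows. This is an illustrative toy, not Extra-P's actual algorithm; the helper names are hypothetical. Each rank's measured runtimes across problem sizes are reduced to a scaling exponent, and ranks whose exponents lie close together form one class, so a rank that scales quadratically stands out from its linearly scaling peers.

```python
# Illustrative sketch (not Extra-P's actual algorithm): group MPI ranks into
# behavioral classes by the growth rate of their measured runtime as an
# execution parameter (here, problem size) is varied.
import math

def growth_exponent(sizes, times):
    """Least-squares slope of log(time) vs. log(size): time ~ size**slope."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

def classify_ranks(sizes, per_rank_times, tol=0.25):
    """Cluster ranks whose scaling exponents differ by less than `tol`."""
    classes = []  # list of (representative exponent, [rank ids])
    for rank, times in enumerate(per_rank_times):
        e = growth_exponent(sizes, times)
        for rep, members in classes:
            if abs(e - rep) < tol:
                members.append(rank)
                break
        else:
            classes.append((e, [rank]))
    return classes
```

With measurements at sizes 100, 200, 400, two ranks whose time doubles with size land in one class, while a rank whose time quadruples forms a class of its own, which is exactly the kind of spatial divergence the paper targets.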
ISBN:
(digital) 9781119431152
ISBN:
(print) 9781119431107
A complete textbook and reference for engineers to learn the fundamentals of computer programming with modern C++ Introduction to Programming with C++ for Engineers is an original presentation teaching the fundamentals of computer programming and modern C++ to engineers and engineering students. Professor Cyganek, a highly regarded expert in his field, walks users through the basics of data structures and algorithms with the help of a core subset of C++ and the Standard Library, progressing to the object-oriented domain and advanced C++ features, computer arithmetic, memory management, and the essentials of parallel programming, showing with real-world examples how to complete tasks. He also guides users through the software development process and good programming practices, not shying away from explaining low-level features and programming tools. Being a textbook with summarizing tables and diagrams, the book also serves as a highly useful reference for C++ programmers at all levels. Introduction to Programming with C++ for Engineers teaches how to program by: Guiding users from simple techniques with modern C++ and the Standard Library, to more advanced object-oriented design methods and language features Providing meaningful examples that facilitate understanding of the programming techniques and the C++ language constructions Fostering good programming practices which create better professional programmers Minimizing text descriptions, opting instead for comprehensive figures, tables, diagrams, and other explanatory material Granting access to a complementary website that contains example code and useful links to resources that further improve the reader's coding ability Including test and exam questions for the reader's review at the end of each chapter Engineering students, students of other sciences who rely on computer programming, and professionals in various fields will find this book invaluable when learning to program with C++.
ISBN:
(print) 9783030576752; 9783030576745
Current scientific workflows are large and complex. They normally perform thousands of simulations whose results, combined with searching and data-analytics algorithms to infer new knowledge, generate a very large amount of data. To this end, workflows comprise many tasks, and some of them may fail. Most of the work on failure management in workflow managers and runtimes focuses on recovering from failures caused by resources (retrying or resubmitting the failed computation on other resources, etc.). However, some failures can be caused by the application itself (corrupted data, algorithms that do not converge under certain conditions, etc.), and these fault-tolerance mechanisms are not sufficient to achieve a successful workflow execution. In these cases, developers have to add code to their applications to prevent and manage the possible failures. In this paper, we propose a simple interface and a set of transparent runtime mechanisms to simplify how scientists deal with application-based failures in task-based parallel workflows. We have validated our proposal with use cases from e-science and machine learning to show the benefits of the proposed interface and mechanisms in terms of programming productivity and performance.
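An interface of this kind might look like the following sketch. The decorator name and its options are hypothetical, not the actual API proposed in the paper; the point is that the scientist declares, per task, how an application-level failure should be handled, and the runtime applies the policy transparently.

```python
# Minimal sketch of an application-level failure-management interface for
# task-based workflows (hypothetical names, not the paper's actual API).
import functools

def task(retries=0, on_failure="fail", default=None):
    """Wrap a workflow task: retry it `retries` times; if it still raises,
    either re-raise ("fail") or return a placeholder result ("ignore")."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == retries and on_failure == "fail":
                        raise
            return default  # "ignore": let the rest of the workflow continue
        return wrapper
    return decorate

@task(retries=2, on_failure="ignore", default=None)
def simulate(x):
    # stands in for a simulation that fails to converge for some inputs
    if x < 0:
        raise ValueError("algorithm did not converge")
    return x ** 0.5
```

With this policy, a non-converging simulation yields a placeholder result instead of aborting the whole workflow, which is the productivity benefit the abstract describes.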
The message-passing model, represented by MPI (Message Passing Interface), is the principal parallel programming tool for distributed computer systems. Most MPI programs contain collective communications, which involve all the processes of a parallel program. The effectiveness of collective communications substantially affects the total execution time of a program. In this work, we consider the problem of designing adaptive algorithms for collective communications, using barrier synchronization, one of the most common types of collective communication, as an example. We developed an adaptive barrier-synchronization algorithm that suboptimally selects the barrier scheme in parallel MPI programs among the Central Counter, Combining Tree, and Dissemination Barrier algorithms. The adaptive algorithm chooses the barrier algorithm with the minimal estimated execution time in the LogP model, which accounts for the performance of computational resources and of the interconnect for point-to-point communications. The proposed algorithm has been implemented for MPI. We present the results of experiments on cluster systems and analyse how algorithm selection depends on the LogP parameter values. In particular, for fewer than 20 processes the adaptive algorithm selects Combining Tree, while for a larger number of processes it selects Dissemination Barrier. The developed algorithm reduces the average time of barrier synchronization by 4% compared with the most common deterministic barrier algorithms.
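The selection step can be sketched as follows. The LogP cost formulas below are simplified illustrative assumptions, not the exact model from the paper: L is latency, o is per-message send/receive overhead, and g is the gap between consecutive messages.

```python
# Sketch of adaptive barrier selection under (simplified, assumed) LogP costs.

def t_central_counter(P, L, o, g):
    # every process notifies a central root, which then releases all of them;
    # the root serializes both phases at one message per gap
    return 2 * ((P - 1) * max(g, o) + L + o)

def t_combining_tree(P, L, o, g, k=2):
    # gather up and release down a k-ary tree
    depth, nodes = 0, 1
    while nodes < P:          # depth = ceil(log_k P), computed exactly
        nodes *= k
        depth += 1
    return 2 * depth * (k * max(g, o) + L + o)

def t_dissemination(P, L, o, g):
    # ceil(log2 P) rounds; each process sends and receives one message per round
    rounds, nodes = 0, 1
    while nodes < P:
        nodes *= 2
        rounds += 1
    return rounds * (L + 2 * o + g)

def select_barrier(P, L, o, g):
    """Pick the scheme with the smallest LogP time estimate,
    as the adaptive algorithm does."""
    times = {
        "central_counter": t_central_counter(P, L, o, g),
        "combining_tree": t_combining_tree(P, L, o, g),
        "dissemination": t_dissemination(P, L, o, g),
    }
    return min(times, key=times.get)
```

Because the central counter grows linearly in P while the tree-based schemes grow logarithmically, the estimates naturally favor different algorithms at different scales, which is the behavior the experiments report.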
ISBN:
(print) 9783030527938; 9783030527945
Modern compute architectures often consist of multiple CPU cores to achieve their performance, as physical properties put a limit on the execution speed of a single processor. This trend is also visible in the embedded and real-time domain, where programmers are forced to parallelize their software to meet deadlines. Additionally, embedded systems rely increasingly on modular applications that can easily be adapted to different system loads and hardware configurations. To parallelize applications under these dynamic conditions, dispatching frameworks like Threading Building Blocks (TBB) are often used in the desktop and server segment. More recently, Embedded Multicore Building Blocks (EMB2) was developed as a task-based programming solution designed with the constraints of embedded systems in mind. In this paper, we discuss how task-based programming fits such systems by analyzing scheduler implementation variants, with a focus on classic work stealing and the libraries TBB and EMB2. Based on the state of the art, we introduce a novel resource-trading concept that allows static memory allocation in a work-stealing runtime while holding strict space and time bounds. We conduct benchmarks between an early prototype of the concept, TBB, and EMB2, showing that resource trading does not introduce additional runtime overheads, while unfortunately also not improving on execution-time variances.
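For readers unfamiliar with the baseline, classic work stealing can be sketched in a few lines. This toy uses a fixed per-worker capacity to hint at why static memory allocation matters for strict space bounds; the resource-trading mechanism itself is out of scope here, and the class names are hypothetical.

```python
# Toy single-threaded model of work-stealing deques with a static capacity.
# Real runtimes (TBB, EMB2) use concurrent deques; this only shows the policy.
from collections import deque

class Worker:
    def __init__(self, capacity):
        self.tasks = deque()
        self.capacity = capacity   # static bound, fixed at allocation time

    def push(self, task):
        if len(self.tasks) >= self.capacity:
            raise MemoryError("deque full: would violate the space bound")
        self.tasks.append(task)

    def pop(self):                 # LIFO from own tail: cache-friendly
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):  # FIFO from the victim's head
        return victim.tasks.popleft() if victim.tasks else None
```

The owner works newest-first while thieves take the oldest task, which tends to hand over large unexplored subtrees; bounding the deque capacity is what a truly static allocation scheme must guarantee never overflows.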
ISBN:
(print) 9780738143057
Parallel technologies evolve at a fast rate, with new hardware and programming frameworks being introduced every few years. Keeping a Parallel and Distributed Computing (PDC) lecture up to date is a challenge in itself, let alone when one has to consider the synergies with other courses and the industry-driven shifts in direction that echo inside the student body. This paper details the process of aligning the parallel and distributed curriculum at the Military Technical Academy of Bucharest (MTA) over the last five years with government and industry demands as well as faculty and student expectations. The result has been an adaptation and update of the previous lectures and assignments on PDC, and the creation of a new course that relies heavily on parallel technologies to provide a modern outlook on software security and the tools used to combat cyber threats. Concepts and assignments originally designed for a PDC course have molded perfectly into a new supporting paradigm focused on malicious-code (malware) analysis.
ISBN:
(digital) 9781728156118
ISBN:
(print) 9781728156118
Nowadays, with tremendous speed and continuous development, the world's connection to the Internet has increased to the point that it has become part of daily life. As technology evolves, encryption has become a priority to protect sensitive data from hacking and piracy. However, this process takes a long time to transfer, handle, and process data by the appropriate electronic means. In this paper we present a new methodology that relies on a distributed-memory architecture based on the Message Passing Interface (MPI) to enhance the performance of the sequential algorithm. Our results show that the new model gives a 2x speedup compared to the previous one.
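The abstract does not name the cipher, so the following sketch only shows the distributed-memory pattern: the message is split into contiguous blocks, one per MPI rank, and each rank encrypts its block independently. A toy XOR "cipher" stands in for the real algorithm, and the ranks are simulated in a loop; with mpi4py, each rank would run on its own slice concurrently.

```python
# Block decomposition of the data across ranks, as an MPI program would do.

def block_bounds(n, size, rank):
    """Contiguous slice [lo, hi) of n bytes owned by `rank` of `size` ranks."""
    base, rem = divmod(n, size)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return lo, hi

def xor_encrypt(data, key, offset=0):
    """Toy XOR 'cipher' standing in for the real algorithm; `offset` keeps the
    repeating key aligned with the byte's position in the full message."""
    return bytes(b ^ key[(offset + i) % len(key)] for i, b in enumerate(data))

def parallel_encrypt(data, key, size=4):
    """Simulate `size` ranks, each encrypting only its own block."""
    out = bytearray(len(data))
    for rank in range(size):          # in MPI, each rank runs independently
        lo, hi = block_bounds(len(data), size, rank)
        out[lo:hi] = xor_encrypt(data[lo:hi], key, offset=lo)
    return bytes(out)
```

Passing the block's global offset into the per-rank call is the detail that makes the parallel result bit-identical to the sequential one; without it, each rank would restart the key stream at its block boundary.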
ISBN:
(print) 9780738110448
This work explores the effects of nonassociativity of floating-point addition on Message Passing Interface (MPI) reduction operations. Previous work indicates floating-point summation error comprises two independent factors: error based on the summation algorithm and error based on the summands themselves. We find evidence to suggest, for MPI reductions, the error based on summands has a much greater effect than the error based on the summation algorithm. We begin by sampling from the state space of all possible summation orders for MPI reduction algorithms. Next, we show the effect of different random number distributions on summation error, taking a 1000-digit precision floating-point accumulator as ground truth. Our results show empirical error bounds that are much tighter than existing analytical bounds. Last, we simulate different allreduce algorithms on the high performance computing (HPC) proxy application Nekbone and find that the error is relatively stable across algorithms. Our approach provides HPC application developers with more realistic error bounds of MPI reduction operations. Quantifying the small but nonzero discrepancies between reduction algorithms can help developers ensure correctness and aid reproducibility across MPI implementations and cluster topologies.
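The effect under study can be demonstrated in a few lines: the same summands reduced in different orders (left-to-right, as a serial reduction, versus pairwise, as a tree-shaped allreduce) give different floating-point results, and `math.fsum` plays the role of the high-precision ground-truth accumulator. This is a generic illustration of nonassociativity, not the paper's sampling methodology.

```python
import math

def linear_sum(xs):
    """Left-to-right summation, like a serial reduction."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc

def tree_sum(xs):
    """Pairwise summation, like a binomial-tree allreduce."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

# The same summands, three different answers (exact value is 2.0):
xs = [1e16, 1.0, -1e16, 1.0]
# linear_sum(xs) -> 1.0   (the first 1.0 is absorbed by 1e16)
# tree_sum(xs)   -> 0.0   (both 1.0s are absorbed before the halves cancel)
# math.fsum(xs)  -> 2.0   (exact summation, the "ground truth")
```

The summands here are deliberately adversarial; as the paper observes, which values are being summed can matter more than which reduction order the algorithm happens to use.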
ISBN:
(digital) 9783030343569
ISBN:
(print) 9783030343569; 9783030343552
The natural and design limitations of processor evolution, e.g., frequency scaling and memory-bandwidth bottlenecks, push towards scaling applications across multiple-node configurations in addition to exploiting the power of each single node. This introduces new challenges in porting applications to the new infrastructure, especially in heterogeneous environments. Domain decomposition and handling the resulting necessary communication is not a trivial task. In general, tools cannot decide automatically how to parallelize code, as a result of the semantics of general-purpose languages. To spare scientists such problems, we introduce the Memory-Oblivious Data Access (MODA) technique, and use it to scale code to configurations ranging from a single node to multiple nodes, supporting different architectures, without requiring changes to the source code of the application. We present a technique to automatically identify the necessary communication based on higher-level semantics. The extracted information enables tools to generate code that handles the communication. A prototype was developed to implement the techniques and used to evaluate the approach. The results show the effectiveness of the techniques in scaling code on multi-core processors and on GPU-based machines. Comparing the ratios of the achieved GFLOPS to the number of nodes in each run, repeated for different numbers of nodes, shows that the achieved scaling efficiency is around 100%. This was repeated with up to 100 nodes. An exception is the single-node configuration using a GPU, in which no communication is needed, and hence no data movement between GPU and host memory is needed, which yields higher GFLOPS.
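The core idea of deriving communication from higher-level semantics can be illustrated on a 1-D stencil. This is a hand-rolled toy, not the actual MODA implementation: given the stencil's declared access offsets and a block decomposition of the grid, it computes which remote cells each rank must receive before updating its block, i.e., the halo exchange a tool could then generate code for.

```python
# Derive necessary communication from an access pattern (illustrative only).

def block_of(i, n, size):
    """Owner rank of cell i under a near-even block decomposition."""
    base, rem = divmod(n, size)
    cut = rem * (base + 1)        # the first `rem` ranks own base+1 cells
    return i // (base + 1) if i < cut else rem + (i - cut) // base

def halo_cells(n, size, rank, offsets):
    """Remote cells that `rank` reads, grouped by the rank that owns them."""
    base, rem = divmod(n, size)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    needed = {}
    for i in range(lo, hi):       # every cell this rank updates
        for off in offsets:       # every cell the stencil reads for it
            j = i + off
            if 0 <= j < n and not (lo <= j < hi):
                needed.setdefault(block_of(j, n, size), set()).add(j)
    return needed
```

For a three-point stencil (offsets -1, 0, +1) on 12 cells over 3 ranks, the middle rank needs exactly one boundary cell from each neighbor; the same bookkeeping, applied to the real access semantics, is what lets tools insert the data movement without source-code changes.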
ISBN:
(print) 9783030576752; 9783030576745
Programming models for task-based parallelization based on compile-time directives are very effective at uncovering the parallelism available in HPC applications. Despite that, the process of correctly annotating complex applications is error-prone and may hinder the general adoption of these models. In this paper, we target the OmpSs-2 programming model and present a novel toolchain able to detect parallelization errors coming from non-compliant OmpSs-2 applications. Our toolchain verifies compliance with the OmpSs-2 programming model using local task analysis to deal with each task separately, and structural induction to extend the analysis to the whole program. To improve the effectiveness of our tools, we also introduce some ad-hoc verification annotations, which can be used manually or automatically to disable the analysis of specific code regions. Experiments run on a sample of representative kernels and applications show that our toolchain can be successfully used to verify the parallelization of complex real-world applications.
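The flavor of the local task analysis can be conveyed with a toy check. This is a deliberately simplified model, not the paper's toolchain: each task declares its data dependencies (in/out, as OmpSs-2 annotations do), and the check reports any variable the task body accesses that its annotations fail to cover, task by task, before an induction step would extend the result to the whole program.

```python
# Toy per-task compliance check: declared dependencies vs. actual accesses.

def check_task(declared_in, declared_out, reads, writes):
    """Return the set of accesses not covered by the task's annotations
    (empty set means the task is compliant)."""
    missing = set()
    # a read must be declared as an input or an output (out implies access)
    missing |= set(reads) - set(declared_in) - set(declared_out)
    # a write must be declared as an output
    missing |= set(writes) - set(declared_out)
    return missing
```

A task that reads `a`, and reads and writes `b`, with `in(a) out(b)` annotations passes, while one that silently writes an undeclared variable is flagged, which is the class of annotation error the toolchain is built to catch.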