Search results - Inner Mongolia University Library

Reverse-Mode AD of Reduce-by-Index and Scan in Futhark

arXiv, 2023

Authors: Bruun, Lotte Maria; Larsen, Ulrik Stuhr; Hinnerskov, Nikolaj; Oancea, Cosmin (University of Copenhagen, Denmark)

We present and evaluate the Futhark implementation of reverse-mode automatic differentiation (AD) for the basic blocks of parallel programming: reduce, prefix sum (scan), and reduce by index. We first present derivations of general-case algorithms, and then discuss several specializations that result in efficient differentiation of most cases of practical interest. We report an experiment that evaluates the performance of the differentiated code in the context of GPU execution, and highlights the impact of the proposed specializations as well as the strengths and weaknesses of differentiating at high level vs. low level (i.e., "differentiating the memory"). Copyright © 2023, The Authors. All rights reserved.
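
For readers unfamiliar with the three primitives named in this abstract, the following is a minimal sequential sketch of their semantics in Python, together with the reverse-mode rule for the one special case of an addition scan. It is not the Futhark implementation and not the paper's general-case algorithm. The function names (`reduce_`, `scan`, `reduce_by_index`, `scan_add_adjoint`) and the choice to skip out-of-range indices are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reduce_(op: Callable[[T, T], T], ne: T, xs: Sequence[T]) -> T:
    """Fold xs with an associative operator op and neutral element ne."""
    acc = ne
    for x in xs:
        acc = op(acc, x)
    return acc


def scan(op: Callable[[T, T], T], ne: T, xs: Sequence[T]) -> List[T]:
    """Inclusive prefix scan: out[i] = xs[0] op ... op xs[i]."""
    out: List[T] = []
    acc = ne
    for x in xs:
        acc = op(acc, x)
        out.append(acc)
    return out


def reduce_by_index(op: Callable[[T, T], T], ne: T, num_bins: int,
                    indices: Sequence[int], values: Sequence[T]) -> List[T]:
    """Generalized histogram: combine all values that share a bin index.

    Assumption of this sketch: out-of-range indices are skipped.
    """
    hist = [ne] * num_bins
    for i, v in zip(indices, values):
        if 0 <= i < num_bins:
            hist[i] = op(hist[i], v)
    return hist


def scan_add_adjoint(y_bar: Sequence[float]) -> List[float]:
    """Reverse-mode rule for y = scan (+) 0 x.

    Since y[i] = x[0] + ... + x[i], each x[j] contributes to every y[i]
    with i >= j, so x_bar[j] is the suffix sum of y_bar starting at j.
    """
    x_bar = [0.0] * len(y_bar)
    acc = 0.0
    for j in range(len(y_bar) - 1, -1, -1):
        acc += y_bar[j]
        x_bar[j] = acc
    return x_bar
```

The addition case is the simplest one because the adjoint is itself a (reversed) scan; operators whose derivative depends on the operands need more machinery than this sketch shows.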

Keywords: parallel programming

Quantifying OpenMP: Statistical Insights into Usage and Adoption

arXiv, 2023

Authors: Kadosh, Tal; Hasabnis, Niranjan; Mattson, Timothy; Pinter, Yuval; Oren, Gal. Affiliations: Department of Computer Science, Ben-Gurion University, Israel; Israel Atomic Energy Commission; Intel Labs, United States; Scientific Computing Center, Nuclear Research Center Negev, Israel; Department of Computer Science, Technion - Israel Institute of Technology, Israel

In high-performance computing (HPC), the demand for efficient parallel programming models has grown dramatically since the end of Dennard Scaling and the subsequent move to multi-core CPUs. OpenMP stands out as a popular choice due to its simplicity and portability, offering a directive-driven approach for shared-memory parallel programming. Despite its wide adoption, however, there is a lack of comprehensive data on the actual usage of OpenMP constructs, hindering unbiased insights into its popularity and evolution. This paper presents a statistical analysis of OpenMP usage and adoption trends based on a novel and extensive database, HPCORPUS, compiled from GitHub repositories containing C, C++, and Fortran code. The results reveal that OpenMP is the dominant parallel programming model, accounting for 45% of all analyzed parallel APIs. Furthermore, it has demonstrated steady and continuous growth in popularity over the past decade. Analyzing specific OpenMP constructs, the study provides in-depth insights into their usage patterns and preferences across the three languages. Notably, we found that while OpenMP has a strong "common core" of constructs in common usage (while the rest of the API is less used), there are new adoption trends as well, such as simd and target directives for accelerated computing and task for irregular parallelism. Overall, this study sheds light on OpenMP's significance in HPC applications and provides valuable data for researchers and practitioners. It showcases OpenMP's versatility, evolving adoption, and relevance in contemporary parallel programming, underlining its continued role in HPC applications and beyond. These statistical insights are essential for making informed decisions about parallelization strategies and provide a foundation for further advancements in parallel programming models and techniques. HPCORPUS, as well as the analysis scripts and raw results, are available at: https://***/Scientific-Computing-Lab-NRCN/HP
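
To make the kind of measurement described in this abstract concrete, here is a small Python sketch that counts OpenMP directives in a piece of source text by their leading construct name. It is only an illustration of the idea and is not the HPCORPUS analysis scripts; the function name `count_openmp_directives` and the decision to skip Fortran `end` directives are assumptions of this sketch. It recognizes `#pragma omp` in C/C++ and the `!$omp` sentinel in Fortran, and does not handle the fixed-form sentinels `c$omp` and `*$omp`.

```python
import re
from collections import Counter

# C/C++ directives look like "#pragma omp <construct> [clauses]".
_C_DIRECTIVE = re.compile(r"^\s*#\s*pragma\s+omp\s+(.*)$")
# Fortran directives use the "!$omp" sentinel and are case-insensitive.
_F_DIRECTIVE = re.compile(r"^\s*!\$omp\s+(.*)$", re.IGNORECASE)


def count_openmp_directives(source: str) -> Counter:
    """Count directives by the first word after the OpenMP sentinel.

    Combined constructs such as "parallel for" are counted under their
    first word ("parallel"). Fortran "end ..." directives are skipped so
    that each region is counted once.
    """
    counts: Counter = Counter()
    for line in source.splitlines():
        match = _C_DIRECTIVE.match(line) or _F_DIRECTIVE.match(line)
        if match is None:
            continue
        words = re.findall(r"[a-z_]+", match.group(1).lower())
        if not words or words[0] == "end":
            continue
        counts[words[0]] += 1
    return counts
```

Running this over every file in a repository and summing the counters would give a rough per-construct usage profile; a real study would also need to deal with line continuations, comments, and macros.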

Keywords: parallel programming

Fine-grained parallelism framework with predictable work-stealing for real-time multiprocessor systems

Journal of Systems Architecture, 2022, vol. 124, p. 102393

Authors: Schmid, Michael; Fritz, Florian; Mottok, Juergen (Regensburg Univ Appl Sci, Lab Safe & Secure Syst, Regensburg, Germany)

Lately, parallel task models have received much attention in the development of real-time multiprocessor systems, as they allow highly compute-intensive tasks to have shorter deadlines, which modern reactive systems require. However, missing modularity and portability can make parallel programming a cumbersome endeavor. As a consequence, compute-intensive sectors in the desktop and server segment have relied on parallelism frameworks such as Intel Threading Building Blocks, Cilk and OpenMP. These parallelism frameworks, however, are optimized for decent average-case performance and consequently do not meet the strict requirements imposed by real-time systems. In this paper, we present a proof-of-concept parallelism framework implemented specifically for soft real-time systems, with the tight timing and safety requirements of such critical systems in mind. The proposed runtime system implements static memory allocation in a work-stealing environment that conforms to the strict space and tight probabilistic time bounds of work-stealing schedulers. Furthermore, we evaluate the performance of this framework by conducting multiprogrammed benchmarks on a real-time embedded multicore architecture.
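
The combination of work stealing and static memory allocation mentioned in this abstract can be pictured with a fixed-capacity task deque. The Python sketch below is a single-threaded illustration only: it is not the framework from the paper, the class name `BoundedWorkDeque` is hypothetical, and a real runtime would need atomic operations or locks around `pop` and `steal`. It shows the two access patterns: the owning worker pushes and pops at the bottom (LIFO), while a thief takes the oldest task from the top (FIFO), and all storage is allocated once up front.

```python
from typing import Any, List, Optional


class BoundedWorkDeque:
    """Fixed-capacity ring buffer used as a work-stealing deque (no locking)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._slots: List[Optional[Any]] = [None] * capacity  # allocated once
        self._top = 0     # index of the oldest task (thieves steal here)
        self._bottom = 0  # index of the next free slot (owner pushes here)

    def __len__(self) -> int:
        return self._bottom - self._top

    def push(self, task: Any) -> bool:
        """Owner adds a task; returns False instead of growing when full."""
        if len(self) == self._capacity:
            return False
        self._slots[self._bottom % self._capacity] = task
        self._bottom += 1
        return True

    def pop(self) -> Optional[Any]:
        """Owner takes the most recently pushed task (LIFO)."""
        if len(self) == 0:
            return None
        self._bottom -= 1
        idx = self._bottom % self._capacity
        task, self._slots[idx] = self._slots[idx], None
        return task

    def steal(self) -> Optional[Any]:
        """Thief takes the oldest task (FIFO)."""
        if len(self) == 0:
            return None
        idx = self._top % self._capacity
        task, self._slots[idx] = self._slots[idx], None
        self._top += 1
        return task
```

Rejecting a push when the buffer is full, rather than reallocating, is what keeps the memory footprint bounded; how a scheduler reacts to a rejected push (for example, by running the task inline) is a separate design decision not covered here.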

Keywords: Real-time; parallel programming; Work-stealing; Thread pool; Task model

Shared memory parallelism in Modern C++ and HPX

arXiv, 2023

Authors: Diehl, Patrick; Brandt, Steven R.; Kaiser, Hartmut. Affiliations: Center of Computation & Technology, Louisiana State University, Digital Media Center, Baton Rouge, LA 70803, United States; Department of Physics and Astronomy, Louisiana State University, Baton Rouge, LA 70803, United States

Parallel programming remains a daunting challenge, from struggling to express a parallel algorithm without cluttering the underlying synchronous logic to describing which devices to employ to calculate correctness. Over the years, numerous solutions have arisen, requiring new programming languages, extensions to programming languages, or adding pragmas. Support for these various tools and extensions is available to varying degrees. In recent years, the C++ standards committee has worked to refine the language features and libraries needed to support parallel programming on a single computational node. Eventually, all major vendors and compilers will provide robust and performant implementations of these standards. Until then, the HPX library and runtime provide cutting-edge implementations of the standards and proposed standards and extensions. Because of these advances, it is now possible to write high performance parallel code without custom extensions to C++. We provide an overview of modern parallel programming in C++, describing the language and library features and providing brief examples of how to use them. Copyright © 2023, The Authors. All rights reserved.

Keywords: parallel programming

The Italian research on HPC key technologies across EuroHPC

18th ACM International Conference on Computing Frontiers 2021, CF 2021

Authors: Aldinucci, Marco; Agosta, Giovanni; Andreini, Antonio; Ardagna, Claudio A.; Bartolini, Andrea; Cilardo, Alessandro; Cosenza, Biagio; Danelutto, Marco; Esposito, Roberto; Fornaciari, William; Giorgi, Roberto; Lengani, Davide; Montella, Raffaele; Olivieri, Mauro; Saponara, Sergio; Simoni, Daniele; Torquati, Massimo. Affiliations: Di, University of Torino, CINI HPC-KTT Laboratory, Torino, Italy; Deib, Politecnico di Milano, CINI HPC-KTT Laboratory, Milano, Italy; Dief, University of Florence, CINI HPC-KTT Laboratory, Firenze, Italy; Università Degli Studi di Milano, CINI HPC-KTT Laboratory, Milano, Italy; Dei, Università di Bologna, CINI HPC-KTT Laboratory, Bologna, Italy; University of Naples Federico II, CINI HPC-KTT Laboratory, Napoli, Italy; University of Salerno, CINI HPC-KTT Laboratory, Salerno, Italy; University of Pisa, CINI HPC-KTT Laboratory, Pisa, Italy; Diism, University of Siena, CINI HPC-KTT Laboratory, Siena, Italy; Dime, University of Genova, CINI HPC-KTT Laboratory, Genova, Italy; DiST, University of Naples Parthenope, CINI HPC-KTT Laboratory, Napoli, Italy; Sapienza University of Rome, CINI HPC-KTT Laboratory, Roma, Italy; DII-University of Pisa, CINI HPC-KTT Laboratory, Pisa, Italy

ISBN: (print) 9781450384049

High-Performance Computing (HPC) is one of the strategic priorities for research and innovation worldwide due to its relevance for industrial and scientific applications. We envision HPC as composed of three pillars: infrastructures, applications, and key technologies and tools. While infrastructures are by construction centralized in large-scale HPC centers, and applications are generally within the purview of domain-specific organizations, key technologies fall in an intermediate case where coordination is needed, but design and development are often decentralized. A large group of Italian researchers has started a dedicated laboratory within the National Interuniversity Consortium for Informatics (CINI) to address this challenge. The laboratory, albeit young, has managed to succeed in its first attempts to propose a coordinated approach to HPC research within the EuroHPC Joint Undertaking, participating in the calls 2019 - 20 to five successful proposals for an aggregate total cost of 95M€. In this paper, we outline the working group's scope and goals and provide an overview of the five funded projects, which become fully operational in March 2021, and cover a selection of key technologies provided by the working group partners, highlighting their usage development within the projects. © 2021 ACM.

Keywords: parallel programming

Multi-prediction metropolis hastings resampling filtering algorithm based on CUDA

Microprocessors and Microsystems, 2022, vol. 93

Authors: Huang, Kaijie; Cao, Jie (Lanzhou Univ Technol, Lanzhou, Peoples R China)

To address the loss of speed and estimation accuracy caused by using Metropolis-Hastings (MH) resampling instead of sequential importance resampling in the particle filter, this paper proposes a parallel Metropolis-Hastings filter algorithm based on a multi-prediction framework, which shifts the particle-filtering load from resampling to the prediction and update steps. The overhead of the multi-prediction framework can be easily compensated by parallel implementation. The algorithm reduces global sequential operations by adding local parallel computing. Simulation experiments show that the method improves real-time performance and state estimation accuracy.
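
As background for this abstract, the sketch below shows plain Metropolis-Hastings resampling for a particle filter in Python. It is the baseline technique only, not the multi-prediction algorithm the paper proposes, and the function name `metropolis_resample` and its parameters are assumptions of this sketch. Each output slot runs its own short Markov chain over particle indices and needs only weight ratios, never the global sum of weights, which is why the chains can be run independently (for example, one GPU thread per particle).

```python
import random
from typing import List, Optional, Sequence


def metropolis_resample(weights: Sequence[float], steps: int,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Return one ancestor index per particle using independent MH chains.

    weights: unnormalized, non-negative particle weights.
    steps:   chain length per particle; longer chains reduce the bias
             of the resampled set at the cost of more work.
    """
    if rng is None:
        rng = random.Random()
    n = len(weights)
    ancestors: List[int] = []
    for i in range(n):           # each iteration is independent of the others
        k = i                    # chain i starts at particle i
        for _ in range(steps):
            j = rng.randrange(n)  # uniform proposal over all particles
            u = rng.random()      # u is in [0, 1)
            # Accept with probability min(1, w[j] / w[k]); written without
            # a division so that zero weights are handled safely.
            if u * weights[k] < weights[j]:
                k = j
        ancestors.append(k)
    return ancestors
```

The sequential loop over `i` is written this way only for readability; because no iteration reads another iteration's state, it maps directly onto a parallel-for or a CUDA kernel with one thread per particle.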

Keywords: CUDA parallel architecture; parallel programming; Multi-prediction model; Particle filter

Safety Hints for HTM Capacity Abort Mitigation

IEEE Symposium on High-Performance Computer Architecture

Authors: Anirudh Jain; Divya Kiran Kadiyala; Alexandros Daglis. Affiliations: School of Computer Science, Georgia Institute of Technology, Atlanta, Georgia, USA; School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA

Hardware Transactional Memory (HTM) is a high-performance instantiation of the powerful programming abstraction of transactional memory, which simplifies the daunting, yet critically important, task of parallel programming. While many HTM implementations with variable complexity exist in the literature, commercially available HTMs impose rigid restrictions to transaction and system behavior, limiting their practical use. A key constraint is the limited size of supported transactions, implicitly capped by hardware buffering capacity. We identify the opportunity to expand the effective capacity of these limited hardware structures by being more selective in memory accesses that need to be tracked. We leverage compiler and virtual memory support to identify safe memory accesses, which can never cause a transaction abort, subsequently passed as safety hints to the underlying HTM. With minor extensions over a conventional HTM implementation, HinTM uses these hints to selectively allocate transactional state tracking resources to unsafe accesses only, thus expanding the HTM’s effective capacity, and conversely reducing capacity aborts. We demonstrate that HinTM effectively augments the performance of a range of baseline HTM configurations. When coupled with a POWER8 HTM implementation, HinTM eliminates 64% of transactional capacity aborts, achieving 1.4× average speedup, and up to 8.7×.

Keywords: Couplings; Limiting; parallel programming; Memory management; Performance gain; Hardware; Safety

A New Wave in HDFS Data Security: Merging AES & MapReduce for Efficient Data Encryption

International Carnahan Conference on Security Technology

Authors: Yash Watarkar; Avi Jain; Dipesh Shah; Aliasgar Thanawala; Aparna Kamble (School of CET, Dr. Vishwanath Karad MIT World Peace University, Pune, India)

ISBN: (digital) 9798350315875

ISBN: (print) 9798350315882

This paper proposes a robust encryption strategy for data protection within a Hadoop Distributed File System (HDFS) environment by integrating Advanced Encryption Standard (AES) and MapReduce. Leveraging the speed of the AES-128bit encryption algorithm in conjunction with the MapReduce parallel programming paradigm, the method achieves superior efficiency in the encryption of large amounts of crucial data. Furthermore, the implementation utilizes Phil Rogaway's XEX (Xor-Encrypt-Xor) XTS mode, which provides a robust defense against ciphertext manipulation and copy-and-paste attacks. This approach employs parallel mappers and reducers, known as AES-MR, to encrypt data chunks sequentially and concurrently. The paper demonstrates the efficacy and security of this method, suggesting it as a viable safety measure for safeguarding user-generated data in the HDFS context.

Keywords: parallel programming; File systems; Merging; Data protection; Encryption; Safety; Standards

Degrees of Separation: A Flexible Type System for Data Race Prevention

arXiv, 2023

Authors: Xu, Yichen; Odersky, Martin (EPFL, Switzerland)

Data race is a notorious problem in parallel programming. There has been great research interest in type systems that statically prevent data races. Despite the progress in the safety and usability of these systems, lots of existing approaches enforce strict anti-aliasing principles to prevent data races. The adoption of them is often intrusive, in the sense that it invalidates common programming patterns and requires paradigm shifts. We propose Capture Separation Calculus (System CSC), a calculus based on Capture Calculus (System CC), that achieves static data race freedom while being non-intrusive. It allows aliasing in general to permit common programming patterns, but tracks aliasing and controls them when that is necessary to prevent data races. We study the formal properties of System CSC by establishing its type safety and data race freedom. Notably, we establish the data race freedom property by proving the confluence of its reduction semantics. To validate the usability of the calculus, we implement it as an extension to the Scala 3 compiler, and use it to type-check the examples. © 2023, CC BY.

Keywords: parallel programming