ISBN (Print): 9781450393393
The increase in the complexity, diversity, and scale of high-performance computing environments, as well as the increasing sophistication of parallel applications and algorithms, calls for productivity-aware programming languages for high-performance computing. Among them, the Chapel programming language stands out as one of the more successful approaches based on the Partitioned Global Address Space programming model. Although Chapel is designed for productive parallel computing at scale, the question of its competitiveness with well-established conventional parallel programming environments arises. To this end, this work compares the performance of Chapel-based fractal generation on shared- and distributed-memory platforms with corresponding OpenMP and MPI+X implementations. The parallel computation of the Mandelbrot set is chosen as a test case for its high degree of parallelism and its irregular workload. Experiments are performed on a cluster composed of 192 cores using the French national testbed Grid'5000. Chapel and its default tasking layer demonstrate high performance in the shared-memory context, while Chapel competes with hybrid MPI+OpenMP in the distributed-memory environment.
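The irregular workload mentioned in the abstract comes from the escape-time algorithm: pixels inside the Mandelbrot set cost the full iteration budget, while pixels that escape quickly cost almost nothing, so static work partitioning load-imbalances badly. The following is a minimal Python sketch of that structure (function names and parameters are illustrative, not from the paper; threads stand in for Chapel tasks or OpenMP threads, and this sketch shows only the scheduling structure, not real speedup in CPython):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ITER = 100

def escape_time(c, max_iter=MAX_ITER):
    """Iterations until |z| > 2; points inside the set cost the full budget."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

def render_row(y, width):
    # Map pixels onto the region [-2, 1] x [-1.5i, 1.5i] of the complex plane.
    im = -1.5 + 3.0 * y / width
    return [escape_time(complex(-2.0 + 3.0 * x / width, im)) for x in range(width)]

def render(width=32, workers=4):
    # Submitting each row as its own task gives dynamic load balancing:
    # rows crossing the set take up to max_iter iterations per pixel,
    # while rows whose points escape quickly finish almost immediately.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: render_row(y, width), range(width)))
```

Chapel's `dynamic` iterators and OpenMP's `schedule(dynamic)` address the same imbalance, which is why this kernel is a useful stress test for a tasking runtime.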
The ubiquity of distributed agreement protocols, such as consensus, has galvanized interest in the verification of such protocols as well as of applications built on top of them. The complexity and unboundedness of such systems, however, make their verification onerous in general and particularly prohibitive for full automation. An exciting, recent breakthrough reveals that, through careful modeling, it becomes possible to reduce verification of interesting distributed agreement-based (DAB) systems, which are unbounded in the number of processes, to model checking of small, finite-state systems. It is an open question whether such reductions are also possible for DAB systems that are doubly-unbounded, in particular, DAB systems that additionally have unbounded data domains. We answer this question in the affirmative in this work, thereby broadening the class of DAB systems that can be automatically and efficiently verified. We present a novel reduction which leverages value symmetry and a new notion of data saturation to reduce verification of doubly-unbounded DAB systems to model checking of small, finite-state systems. We develop a tool, Venus, that can efficiently verify sophisticated DAB system models such as the arbitration mechanism for a consortium blockchain, a distributed register, and a simple key-value store.
In recent years, large language models (LLMs) based on the Transformer architecture have demonstrated excellent performance in code generation, but there have been fewer studies on data flow languages. This study prop...
Very relaxed concurrency memory models, like those of the Arm-A, RISC-V, and IBM Power hardware architectures, underpin much of computing but break a fundamental intuition about programs, namely that syntactic program order and the reads-from relation always both induce order in the execution. Instead, out-of-order execution is allowed except where prevented by certain pairwise dependencies, barriers, or other synchronisation. This means that there is no notion of the 'current' state of the program, making it challenging to design (and prove sound) syntax-directed, modular reasoning methods like Hoare logics, as usable resources cannot implicitly flow from one program point to the next. We present AxSL, a separation logic for the relaxed memory model of Arm-A, that captures the fine-grained reasoning underpinning the low-overhead synchronisation mechanisms used by high-performance systems code. In particular, AxSL allows transferring arbitrary resources using relaxed reads and writes when they induce inter-thread ordering. We mechanise AxSL in the Iris separation logic framework, illustrate it on key examples, and prove it sound with respect to the axiomatic memory model of Arm-A. Our approach is largely generic in the axiomatic model and in the instruction-set semantics, offering a potential way forward for compositional reasoning for other similar models, and for the combination of production concurrency models and full-scale ISAs.
ISBN (Print): 9798400716836
In distributed systems, remote Application Programming Interfaces (APIs) let architectural components such as microservices communicate with each other; interoperability and satisfactory developer experience are key stakeholder concerns. In response to changing requirements and insights from development and operations, API endpoints and the request and response messages of the exposed operations are actively designed and then modified during the entire life cycle of the system. Refactoring is a crucial practice in agile software development, widely adopted in practice at the code level. Architectural refactoring has been researched but has not been adopted nearly as widely as code-level refactoring. This paper continues our work on refactoring remote APIs, which we introduced at EuroPLoP 2023. We present a second slice of seven API refactorings pulled from our online Interface Refactoring Catalog, many of which target API design patterns: Extract Information Holder, Inline Information Holder, Extract Operation, Rename Operation, Make Request Conditional, Encapsulate Context Representation, and Introduce Version Identifier. Besides context, problem, and step-by-step solution, we also motivate the refactorings by stakeholder concerns and identify the design smells that refactoring can address. All refactorings are illustrated with implementation code snippets, excerpts from API specifications, and/or examples of messages exchanged at runtime. The paper concludes with an outlook on future work.
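One refactoring from the list, Rename Operation, can be illustrated with a small sketch: the operation is re-registered under its new name while the old name remains as a deprecated alias, so existing clients keep working during migration. This is a hypothetical illustration, not code from the Interface Refactoring Catalog; the router class and route names are invented:

```python
import warnings

class ApiRouter:
    """Toy dispatcher standing in for an API gateway's operation table."""

    def __init__(self):
        self._routes = {}

    def register(self, name, handler):
        self._routes[name] = handler

    def rename(self, old, new):
        # Rename Operation: re-register the handler under the new name,
        # then leave a deprecation shim under the old name instead of
        # deleting it, so callers of the old operation are not broken.
        handler = self._routes[new] = self._routes.pop(old)
        def shim(*args, **kwargs):
            warnings.warn(f"{old} is deprecated; use {new}", DeprecationWarning)
            return handler(*args, **kwargs)
        self._routes[old] = shim

    def call(self, name, *args, **kwargs):
        return self._routes[name](*args, **kwargs)
```

The shim is the step that distinguishes an API-level rename from a code-level one: remote clients cannot be updated atomically, so the old name must survive (and warn) until the migration window closes.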
ISBN (Print): 9798400703805
In this work-in-progress research paper, we make the case for using Rust to develop applications in the High Performance Computing (HPC) domain, which is critically dependent on native C/C++ libraries. This work explores one example of Safe HPC via the design of a Rust interface to an existing distributed C++ Actors library. This existing library has been shown to deliver high performance to C++ developers of irregular Partitioned Global Address Space (PGAS) applications. Our key contribution is a proof-of-concept framework to express parallel programs safely in Rust (and potentially other languages/systems), along with a corresponding study of the problems solved by our runtime, the implementation challenges faced, and user productivity. We also conducted an early evaluation of our approach by converting C++ actor implementations of four applications taken from the Bale kernels to Rust Actors using our framework. Our results show that the productivity benefits of our approach are significant, since our Rust-based approach helped catch bugs statically during application development, without degrading performance relative to the original C++ actor versions.
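The safety argument behind the actor model used here is that each actor owns its state exclusively and mutates it only in response to messages from a private mailbox, so no locks or shared mutable aliases are needed. A minimal Python sketch of that discipline (illustrative only; it mirrors the pattern, not the paper's Rust/C++ API):

```python
import queue
import threading

class CounterActor:
    """An actor whose state is touched only by its own mailbox-draining thread."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):
        # The only way to affect the actor: enqueue a message.
        self._mailbox.put(msg)

    def _run(self):
        while True:
            msg = self._mailbox.get()
            if msg == "stop":
                break
            self._count += msg  # exclusive ownership: no lock required

    def join(self):
        self.send("stop")
        self._thread.join()
        return self._count
```

Rust's ownership system can enforce this exclusivity statically, which is the kind of bug-catching the evaluation refers to; in Python (or C++) the discipline is only by convention.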
This special issue includes a selection of the artefacts presented at the 18th International Federated Conference on Distributed Computing Techniques (DisCoTec 2023), held at the NOVA University Lisbon (Lisbon, Portugal) on June 18-23, 2023. The federated conference included COORDINATION 2023, the 25th International Conference on Coordination Models and Languages; DAIS 2023, the 23rd International Conference on Distributed Applications and Interoperable Systems; and FORTE 2023, the 43rd International Conference on Formal Techniques for Distributed Objects, Components, and Systems. All three conferences welcomed submissions describing technological artefacts, including innovative prototypes supporting the modelling, development, analysis, simulation, or testing of systems in the broad spectrum of distributed computing subjects. The artefact evaluation chairs selected a subset of high-quality accepted artefacts to be invited for submission to this special issue. Following the revision process, nine artefacts were accepted for this special issue. The published contributions include different types of artefacts, including programming libraries and frameworks, as well as tools for the analysis, verification, and simulation of distributed systems.
ISBN (Print): 9798400701696
Multi-GPU nodes are widely used in high-performance computing and data centers. However, current programming models do not provide transparent and portable support for automatically targeting multiple GPUs within a node. In this paper, we describe a new application programming interface based on the Kokkos programming model that enables array computation on multiple GPUs in a transparent and portable way across both NVIDIA and AMD GPUs. We implement different variations of this technique to accommodate the exchange of stencils, and we provide autotuning to select the proper number of GPUs, depending on the computational cost of the operations to be computed on arrays. We evaluate our multi-GPU extension on Summit (#5 TOP500), with six NVIDIA V100 Volta GPUs per node, and on Crusher, which contains identical hardware and software to Frontier (#1 TOP500), with four AMD MI250X GPUs, each with 2 Graphics Compute Dies (GCDs), for a total of 8 GCDs per node. We also compare the performance of this solution against the use of MPI + Kokkos. Our evaluation shows that the new Kokkos solution provides good scalability for many GPUs when compared with MPI + Kokkos.
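The "exchange of stencils" the abstract mentions is the classic halo (ghost-cell) exchange: when an array is partitioned across devices, each chunk needs a copy of its neighbours' boundary elements before a stencil step. A minimal pure-Python sketch of that bookkeeping (the function names are illustrative and the lists stand in for per-GPU device arrays; the real extension moves these halos between GPUs):

```python
def partition(arr, n):
    """Split a 1-D array into n equal contiguous chunks (one per device)."""
    size = len(arr) // n
    return [arr[i * size:(i + 1) * size] for i in range(n)]

def exchange_halos(chunks, boundary=0.0):
    """Pad each chunk with one ghost cell per side, filled from its neighbour."""
    padded = []
    for i, c in enumerate(chunks):
        left = chunks[i - 1][-1] if i > 0 else boundary
        right = chunks[i + 1][0] if i < len(chunks) - 1 else boundary
        padded.append([left] + c + [right])
    return padded

def stencil_step(chunks):
    """One 3-point averaging step; each chunk computes only its own interior."""
    padded = exchange_halos(chunks)
    return [[(p[j - 1] + p[j] + p[j + 1]) / 3.0 for j in range(1, len(p) - 1)]
            for p in padded]
```

The correctness condition is that the multi-device result, flattened, equals the single-device result; the halo exchange is exactly what makes that hold at chunk boundaries.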
ISBN (Print): 9798400701016
Achieving peak throughput on modern CPUs requires maximizing the use of single-instruction, multiple-data (SIMD) or vector compute units. Single-program, multiple-data (SPMD) programming models are an effective way to use high-level programming languages to target these ISAs. Unfortunately, many SPMD frameworks have evolved to have either overly restrictive language specifications or under-specified programming models, and this has slowed the wide-scale adoption of SPMD-style programming. This paper introduces Parsimony (PARallel SIMd), an SPMD programming approach built with semantics designed to be compatible with multiple languages and to cleanly integrate into the standard optimizing compiler toolchains for those languages. We first explain the Parsimony programming model semantics and how they enable a standalone compiler IR-to-IR pass that can perform vectorization independently of other passes, improving the language and toolchain compatibility of SPMD programming. We then demonstrate an LLVM prototype of the Parsimony approach that matches the performance of ispc, a popular but more restrictive SPMD approach, and achieves 97% of the performance of hand-written AVX-512 SIMD intrinsics on over 70 benchmarks ported from the Simd Library. We finally discuss where Parsimony has exposed parts of existing language and compiler flows where slight improvements could further enable improved SPMD program vectorization.
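The core SPMD idea is that one scalar program runs on every lane of a fixed-width vector, and divergent branches are handled by computing both sides and blending per-lane results under a mask, which is what a vectorizing pass must ultimately emit as SIMD select instructions. A minimal Python sketch of that execution model (lane width and function names are illustrative, not tied to Parsimony or ispc):

```python
WIDTH = 8  # a hypothetical vector width, e.g. 8 lanes of 32-bit values

def spmd_abs(xs):
    """Per-lane program `y = x if x >= 0 else -x`, run in SPMD style.

    Instead of branching, all lanes evaluate both sides of the `if`,
    and a mask (the branch predicate per lane) selects each lane's result,
    mirroring a SIMD blend/select instruction.
    """
    assert len(xs) == WIDTH
    mask = [x >= 0 for x in xs]      # per-lane predicate for the 'then' side
    then_vals = xs                   # both branch arms are computed for all lanes
    else_vals = [-x for x in xs]
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]
```

Over-restricting what the scalar program may do (or under-specifying how masks compose) is precisely the design tension between SPMD frameworks that the abstract describes.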
ISBN (Print): 9798350311990
We explore the performance and portability of the high-level programming models Julia (LLVM-based), Python/Numba, and Kokkos on high-performance computing (HPC) nodes: AMD EPYC CPUs and MI250X graphics processing units (GPUs) on Frontier's test bed system Crusher, and Ampere's Arm-based CPUs and NVIDIA's A100 GPUs on the Wombat system at the Oak Ridge Leadership Computing Facility. We compare the default performance of a hand-rolled dense matrix multiplication algorithm on CPUs against vendor-compiled C/OpenMP implementations, and on each GPU against CUDA and HIP. Rather than focusing on kernel optimization per se, we select this naive approach to resemble exploratory work in science and as a lower bound for performance, isolating the effect of each programming model. Julia and Kokkos perform comparably with C/OpenMP on CPUs, while Julia implementations are competitive with CUDA and HIP on GPUs. Performance gaps are identified on NVIDIA A100 GPUs for Julia's single precision and for Kokkos, and for Python/Numba in all scenarios. We also comment on half-precision support, productivity, performance portability metrics, and platform readiness. We expect to contribute to the understanding of and direction for high-level, high-productivity languages in HPC as the first-generation exascale systems are deployed.
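The "hand-rolled dense matrix multiplication" used as the benchmark is the textbook triple loop; writing it out makes clear why it is a fair lower bound: every programming model receives the same naive code and must map the loop nest to threads or a GPU itself. A minimal Python version (illustrative; the paper's kernels are in Julia, Python/Numba, Kokkos, C/OpenMP, CUDA, and HIP):

```python
def matmul(a, b):
    """Naive O(n*m*k) dense matrix multiply over lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must agree"
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):          # in the compared models, these outer loops
        for j in range(m):      # are what gets parallelized or GPU-offloaded
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c
```

Because no blocking, vectorization hints, or library calls are present, any performance difference between models on this kernel reflects the model's compiler and runtime rather than hand-tuning effort, which is exactly the isolation the study aims for.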