ISBN (digital): 9798350350678
ISBN (print): 9798350350685
Game designers commonly use paper prototyping to evaluate educational effectiveness, enjoyment, flow, and usability, while also reducing costs and exploring alternative implementations. However, creating a paper prototype that yields actionable feedback can be challenging due to the wide range of methods available, from low-fidelity sketches to high-fidelity mockups. Most paper prototypes are static and progress in discrete steps, making it difficult to prototype physics-based games effectively unless the focus is on interfaces, narrative, or underlying systems. This paper details the creation of three prototypes of varying fidelity and metaphor for a physics-based educational game on concurrency and parallel programming. Each prototype undergoes playtesting to assess its construction method and its effectiveness in gathering player feedback.
Unit testing is a standard practice in software engineering and is critical for ensuring software quality. However, for parallel and high-performance computing software, especially scientific computing applications, unit testing is not widely implemented. Compared with typical commercial software, high-performance software usually has a smaller user base, is more diverse, and often involves complex logic. These characteristics create several challenges for unit testing parallel and high-performance software. On one hand, given the size of the user base, it is economically expensive to maintain a dedicated testing team for unit testing. On the other hand, it is hard for a quality engineer without domain knowledge to design effective unit tests. Similarly, existing automated unit testing tools are usually not effective for such software. It is therefore vital to devise an automated method for generating unit test cases for parallel and high-performance software that accounts for the unique features of such software, including complex logic and sophisticated parallel processing techniques. Recently, large language models (LLMs) have attracted increasing attention and are believed to be a powerful tool for coding and testing, but their ability to produce unit tests for parallel and high-performance applications remains uncertain. To fill this gap, we explore the capabilities of two well-known generative models, Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo), in crafting unit test cases for parallel and high-performance software. Specifically, we propose novel ways to utilize LLMs to develop unit test cases for high-performance software written as C++ parallel programs and assess their effectiveness on extensive OpenMP/MPI projects. Our findings indicate that, in the context of parallel programming, LLMs can create unit test cases that are mostly syntactically correct and offer substantial coverage, although they exhibit some limitations.
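To make concrete the kind of artifact this study asks an LLM to produce, the sketch below shows a small OpenMP reduction routine and a unit test of the style a model such as gpt-3.5-turbo might generate for it. The function name parallel_sum and the test values are invented for illustration and are not taken from the paper's OpenMP/MPI benchmark projects.

// Hypothetical function under test: a parallel reduction with OpenMP.
#include <cassert>
#include <cstddef>
#include <vector>

double parallel_sum(const std::vector<double>& v) {
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// A unit test of the style an LLM might be asked to generate: check the
// parallel result against a known value and cover an empty-input edge case.
int main() {
    std::vector<double> data(1000, 0.5);
    assert(parallel_sum(data) == 500.0);  // uniform input, exact in double
    assert(parallel_sum({}) == 0.0);      // empty-vector edge case
    return 0;
}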
We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposition. Our implemen...
With the continuing development of hydropower in China, cascade hydropower systems will account for a growing share of the power grid and may increase grid operation risk under global climate change. This paper presents a parallel chance-constrained dynamic programming model to derive optimal operating policies for a cascade hydropower system in China. The contribution of this paper is twofold. First, the reliabilities of meeting the firm power requirements of the cascade hydropower system and of avoiding extreme system failure under extreme events are explicitly embedded in the model using Lagrangian duality theory and a penalty function. Multiple operating policies are generated by updating the values of the Lagrangian multiplier and the penalty coefficient for system disruption; the best operating rules are then selected based on system performance and evaluated according to simulated reliability, extreme system failure, and maximum benefit. Second, the Fork/Join parallel framework is deployed to parallelize the chance-constrained dynamic programming in a multi-core environment to improve computational efficiency. Two computing platforms with contrasting configurations are employed to illustrate the parallelization performance. Results from a cascade hydropower system operation demonstrate that the proposed method is computationally efficient and can obtain satisfactory operating policies, especially for extreme drought events. (C) 2018 Elsevier Ltd. All rights reserved.
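As a rough illustration of the parallelization idea only (the paper itself uses the Java Fork/Join framework), the following C++/OpenMP sketch parallelizes one backward-recursion stage of a dynamic program over discretized storage states; the stage_value callback standing in for the benefit-plus-penalty evaluation is hypothetical.

// Illustrative sketch, not the paper's implementation: one DP stage in which
// each discretized storage state can be evaluated independently, so the outer
// loop parallelizes cleanly across cores.
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

std::vector<double> solve_stage(
    const std::vector<double>& future_value,               // V_{t+1} over states
    const std::function<double(int, int)>& stage_value) {  // hypothetical benefit + penalty terms
    const int n = static_cast<int>(future_value.size());
    std::vector<double> value(n, -std::numeric_limits<double>::infinity());

    #pragma omp parallel for
    for (int s = 0; s < n; ++s) {
        for (int s_next = 0; s_next < n; ++s_next) {
            value[s] = std::max(value[s],
                                stage_value(s, s_next) + future_value[s_next]);
        }
    }
    return value;
}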
ISBN (digital): 9798350355543
ISBN (print): 9798350355550
We propose a CPU-GPU heterogeneous computing method for solving time-evolution partial differential equation problems many times with guaranteed accuracy, with short time-to-solution and low energy-to-solution. On a single GH200 node, the proposed method improved computation speed by factors of 86.4 and 8.67 compared to the conventional method run only on the CPU and only on the GPU, respectively. Furthermore, the energy-to-solution was reduced by 32.2-fold (from 9944 J to 309 J) and 7.01-fold (from 2163 J to 309 J) when compared to using only the CPU and GPU, respectively. Using the proposed method on the Alps supercomputer, a 51.6-fold and 6.98-fold speedup was attained when compared to using only the CPU and GPU, respectively, and a high weak-scaling efficiency of 94.3% was obtained on up to 1,920 compute nodes. These implementations were realized using directive-based parallel programming models while retaining portability, indicating that directives are highly effective for analyses in heterogeneous computing environments.
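To illustrate what "directive-based" means here, the toy C++ sketch below offloads an explicit time-stepping update with OpenMP target directives; it is not the paper's solver, and the function and array names are assumptions made for this example.

// Minimal directive-based offload sketch: a toy explicit time-step update.
#include <vector>

void time_step(std::vector<double>& u, const std::vector<double>& u_old,
               double dt, double c) {
    const int n = static_cast<int>(u.size());
    const double* uo = u_old.data();
    double* un = u.data();

    // The same loop body can target a GPU (as below) or, with
    // "#pragma omp parallel for", the host CPU, which is what makes a
    // heterogeneous CPU-GPU split portable across vendors.
    #pragma omp target teams distribute parallel for \
        map(to : uo[0:n]) map(tofrom : un[0:n])
    for (int i = 1; i < n - 1; ++i) {
        un[i] = uo[i] + dt * c * (uo[i - 1] - 2.0 * uo[i] + uo[i + 1]);
    }
}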
ISBN (digital): 9798350355543
ISBN (print): 9798350355550
The Common Workflow Language (CWL) is a widely adopted language for defining and sharing computational workflows. It is designed to be independent of the execution engine on which workflows are executed. In this paper, we describe our experiences integrating CWL with Parsl, a Python-based parallel programming library designed to manage execution of workflows across diverse computing environments. We propose a new method that converts CWL CommandLineTool definitions into Parsl apps, enabling Parsl scripts to easily import and use tools represented in CWL. We describe a Parsl runner that is capable of executing a CWL CommandLineTool directly. We also describe a proof-of-concept extension to support inline Python in a CWL workflow definition, enabling seamless use in Parsl's Python ecosystem. We demonstrate the benefits of this integration by presenting example CWL CommandLineTool definitions that show how they can be used in Parsl, and by comparing the performance of an image processing workflow executed with the Parsl integration and with other CWL runners.
ISBN (digital): 9798331509095
ISBN (print): 9798331509101
OpenMP provides a cross-vendor API for GPU offload that can serve as an implementation layer under performance portability frameworks like the Kokkos C++ library. However, recent work identified some impediments to performance with this approach arising from limitations in the API or in the available implementations. Advanced programming concepts such as hierarchical parallelism and use of dynamic shared memory were a particular area of concern. In this paper, we apply recent improvements and extensions in the LLVM/Clang OpenMP compiler and runtime library to the Kokkos backend that targets GPUs via OpenMP offload. We focus on efficient hierarchical parallelism and use of fast GPU scratch memory. We compare the performance of applications written using the Kokkos library with this improved OpenMP backend against the same programs using the CUDA and HIP backends. This evaluation shows progress toward closing the performance gaps between native and OpenMP backends and offers insights that may be useful to users and implementers of other runtime systems and programming frameworks for GPUs.
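For readers unfamiliar with the constructs at issue, the following small, hypothetical Kokkos kernel uses hierarchical parallelism (a TeamPolicy with nested TeamThreadRange loops) and per-team scratch memory, the pattern whose mapping onto OpenMP offload the paper optimizes. The kernel itself is invented for illustration and is not drawn from the evaluated applications.

#include <Kokkos_Core.hpp>

// Sketch: each team stages one row of a 2D view in fast scratch memory,
// then updates it with thread-level parallelism inside the team.
void team_scratch_example(Kokkos::View<double**> data) {
    using team_policy = Kokkos::TeamPolicy<>;
    using member_type = team_policy::member_type;
    using scratch_view =
        Kokkos::View<double*,
                     Kokkos::DefaultExecutionSpace::scratch_memory_space,
                     Kokkos::MemoryTraits<Kokkos::Unmanaged>>;

    const int n_teams = static_cast<int>(data.extent(0));
    const int len     = static_cast<int>(data.extent(1));
    const size_t scratch_bytes = scratch_view::shmem_size(len);

    Kokkos::parallel_for(
        "team_scratch_example",
        team_policy(n_teams, Kokkos::AUTO)
            .set_scratch_size(0, Kokkos::PerTeam(scratch_bytes)),
        KOKKOS_LAMBDA(const member_type& team) {
            scratch_view tmp(team.team_scratch(0), len);
            const int row = team.league_rank();
            // Stage the row in scratch (GPU shared memory on CUDA/HIP backends).
            Kokkos::parallel_for(Kokkos::TeamThreadRange(team, len),
                                 [&](const int i) { tmp(i) = data(row, i); });
            team.team_barrier();
            // Operate on the staged data and write it back.
            Kokkos::parallel_for(Kokkos::TeamThreadRange(team, len),
                                 [&](const int i) { data(row, i) = 2.0 * tmp(i); });
        });
}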
ISBN (digital): 9783982633619
There is an unmet need for static data race checkers that can analyze the incomplete programs typical of early program development stages and are also easy to adapt to different parallel programming models. In this work, we present GORC, a novel race checking approach based on Graph Neural Networks (GNNs) that has these attributes. GORC is trained on PrograML control/data graph representations extracted from OpenMP programs labeled as racy or race-free, and predicts races in unseen OpenMP programs. We provide a detailed evaluation of GORC, demonstrating that our approach can deliver high accuracy while also handling many more programs than existing static race checkers. Despite the scarcity of training data, GORC achieves a higher recall rate than LLOV, a widely cited static race checker for OpenMP. It outperforms state-of-the-art ML-based techniques for OpenMP data race detection on three different datasets. This paper describes GORC's architecture, detailed evaluations, and a novel attribution study that confirms GORC is learning features relevant to producing data race classifications.
ISBN (digital): 9798350355543
ISBN (print): 9798350355550
Hybrid MPI + X models, combining the Message Passing Interface (MPI) with node-level parallel programming models, increase complexity and introduce additional correctness issues. This work addresses the challenges of detecting data races in hybrid CUDA-aware MPI applications that arise from the asynchronous and non-blocking nature of the CUDA and MPI APIs. We introduce CuSan, an LLVM compiler extension and runtime that tracks CUDA-specific concurrency, synchronization, and memory access semantics. We integrate CuSan with MUST, a dynamic MPI correctness tool, and ThreadSanitizer (TSan), a thread-level data race detector. MUST with TSan can already detect concurrency issues in multi-threaded MPI codes. Together with CuSan, these tools allow for comprehensive correctness checking of concurrency issues in CUDA-aware MPI applications. Our evaluation on two mini-apps reveals a runtime overhead for CuSan ranging from 6× to 36×, depending on the amount of memory tracked by TSan, compared to the uninstrumented version. Memory overhead consistently remains under 1.8×. CuSan is available at https://***/tudasc/cusan.
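To illustrate the class of defect CuSan targets, here is a hypothetical CUDA-aware MPI fragment in C++: an asynchronous host-to-device copy is still in flight when MPI_Isend reads the device buffer, because the stream is never synchronized before the send. The function and buffer names are invented for this example.

#include <cuda_runtime.h>
#include <mpi.h>
#include <vector>

void send_field(const std::vector<double>& host, int dest, MPI_Comm comm) {
    double* dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(double));

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(dev, host.data(), host.size() * sizeof(double),
                    cudaMemcpyHostToDevice, stream);

    // BUG: missing cudaStreamSynchronize(stream), so the CUDA-aware MPI send
    // may read the device buffer while the asynchronous copy is still writing it.
    MPI_Request req;
    MPI_Isend(dev, static_cast<int>(host.size()), MPI_DOUBLE, dest, 0, comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    cudaStreamDestroy(stream);
    cudaFree(dev);
}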
ISBN (digital): 9798350385625
ISBN (print): 9798350385632
Despite performance limitations due to its interpreted nature, Python remains a dominant language among scientists and engineers. Enhancing its capabilities for parallel programming unlocks significant potential within parallel and cloud computing environments. mpiPython, a Python binding for message-passing interfaces, enables Python to run Single Program Multiple Data (SPMD) executions and thereby perform efficient parallel computations. Additionally, Python's inherent accessibility and versatility foster a growing demand for scaling and parallelizing it in distributed cloud environments. This paper extends mpiPython, bridging its gap in collective operations for parallel computing. The extension builds upon the original mpiPython's class-based structure, emphasizing two core principles: supporting vanilla Python with MPI and focusing on a C-based, CPU-focused implementation. Unlike existing implementations such as mpi4py, mpiPython interacts directly with the Python C API, offering greater control. Two new functions, MPI_Gather and MPI_Reduce, significantly improve efficiency and streamline collective operations between worker nodes. The results demonstrate mpiPython's ability to perform at the level of other libraries while prioritizing a simple implementation accessible to a broad range of users.