Search Results - Inner Mongolia University Library

19th International Conference on Euro-Par

Authors: Nanz, Sebastian; West, Scott; da Silveira, Kaue Soares. Affiliations: Swiss Fed Inst Technol, Zurich, Switzerland; Google Inc, Zurich, Switzerland

ISBN: (print) 9783642400476

Parallel programming is often regarded as one of the hardest programming disciplines. On the one hand, parallel programs are notoriously prone to concurrency errors; on the other, while trying to avoid such errors, achieving program performance becomes a significant challenge. As a result of the multicore revolution, however, parallel programming has ceased to be a task for domain experts only. For this reason, a large variety of languages and libraries have been proposed that promise to ease this task. This paper presents a study to investigate whether such approaches succeed in closing the gap between domain experts and mainstream developers. Four approaches are studied: Chapel, Cilk, Go, and Threading Building Blocks (TBB). Each approach is used to implement a suite of benchmark programs, which are then reviewed by notable experts in the language. By comparing original and revised versions with respect to source code size, coding time, execution time, and speedup, we gain insights into the importance of expert knowledge when using modern parallel programming approaches.

Keywords: parallel programming

LASSI: An LLM-Based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes

IEEE International Conference on Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS)

Authors: Matthew T. Dearing; Yiheng Tao; Xingfu Wu; Zhiling Lan; Valerie Taylor. Affiliations: University of Illinois Chicago, USA; Argonne National Laboratory, USA

ISBN: (digital) 9798350383454

ISBN: (print) 9798350383461

This paper addresses the problem of providing a novel approach to sourcing significant training data for LLMs focused on science and engineering. In particular, a crucial challenge is sourcing parallel scientific codes in the range of millions to billions of codes. To tackle this problem, we propose an automated pipeline framework called LASSI, designed to translate between parallel programming languages by bootstrapping existing closed- or open-source LLMs. LASSI incorporates autonomous enhancement through self-correcting loops in which errors encountered during the compilation and execution of generated code are fed back to the LLM through guided prompting for debugging and refactoring. We highlight the bidirectional translation of existing GPU benchmarks between OpenMP target offload and CUDA to validate LASSI. The results of evaluating LASSI with different application codes across four LLMs demonstrate the effectiveness of LASSI for generating executable parallel codes, with 80% of OpenMP to CUDA translations and 85% of CUDA to OpenMP translations producing the expected output. We also observe that approximately 78% of OpenMP to CUDA translations and 62% of CUDA to OpenMP translations execute within 10% of, or faster than, the runtime of the original benchmark code in the same language.

Keywords: Codes; Runtime; Parallel programming; Conferences; Large language models; Pipelines; Graphics processing units; Training data; Debugging; Benchmark testing

Implementation of Longest Common Subsequence Algorithm Using Thread Parallelization in Java

International Conference on Business and Industrial Research (ICBIR)

Authors: Mark Phil B. Pacot; Gleen A. Dalaorao. Affiliations: Department of Computer Science, Caraga State University, Caraga Region, Philippines; Department of Information Technology, Caraga State University, Caraga Region, Philippines

ISBN: (digital) 9798350383027

ISBN: (print) 9798350383034

Sequence alignment is a pivotal method in bioinformatics, used to ascertain the degree of similarity between sequences such as DNA, RNA, and amino acids. Among the many techniques used to tackle sequence alignment, the Longest Common Subsequence (LCS) takes center stage. This paper aims to improve LCS efficiency through thread parallelization. Building on the seminal work of Wagner and Fischer in 1974, both the sequential and the parallel techniques consistently identify the maximum length of the LCS. This research goes a step further by introducing thread parallelization, which leverages multithreading, resource synchronization, and task decomposition within the domain of parallel programming. Integrating these techniques yields a notable improvement in running time over the conventional iterative sequential approach. Both the sequential and parallel approaches were implemented and evaluated using NetBeans, an Integrated Development Environment (IDE) for the Java programming language. The findings show that the thread parallelization strategy outperforms the sequential one in the execution time of the LCS problem.

Keywords: Java; Parallel programming; Multithreading; Instruction sets; RNA; Synchronization; Bioinformatics; Parallel algorithms; Standards; Optimization
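The abstract above does not say how the LCS table is split across threads. As an illustration only, the sketch below assumes one common decomposition: the Wagner-Fischer style table is filled one anti-diagonal at a time, because all cells on the same anti-diagonal depend only on earlier diagonals and can be computed independently. It is written in Python rather than the paper's Java, the function names and the `workers` parameter are hypothetical, and in CPython the global interpreter lock means this shows the task decomposition and synchronization points, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def lcs_length_sequential(a: str, b: str) -> int:
    """Row-by-row dynamic programming table for the LCS length."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def lcs_length_wavefront(a: str, b: str, workers: int = 4) -> int:
    """Same table, filled one anti-diagonal (i + j == d) at a time.

    Cells on one anti-diagonal read only cells from the two previous
    diagonals, so they can be handed to different threads.
    """
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]

    def fill(cell):
        i, j = cell
        if a[i - 1] == b[j - 1]:
            table[i][j] = table[i - 1][j - 1] + 1
        else:
            table[i][j] = max(table[i - 1][j], table[i][j - 1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(2, m + n + 1):
            first_row = max(1, d - n)
            last_row = min(m, d - 1)
            cells = [(i, d - i) for i in range(first_row, last_row + 1)]
            # Consuming the iterator waits for every cell on this diagonal,
            # which acts as the barrier before the next diagonal starts.
            list(pool.map(fill, cells))
    return table[m][n]
```

Both functions should return the same length for any pair of strings; the wavefront version only changes the order in which cells are visited.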

Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores

arXiv, 2024

Authors: Los, Denis; Petushkov, Igor. Affiliation: Moscow Institute of Physics and Technology, 9 Institutskiy per., Moscow Region, Dolgoprudny 141700, Russia

Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario, fine-grained tasking on simultaneous multithreading CPU cores in such systems, has not been well researched in previous work. Hence, in this paper, we conduct a performance analysis of state-of-the-art shared-memory parallel programming frameworks on simultaneous multithreading cores using real-world fine-grained application kernels. We introduce a specialized and simple software-only parallel programming framework called Relic to enable extremely fine-grained tasking on simultaneous multithreading cores. Using the Relic framework, we increase performance speedups over serial implementations of benchmark kernels by 19.1% compared to LLVM OpenMP, by 31.0% compared to GNU OpenMP, by 20.2% compared to Intel OpenMP, by 33.2% compared to X-OpenMP, by 30.1% compared to oneTBB, by 23.0% compared to Taskflow, and by 21.4% compared to OpenCilk. © 2024, CC BY.

Keywords: parallel programming

Touch2C: A code conversion method from programming language for swarm intelligent building to C language

Chinese Control and Decision Conference, CCDC

Authors: Wenjie Chen; Qiliang Yang; Jianchun Xing; Shuo Zhao; Chenxi Hu; Chao Mou. Affiliations: College of Defense Engineering, Army Engineering University of PLA, Nanjing, China; China Xi’an Satellite Control Center, Xi’an, China

ISBN: (digital) 9798350387780

ISBN: (print) 9798350387797

The Touch programming language for swarm intelligent building application (APP) development effectively lowers development difficulty and the programming threshold for users, making buildings more intelligent. However, features of the Touch language such as intuitive modeling of building elements, parallel programming, and the implicit specification of inter-node communication make it very challenging to compile Touch to the low-level executable object code of swarm intelligent buildings, and APP development efficiency is low. This paper proposes a method for converting Touch code to C, together with supporting tools. It designs code conversion algorithms for the Touch language elements that describe distributed physical building objects and the parallel computing mode. These algorithms support the automatic conversion of high-level Touch code, which is user-oriented and hides the details of the underlying interactions, into C code for low-level execution, thus realizing an integrated process from high-level APP development to execution on the low-level hardware platform and improving APP development efficiency.

Keywords: Codes; Parallel programming; Buildings; Semantics; C languages; Distributed databases; Parallel processing

An overview of parallel processing of rectangular determinant calculation

Mediterranean Conference on Embedded Computing (MECO)

Authors: Besnik Duriqi; Halil Snopçe; Armend Salihu; Artan Luma. Affiliations: Faculty of Computer Science, South East European University - SEEU, Republic of North Macedonia; Faculty of Computer Science, Republic of North Macedonia; Department of Computer Science, UNI Universum International College, Prishtina, Republic of Kosovo

ISBN: (digital) 9798350387568

ISBN: (print) 9798350387575

This paper focuses on developing algorithms for parallel determinant processing, a crucial task in linear algebra and computational mathematics. The aim is to improve efficiency in high-performance computing environments by designing and analyzing algorithms that use parallel processing to expedite determinant computation for a range of matrices. The research explores methods such as Laplace expansion, LU decomposition, eigenvalue decomposition, Gaussian elimination, and cofactor expansion, assessing their efficiency, scalability, and applicability in different computational environments. The study employs advanced parallel programming techniques and architectures on multi-core processors, with a focus on applying Chio’s method to the parallel processing of rectangular determinants. The research also investigates the mathematical underpinnings of parallel determinant algorithms, addressing challenges such as load balancing, data distribution, and synchronization. The results show significant improvements in the efficiency of determinant calculations, reducing computation times for large matrices.

Keywords: Multicore processing; Parallel programming; Scalability; Signal processing algorithms; Computer architecture; Linear algebra; Parallel processing
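For readers unfamiliar with Chio's method, here is a minimal sketch of the standard condensation step for square matrices. It is not the rectangular-determinant variant the abstract refers to, and it is not the authors' implementation. For an n×n matrix A with a nonzero pivot a11, the method forms an (n-1)×(n-1) matrix B with entries b_ij = a11·a_(i+1,j+1) − a_(i+1,1)·a_(1,j+1), and det(A) = det(B) / a11^(n-2). Each entry of B is an independent 2×2 determinant, which is why the method is often considered for parallel execution. The function name `chio_det` and the row-swap handling of a zero pivot are choices made for this sketch.

```python
from fractions import Fraction


def chio_det(matrix):
    """Determinant of a non-empty square matrix via repeated Chio condensation.

    Exact rational arithmetic is used so that the division by the
    accumulated pivot powers introduces no rounding error.
    """
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    scale = Fraction(1)  # product of pivot ** (size - 2) over all steps

    while n > 2:
        if a[0][0] == 0:
            # Swap in a row with a nonzero leading entry; each swap flips the sign.
            for r in range(1, n):
                if a[r][0] != 0:
                    a[0], a[r] = a[r], a[0]
                    sign = -sign
                    break
            else:
                return Fraction(0)  # first column is all zeros
        pivot = a[0][0]
        # Every entry below is an independent 2x2 determinant, so in a
        # parallel setting these could be computed concurrently.
        a = [
            [pivot * a[i][j] - a[i][0] * a[0][j] for j in range(1, n)]
            for i in range(1, n)
        ]
        scale *= pivot ** (n - 2)
        n -= 1

    if n == 1:
        det = a[0][0]
    else:
        det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return sign * det / scale
```

For example, `chio_det([[2, 1, 3], [1, 0, 2], [4, 1, 8]])` condenses the 3×3 matrix to `[[-1, 1], [-2, 4]]`, whose determinant -2 is divided by the pivot 2 to give -1.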

Effective and Efficient Offloading Designs for One-Sided Communication to SmartNICs

International Conference on High Performance Computing

Authors: Ben Michalowicz; Kaushik Kandadi Suresh; Hari Subramoni; Mustafa Abduljabbar; Dhabaleswar K. Panda; Steve Poole. Affiliations: Department of Computer Science and Engineering, The Ohio State University, Columbus, USA; Los Alamos National Laboratory

ISBN: (digital) 9798331509095

ISBN: (print) 9798331509101

One-sided communication is one of many approaches to data transfer in High-Performance Computing (HPC) applications. One-sided operations place less demand on parallel programming libraries and do not require HPC hardware to issue acknowledgments of successful data transfer. Thanks to its inherently non-blocking nature, one-sided communication is also useful for improving overlap between communication and compute. As with any non-blocking communication, however, we run into the issue of message progression getting interleaved with computation. With the advent of Smart Network Cards (SmartNICs) such as NVIDIA's BlueField Data Processing Units (DPUs), we can offload the communication and message progression to these devices to improve the overlap of communication and compute. In this paper, we propose designs for efficient offloading of one-sided communication. We show how our designs can be used for offloading both MPI one-sided “put” and “get” and OpenSHMEM's non-blocking “put” and “get”. Using a Block Sparse Matrix-Multiplication Kernel (BSPMM), we show that our designs achieve over 96% improvement in runtime over pure-host execution for communication offload. We also briefly explore initial compute offload ideas for such one-sided kernels and show over 91% improvement in runtime here.

Keywords: Runtime; Parallel programming; High performance computing; Memory management; Data transfer; Libraries; Hardware; Sparse matrices; Kernel

UNR: Unified Notifiable RMA Library for HPC

arXiv, 2024

Authors: Feng, Guangnan; Xie, Jiabin; Dong, Dezun; Lu, Yutong. Affiliations: Sun Yat-sen University, Guangzhou, China; Changsha, China

Remote Memory Access (RMA) enables direct access to remote memory to achieve high performance for HPC applications. However, most modern parallel programming models lack schemes for the remote process to detect the completion of RMA operations. Many previous works have proposed programming models and extensions to notify the communication peer, but they did not solve the multi-NIC aggregation, portability, hardware-software co-design, and usability problems. In this work, we propose a Unified Notifiable RMA (UNR) library for HPC to address these challenges. In addition, we demonstrate the best practice of utilizing UNR within a real-world scientific application, PowerLLEL. We deployed UNR across four HPC systems, each with a different interconnect. The results show that PowerLLEL powered by UNR achieves up to a 36% acceleration on 1728 nodes of the Tianhe-Xingyi supercomputing system. Copyright © 2024, The Authors. All rights reserved.

Keywords: parallel programming

Efficient Multicore Computing and High-Precision Arithmetic: A Comprehensive Guide to Multicore and Big Number Programming

International Conference on Communication Systems and Network Technologies (CSNT)

Authors: Taniya Hasija; K. R. Ramkumar; Amanpreet Kaur; Sudesh Kumar Mittal; Bhupendra Singh. Affiliations: Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India; Centre for Artificial Intelligence & Robotics, Defence Research and Development Organization, Bangalore, India

ISBN: (digital) 9798350305463

ISBN: (print) 9798350305470

Today, system and application programming is moving toward concurrent and parallel programming with the development of multicore and multiprogramming architectures. In an effort to improve the performance of their studies, researchers are looking for more efficient ways to incorporate multiprocessing and multicore programming into their simulation systems. This article provides an overview of multicore programming and illustrates how it can be implemented. The paper also focuses on the limitations of primitive data types for diverse applications, especially in the context of computer systems. The article examines the need for big numbers and for arithmetic on them at a significant scale. Focusing on C programming, the article showcases the implementation of big numbers, giving scholars a comprehensive understanding of the concept and its practical realization.

Keywords: Knowledge engineering; Codes; Multicore processing; Parallel programming; Communication systems; Focusing; Computer architecture

Analysis of Different Algorithmic Design Techniques for Seam Carving

arXiv, 2024

Authors: Ali, S. Muhammad; Aijaz, Owais; Uyghur, Yousuf. Affiliation: Habib University, Karachi, Pakistan

Seam carving, a content-aware image resizing technique, has garnered significant attention for its ability to resize images while preserving important content. In this paper, we conduct a comprehensive analysis of four algorithmic design techniques for seam carving: brute-force, greedy, dynamic programming, and GPU-based parallel algorithms. We begin by presenting a theoretical overview of each technique, discussing their underlying principles and computational complexities. Subsequently, we delve into empirical evaluations, comparing the performance of these algorithms in terms of runtime efficiency. Our experimental results provide insights into the theoretical complexities of the design techniques. © 2024, CC BY.

Keywords: parallel programming
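Of the four techniques named in the abstract above, the dynamic programming one is easy to illustrate. The sketch below is a generic textbook formulation, not the authors' code: given a grid of per-pixel energy values, it accumulates the cheapest cost of reaching each pixel from the top row, where a seam may move at most one column left or right per row, and then backtracks to recover the seam. The function name `min_vertical_seam` and the sample energy values are made up for illustration. Within a row, each cell depends only on the previous row, which is the property a GPU-based variant would typically exploit.

```python
def min_vertical_seam(energy):
    """Return (total_energy, columns) for the cheapest top-to-bottom seam.

    `energy` is a non-empty list of equal-length rows of numbers.
    `columns[r]` is the column the seam passes through in row r.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]  # cost[r][c]: cheapest seam ending at (r, c)

    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols - 1, c + 1)
            cost[r][c] += min(cost[r - 1][lo:hi + 1])

    # Start from the cheapest cell in the bottom row and walk upwards.
    c = min(range(cols), key=lambda k: cost[rows - 1][k])
    total = cost[rows - 1][c]
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(0, c - 1), min(cols - 1, c + 1)
        c = min(range(lo, hi + 1), key=lambda k: cost[r - 1][k])
        seam.append(c)
    seam.reverse()
    return total, seam


# Hypothetical 3x3 energy map used only to exercise the function.
sample_energy = [
    [1, 4, 3],
    [5, 2, 6],
    [7, 8, 1],
]
seam_total, seam_columns = min_vertical_seam(sample_energy)
```

On the sample grid the cheapest seam runs along the main diagonal, through the cells with energies 1, 2, and 1.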
