ISBN (print): 9783642400476
Parallel programming is often regarded as one of the hardest programming disciplines. On the one hand, parallel programs are notoriously prone to concurrency errors; on the other, achieving program performance while trying to avoid such errors becomes a significant challenge. As a result of the multicore revolution, however, parallel programming has ceased to be a task for domain experts only. For this reason, a large variety of languages and libraries have been proposed that promise to ease this task. This paper presents a study to investigate whether such approaches succeed in closing the gap between domain experts and mainstream developers. Four approaches are studied: Chapel, Cilk, Go, and Threading Building Blocks (TBB). Each approach is used to implement a suite of benchmark programs, which are then reviewed by notable experts in the language. By comparing original and revised versions with respect to source code size, coding time, execution time, and speedup, we gain insights into the importance of expert knowledge when using modern parallel programming approaches.
ISBN (digital): 9798350383454
ISBN (print): 9798350383461
This paper addresses the problem of providing a novel approach to sourcing significant training data for LLMs focused on science and engineering. In particular, a crucial challenge is sourcing parallel scientific codes in the ranges of millions to billions of codes. To tackle this problem, we propose an automated pipeline framework called LASSI, designed to translate between parallel programming languages by bootstrapping existing closed- or open-source LLMs. LASSI incorporates autonomous enhancement through self-correcting loops where errors encountered during the compilation and execution of generated code are fed back to the LLM through guided prompting for debugging and refactoring. We highlight the bidirectional translation of existing GPU benchmarks between OpenMP target offload and CUDA to validate LASSI. The results of evaluating LASSI with different application codes across four LLMs demonstrate the effectiveness of LASSI for generating executable parallel codes, with 80% of OpenMP to CUDA translations and 85% of CUDA to OpenMP translations producing the expected output. We also observe approximately 78% of OpenMP to CUDA translations and 62% of CUDA to OpenMP translations execute within 10% of or at a faster runtime than the original benchmark code in the same language.
ISBN (digital): 9798350383027
ISBN (print): 9798350383034
Sequence alignment stands as a pivotal method in the realm of bioinformatics, meticulously employed to ascertain the degree of similarity between diverse sequences such as DNA, RNA, and amino acids. Among the myriad techniques utilized in tackling sequence alignment challenges, the Longest Common Subsequence (LCS) takes center stage. This paper delves into the realm of enhancing LCS efficiency through the implementation of thread parallelization. Drawing inspiration from the seminal work of Wagner and Fischer in 1974, both sequential and parallel techniques exhibit remarkable consistency in identifying the maximum length of LCS. However, this research goes a step further by introducing thread parallelization, which leverages multithreading, resource synchronization, and task decomposition within the domain of parallel programming. The meticulous integration of these advanced techniques results in a notable enhancement in terms of running time compared to the conventional iterative sequential approach. The experimentation and evaluation of both sequential and parallel approaches were conducted using NetBeans, a robust Integrated Development Environment (IDE) tailored for the Java programming language. The findings underscore the superior performance of the thread parallelization strategy, establishing its prowess in optimizing the execution time of LCS problem resolution.
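The abstract above does not say how the LCS table is decomposed across threads. One common decomposition, shown here purely as an illustrative sketch and not as the paper's Java implementation, is the anti-diagonal wavefront: in the Wagner-Fischer recurrence every cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be filled independently. The function names and the `workers` parameter are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def lcs_length_sequential(a, b):
    """Classic row-by-row dynamic programme for the LCS length."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_length_wavefront(a, b, workers=4):
    """Fill the same table one anti-diagonal at a time using a thread pool.

    Cells with the same i + j never depend on each other, so each
    diagonal is a batch of independent tasks.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]

    def fill(cell):
        i, j = cell
        if a[i - 1] == b[j - 1]:
            table[i][j] = table[i - 1][j - 1] + 1
        else:
            table[i][j] = max(table[i - 1][j], table[i][j - 1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(2, n + m + 1):  # d = i + j
            lo = max(1, d - m)
            hi = min(n, d - 1)
            cells = [(i, d - i) for i in range(lo, hi + 1)]
            # list() waits for the whole diagonal before the next one starts;
            # this is the synchronization point between dependent batches.
            list(pool.map(fill, cells))
    return table[n][m]
```

In standard CPython builds the global interpreter lock keeps pure-Python threads from running simultaneously, so this sketch shows the dependency structure and synchronization points rather than a speedup; a Java or C version with the same decomposition can run each diagonal on several cores.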
Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario of fine-grained tasking on simultaneous multit...
ISBN (digital): 9798350387780
ISBN (print): 9798350387797
The Touch programming language for swarm intelligent building application (APP) development effectively reduces the development difficulty and user programming threshold, making the building more intelligent. However, features of the Touch language such as intuitive modeling of building elements, parallel programming, and the implicit specification of internode communication make it very challenging to compile Touch into the low-level executable object code of swarm intelligent buildings, and APP development efficiency remains low. This paper proposes a code conversion method from Touch to C, together with its supporting tools, and designs code conversion algorithms for the Touch language elements used to describe distributed building physical objects and the parallel computing mode. The method supports automatic conversion of high-level Touch code, which is user-oriented and shielded from the details of the underlying interactions, into C code for underlying execution, thus realizing an integrated process from high-level APP development to low-level hardware platform execution and improving APP development efficiency.
ISBN (digital): 9798350387568
ISBN (print): 9798350387575
This paper focuses on developing algorithms for parallel determinant processing, a crucial task in linear algebra and computational mathematics. The aim is to improve efficiency in high-performance computing environments by designing and analyzing algorithms that use parallel processing to expedite determinant computation for matrices of various sizes. The research explores methods like Laplace expansion, LU decomposition, eigenvalue decomposition, Gaussian elimination, and cofactor expansion, assessing their efficiency, scalability, and applicability in different computational environments. The study employs advanced parallel programming techniques and architectures, utilizing multi-core processors, with a focus on using Chio’s method to process rectangular determinants in parallel. The research also investigates the mathematical underpinnings of parallel determinant algorithms, addressing challenges like load balancing, data distribution, and synchronization. The results show significant improvements in the efficiency of determinant calculations, reducing computation times for large matrices.
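For orientation, the textbook form of Chio's condensation for a square matrix can be sketched as follows; this is not the paper's algorithm, and its treatment of rectangular determinants is not reproduced here. With a nonzero pivot $a_{11}$, an $n \times n$ determinant reduces to an $(n-1) \times (n-1)$ determinant of $2 \times 2$ minors, divided by $a_{11}^{\,n-2}$. Every minor in one condensation step is independent of the others, which is what makes the step a natural unit of parallel work. The function name `chio_determinant` and the use of exact `Fraction` arithmetic are choices made for this example.

```python
from fractions import Fraction


def chio_determinant(matrix):
    """Determinant of a square matrix by repeated Chio condensation."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    if n == 0:
        return Fraction(1)  # determinant of the empty matrix
    sign = 1
    scale = Fraction(1)  # accumulated product of pivot ** (size - 2)
    while n > 1:
        # Chio's method needs a nonzero pivot in the top-left corner;
        # swap a suitable row up and flip the sign if necessary.
        pivot_row = next((r for r in range(n) if a[r][0] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # a zero first column means det = 0
        if pivot_row != 0:
            a[0], a[pivot_row] = a[pivot_row], a[0]
            sign = -sign
        p = a[0][0]
        # Each entry below is an independent 2x2 minor, so in a parallel
        # version the entries (or rows) could be computed concurrently.
        a = [
            [p * a[i][j] - a[i][0] * a[0][j] for j in range(1, n)]
            for i in range(1, n)
        ]
        scale *= p ** (n - 2)
        n -= 1
    return sign * a[0][0] / scale
```

For example, `chio_determinant([[2, 1, 3], [0, 4, 1], [5, 2, 6]])` condenses to the 2x2 matrix `[[8, 2], [-1, -3]]` with a scale factor of 2, giving -22 / 2 = -11.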
ISBN (digital): 9798331509095
ISBN (print): 9798331509101
One-sided communication is one of many approaches to data transfer in High-Performance Computing (HPC) applications. One-sided operations place less demand on parallel programming libraries and do not require HPC hardware to issue acknowledgments of successful data transfer. Thanks to its inherently non-blocking nature, one-sided communication is also useful for improving overlap between communication and compute. As with any non-blocking communication, however, we run into the issue of message progression getting interleaved with computation. With the advent of Smart Network Cards (SmartNIC) such as NVIDIA's BlueField Data Processing Units (DPU), we can offload the communication and message progression to these devices to improve the overlap of communication and compute. In this paper, we propose designs for efficient offloading of one-sided communication. We show how our designs can be used for offloading both MPI one-sided “put” and “get” and OpenSHMEM's non-blocking “put” and “get”. Using a Block Sparse Matrix-Multiplication Kernel (BSPMM), we show that our designs achieve over 96% improvement in runtime over pure-host execution for communication offload. We also briefly explore initial compute offload ideas for such one-sided kernels and show over 91% improvement in runtime here.
Remote Memory Access (RMA) enables direct access to remote memory to achieve high performance for HPC applications. However, most modern parallel programming models lack schemes for the remote process to detect the co...
ISBN (digital): 9798350305463
ISBN (print): 9798350305470
Today, system and application programming is moving toward concurrent and parallel programming with the development of multicore and multiprogramming architectures. In an effort to improve study performance, researchers are looking for more efficient methods to include multiprocessing and multicore programming in their simulation systems. This article provides an overview of multicore programming and illustrates how it can be implemented. The paper also addresses the limitations of primitive data types for diverse applications, especially in the context of computer systems. The article delves into the necessity of big numbers and arithmetic on a significant scale. Focusing on C programming, the article showcases the implementation of big numbers, providing scholars with a comprehensive understanding of the concept and its practical realization.
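The abstract does not describe the article's C data layout, so the following is only a hedged sketch of the general idea behind big-number arithmetic: a value too large for a primitive type is stored as an array of fixed-size "limbs" and added limb by limb with an explicit carry. Python is used here for brevity even though its integers are already arbitrary-precision; the base `10**9` and the names `to_limbs`, `add_limbs`, and `to_text` are assumptions for illustration, chosen because each limb would fit in a 32-bit unsigned integer in C.

```python
BASE = 10**9  # each limb holds nine decimal digits


def to_limbs(text):
    """Split a non-negative decimal string into limbs, least significant first."""
    text = text.lstrip("0") or "0"
    limbs = []
    while text:
        limbs.append(int(text[-9:]))
        text = text[:-9]
    return limbs


def add_limbs(x, y):
    """Schoolbook addition of two limb arrays with carry propagation."""
    result = []
    carry = 0
    for k in range(max(len(x), len(y))):
        total = carry
        total += x[k] if k < len(x) else 0
        total += y[k] if k < len(y) else 0
        result.append(total % BASE)
        carry = total // BASE
    if carry:
        result.append(carry)
    return result


def to_text(limbs):
    """Render limbs back to a decimal string, zero-padding the inner limbs."""
    head = str(limbs[-1])
    tail = "".join(f"{limb:09d}" for limb in reversed(limbs[:-1]))
    return head + tail
```

Adding `"999999999999999999"` and `"1"` this way carries through both limbs and yields `"1000000000000000000"`, a result that already exceeds the range of a signed 32-bit integer by many orders of magnitude.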
Seam carving, a content-aware image resizing technique, has garnered significant attention for its ability to resize images while preserving important content. In this paper, we conduct a comprehensive analysis of fou...