Search Results - Inner Mongolia University Library

Supporting efficient overlapping of host-device operations for heterogeneous programming with CtrlEvents

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2023, Vol. 179

Authors: Torres, Yuri; Andujar, Francisco J.; Gonzalez-Escribano, Arturo; Llanos, Diego R. (Univ Valladolid, Dept Informat, Valladolid, Spain)

Heterogeneous systems with several kinds of devices, such as multi-core CPUs, GPUs, and FPGAs, are now commonplace. Exploiting all these devices with device-oriented programming models, such as CUDA or OpenCL, requires expertise and knowledge of the underlying hardware to tailor the application to each specific device, which degrades performance portability. Higher-level proposals simplify the programming of these devices, but their current implementations lack efficient support for problems that involve frequent bursts of computation and communication, or input/output operations. In this work we present CtrlEvents, a new heterogeneous runtime solution that automatically overlaps computation and communication whenever possible, simplifying and improving the efficiency of data-dependency analysis and the coordination of both device computations and host tasks, including generic I/O operations. Our solution outperforms other state-of-the-art implementations in most situations, offering a good balance between portability, programmability and efficiency. (c) 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Keywords: Parallel programming; Heterogeneous programming; Asynchronous operations; Events; GPUs

Parallelism exploration in sequential algorithms via animation tool

MULTIAGENT AND GRID SYSTEMS, 2021, Vol. 17, No. 2, pp. 145-158

Authors: Qawasmeh, Ahmad; Taamneh, Salah; Aljammal, Ashraf H.; Hamadneh, Nabhan; Banikhalaf, Mustafa; Kharabsheh, Mohammad (Hashemite Univ, Dept Comp Sci & Applicat, Zarqa, Jordan; Yarmouk Univ, Dept Comp Sci, Irbid, Jordan; Hashemite Univ, Dept Comp Informat Syst, Zarqa, Jordan)

Different high-performance techniques, such as profiling, tracing, and instrumentation, have been used to tune and enhance the performance of parallel applications. However, these techniques do not show how to explore the potential for parallelism in a given application. Animating and visualizing the execution of a sequential algorithm provide a thorough understanding of its usage and functionality. In this work, an interactive web-based educational animation tool was developed to help users analyze sequential algorithms and detect parallel regions, regardless of the parallel programming model used. The tool simplifies learning algorithms and helps students analyze programs efficiently. Our statistical t-test study on a sample of students showed a significant improvement in their understanding of the mechanisms and parallelism of applications, and an increase in their willingness to learn algorithms and parallel programming.

Keywords: Algorithm animation; Educational tool; Parallel programming; Performance analysis; Sorting algorithms; Algorithms visualization

Adding parallelism to sequential programs - a combined method

INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, Vol. 70, No. 1, pp. 135-144

Authors: Daszczuk, Wiktor B.; Czejdo, Denny B.; Grzeskowiak, Wojciech (Warsaw Univ Technol, Inst Comp Sci, Warsaw, Poland; Fayetteville State Univ, Dept Math & Comp Sci, Fayetteville, NC, USA)

The article outlines a contemporary method for creating software for multi-processor computers. It describes how to identify sequential code structures that can be parallelized. Three such structures were found and then examined in detail. The algorithms that decide whether a given part of the code may be parallelized are based on static analysis. The techniques show how, where possible, existing sequential structures can be transformed into programs that run in parallel. Our process also includes a dynamic evaluation, which can be used to assess the efficiency of the resulting parallel programs. The algorithms have been implemented in C# as a tool for sequential programs. All proposed methods are discussed using a common benchmark.

Keywords: Programming languages; Algorithms; Concurrency; Parallelism; Parallel programming

Optimizing Data Movement in Heterogeneous Computing: A LASSA-based Approach for Efficient Nucleation List Precomputation

International Symposium on Advanced Computing and Communication (ISACC)

Authors: Biswajit Bhowmik; Girish K K; Himanshu Pandey; Piyus Prabhanjans (Dept. of Computer Science and Engineering, BRICS Laboratory, National Institute of Technology Karnataka, Mangalore, Bharat; Dept. of Computer Science and Engineering, Ishwarchandra Vidyasagar AIT Lab, BRICS Laboratory, National Institute of Technology Karnataka, Mangalore, Bharat; Dept. of Computer Science and Engineering, Maharshi Sushrut CAS Lab, BRICS Laboratory, National Institute of Technology Karnataka, Mangalore, Bharat)

ISBN (digital): 9798331523893

ISBN (print): 9798331523909

In the rapidly evolving landscape of heterogeneous computing, the efficiency of data movement between CPUs and GPUs can make or break system performance. Despite advances in parallel processing, existing methods for managing data transfers, particularly in GPU offloading scenarios, suffer from significant inefficiencies. These are especially evident in nucleation list precomputation for non-equilibrium solidification models, where redundant data movement and complex dynamic work-sharing in OpenMP lead to significant performance overhead. To tackle this issue, this paper proposes integrating the Location-Aware Heap Static Single Assignment (LASSA) algorithm into the compilation process. This approach identifies and eliminates redundant memory copy operations, optimizing data transfers and reducing overhead. The results show efficiency improving by up to 9.6-fold. By addressing the specific challenges of nucleation list precomputation, this work provides insights into optimizing data movement in heterogeneous computing environments, paving the way for better performance in parallel programming models.

Keywords: Solid modeling; Technological innovation; Parallel programming; Computational modeling; High performance computing; System performance; Graphics processing units; Parallel processing; Heterogeneous networks; Data models

An aggregator-based resource allocation in the smart grid using an artificial neural network and sliding time window optimization

IET SMART GRID, 2021, Vol. 4, No. 6, pp. 612-622

Authors: Zheng, Yingying; Celik, Berk; Suryanarayanan, Siddharth; Maciejewski, Anthony A.; Siegel, Howard Jay; Hansen, Timothy M. (Utah State Univ, Dept Biol Engn, Logan, UT 84322, USA; CAPSIM, Meyrargues, France; South Dakota State Univ, Dept Elect Engn & Comp Sci, Brookings, SD 57007, USA; Colorado State Univ, Dept Elect & Comp Engn, Ft Collins, CO 80523, USA)

The success of an efficient and effective aggregator-based residential demand response system in the smart grid relies on day-ahead customer incentive pricing (CIP) and load-shifting protocols. An artificial neural network model is designed to generate the aggregator's day-ahead CIP from historical data. Load scheduling is formulated as a day-ahead optimization problem and solved with a blocked sliding-window technique using parallel computing. Under the stated assumptions, the proposed algorithm improved aggregator performance compared with the previous genetic-algorithm-based approach: it reduced the overall simulation time from 275 to 45 min and increased the aggregator's forecast profits and customer savings by 11.85% and 35.99%, respectively.

Keywords: Optimisation; Neural nets; Smart power grids; Power engineering computing; Demand side management; Pricing; Resource allocation; Scheduling; Profitability; Parallel programming; Data handling; Electricity supply industry
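
For scale, the reported drop in overall simulation time (275 min with the previous approach, 45 min with the proposed one) corresponds to a speedup of roughly six:

$$S = \frac{T_{\text{previous}}}{T_{\text{proposed}}} = \frac{275\ \text{min}}{45\ \text{min}} \approx 6.1$$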

Shared memory implementation and performance analysis of LSB steganography based on chaotic tent map

INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2021, Vol. 17, No. 4, pp. 333-342

Authors: Gambhir, Gaurav; Mandal, Jyotsna Kumar (Univ Kalyani, Kalyani 741235, W Bengal, India)

Concealing secret information in an image so that any perceptible evidence of the alteration is insignificant is known as image steganography. It can be implemented with either spatial-domain or transform-domain techniques. Spatial-domain algorithms, generally the most widely used, embed the secret information in the least significant bit (LSB) positions of the cover image pixels. This paper proposes a novel steganography algorithm based on chaotic tent map bit embedding, together with a multicore implementation. Chaotic maps are attractive for image steganography because of their sensitivity to initial conditions and control parameters. The sequential LSB algorithm has O(n) computational complexity, so for large images the running time of the encryption/decryption algorithm is also an important concern. Exploiting the data parallelism inherent in the algorithm and the advantages offered by multicore processors, the proposed algorithm is explicitly parallelized with the OpenMP API. As a pre-embedding step, the randomness of the chaotic number sequences is tested with a NIST cryptographic test suite. The quality of the stego image is validated with statistical measures such as the structural similarity index (SSIM), mean square error (MSE), and peak signal-to-noise ratio (PSNR). The parallel version has been tested on five test images for scalability analysis, and the results indicate a significant speedup over the sequential implementation.

Keywords: OpenMP; Performance measurement; Performance analysis; Steganography; Cryptography; Parallel programming
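
The abstract names the idea (a chaotic tent map decides where message bits go in the least significant bits of the cover pixels) but not the exact scheme. The following is a minimal sketch of one common way to realise it, not the authors' algorithm, under these assumptions: the cover image is a flat list of 8-bit grey values, the secret key is the tent-map pair (x0, mu) with 1 < mu < 2, and embedding positions come from sorting pixel indices by their chaotic values. All function names are hypothetical. Each (bit, position) write touches a distinct pixel, which is the data parallelism an OpenMP parallel loop over the embedding step would exploit; this Python version runs sequentially.

```python
from typing import List, Sequence


def tent_sequence(x0: float, mu: float, n: int, burn_in: int = 100) -> List[float]:
    """Return n iterates of the tent map x -> mu*x (x < 0.5) or mu*(1 - x) (x >= 0.5)."""
    # 1 < mu < 2 keeps the map chaotic while avoiding mu = 2, which
    # degenerates quickly in binary floating point.
    if not (0.0 < x0 < 1.0 and 1.0 < mu < 2.0):
        raise ValueError("expected 0 < x0 < 1 and 1 < mu < 2")
    x = x0
    seq: List[float] = []
    for step in range(burn_in + n):
        x = mu * x if x < 0.5 else mu * (1.0 - x)
        if step >= burn_in:  # discard the first burn_in iterates
            seq.append(x)
    return seq


def chaotic_positions(n_pixels: int, x0: float, mu: float) -> List[int]:
    """Key-dependent permutation of pixel indices: sort indices by chaotic value."""
    seq = tent_sequence(x0, mu, n_pixels)
    return sorted(range(n_pixels), key=lambda i: seq[i])


def embed(pixels: Sequence[int], bits: Sequence[int], x0: float, mu: float) -> List[int]:
    """Write each message bit into the LSB of a chaotically chosen 8-bit pixel."""
    if len(bits) > len(pixels):
        raise ValueError("message longer than cover capacity (1 bit per pixel)")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be 0 or 1")
    order = chaotic_positions(len(pixels), x0, mu)
    stego = list(pixels)
    # Every iteration writes a different pixel, so this loop is data-parallel.
    for bit, pos in zip(bits, order):
        stego[pos] = (stego[pos] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def extract(stego: Sequence[int], n_bits: int, x0: float, mu: float) -> List[int]:
    """Recover n_bits message bits using the same key (x0, mu)."""
    order = chaotic_positions(len(stego), x0, mu)
    return [stego[pos] & 1 for pos in order[:n_bits]]
```

Extraction regenerates the same permutation from the key, so a receiver holding (x0, mu) reads the bits back in order, and each pixel changes by at most one grey level.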

NPB-PSTL: C++ STL Algorithms with Parallel Execution Policies in NAS Parallel Benchmarks

Euromicro Conference on Parallel, Distributed and Network-Based Processing

Authors: Júnior Löff; Renato B. Hoffmann; Arthur S. Bianchessi; Leonardo Mallmann; Dalvan Griebler; Walter Binder (Faculty of Informatics, Università della Svizzera italiana (USI), Lugano, Switzerland; Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil)

ISBN (digital): 9798331524937

ISBN (print): 9798331524944

The C++ language continually evolves through formal specifications established by its standards committee, which proposes new features to keep C++ relevant while improving usability, performance, and portability across platforms. With the addition of parallel Standard Template Library (STL) algorithms in C++17, programmers can now exploit parallel processing through vendor-neutral parallel execution policies. This study presents an adaptation of the NAS Parallel Benchmarks (NPB), a well-established suite of applications for evaluating parallel architectures, obtained by porting its sequential C-style code to C++ STL abstractions and performance-portable parallelism features. Our goals are to (1) assess the suitability of the C++ STL for scientific applications such as those in the NPB and (2) compare the performance and portability of the STL algorithms' parallel execution policies across different multicore architectures (x86 and AArch64). Results indicate that the performance of parallel STL algorithms is often close to that of optimized handwritten versions (OpenMP, Intel TBB, and FastFlow) on different architectures, with notable shortfalls. Across all NPB benchmarks, the geometric mean of the STL versions' sequential execution times is between 3.76% and 6.9% higher, while for parallel executions the geometric mean execution time may be up to 21.21% higher.

Keywords: Parallel programming; Multicore processing; Software algorithms; C++ languages; Benchmark testing; Parallel processing; Data structures; Libraries; Hardware; Standards

Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra

arXiv, 2025

Authors: Bellavita, Julian; Pasquali, Thomas; Rio Martin, Laura Del; Vella, Flavio; Guidi, Giulia (Cornell University, Ithaca, NY, United States; University of Trento, Trento, Italy)

K-means is a popular clustering algorithm with significant applications in numerous scientific and engineering areas. One drawback of K-means is its inability to identify non-linearly separable clusters, which may lead to inaccurate solutions in certain cases. Kernel K-means is a variant of classical K-means that can find non-linearly separable clusters. However, it scales quadratically with respect to the size of the dataset, taking several minutes to cluster even medium-sized datasets on traditional CPU-based machines. In this paper, we present a formulation of Kernel K-means using sparse-dense matrix multiplication (SpMM) and sparse matrix-vector multiplication (SpMV), and we show that our formulation enables the rapid implementation of a fast GPU-based version of Kernel K-means with little programming effort. Our implementation, named Popcorn, is the first open-source GPU-based implementation of Kernel K-means. Popcorn achieves a speedup of up to 123.8× over a CPU implementation of Kernel K-means and a speedup of up to 2.6× over a GPU implementation of Kernel K-means that does not use sparse matrix computations. Our results support the effectiveness of sparse matrices as tools for efficient parallel programming. © 2025, CC BY.

Keywords: Parallel programming
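
The sparse-matrix view mentioned in the abstract can be illustrated on the standard kernel K-means distance, ||phi(x_i) - m_c||^2 = K[i][i] - (2/|C_c|) * sum_{j in C_c} K[i][j] + (1/|C_c|^2) * sum_{j,l in C_c} K[j][l]. Writing the cluster assignment as a sparse n x k matrix V with a single nonzero 1/|C_c| per row turns the middle term into the product K * V and the last term into a cheap reduction over that product. The pure-Python sketch below assumes a precomputed dense kernel matrix and an initial labelling; it shows this formulation only, is not Popcorn's code or API, and performs the sparse product with plain loops rather than a GPU SpMM kernel. Function names and the RBF kernel choice are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float) -> List[List[float]]:
    """Dense Gaussian kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
            for xi in X]


def kernel_kmeans_step(K: Sequence[Sequence[float]], labels: List[int], k: int) -> List[int]:
    n = len(K)
    sizes = [0] * k
    for c in labels:
        sizes[c] += 1
    # Sparse normalised indicator V (n x k): row j has one nonzero, 1/|C_c|,
    # in column c = labels[j]. It is stored implicitly as (labels, weights).
    weights = [1.0 / sizes[labels[j]] for j in range(n)]
    # E = K @ V (dense-times-sparse product):
    # E[i][c] = (1/|C_c|) * sum_{j in C_c} K[i][j]
    E = [[0.0] * k for _ in range(n)]
    for i in range(n):
        Ki, Ei = K[i], E[i]
        for j in range(n):
            Ei[labels[j]] += Ki[j] * weights[j]
    # z[c] = (1/|C_c|^2) * sum_{j,l in C_c} K[j][l], obtained by reusing E.
    z = [0.0] * k
    for j in range(n):
        z[labels[j]] += weights[j] * E[j][labels[j]]
    # Assign each point to the nearest feature-space centroid; empty clusters are skipped.
    new_labels = []
    for i in range(n):
        dists = [K[i][i] - 2.0 * E[i][c] + z[c] if sizes[c] else math.inf
                 for c in range(k)]
        new_labels.append(min(range(k), key=dists.__getitem__))
    return new_labels


def kernel_kmeans(K: Sequence[Sequence[float]], labels: Sequence[int], k: int,
                  max_iter: int = 100) -> List[int]:
    """Iterate assignment steps until the labelling stops changing."""
    current = list(labels)
    for _ in range(max_iter):
        updated = kernel_kmeans_step(K, current, k)
        if updated == current:
            break
        current = updated
    return current
```

With a linear kernel (K[i][j] = x_i . x_j) the same code reduces to ordinary K-means, which is a convenient sanity check; the nonlinear behaviour comes entirely from the choice of kernel.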

Application of CPU Parallel Computing in MATLAB Environment for Analysis of Anti-Saturation and Gain Effects in PID Controller

Data Driven Control and Learning Systems (DDCLS)

Authors: Ruirui Huang; Zhenhui Wu; Yandong Hou; Qianshuai Cheng (Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China; Technology and Media University of Henan Kaifeng, Kaifeng, China; School of Artificial Intelligence, Henan University, Zhengzhou, China)

ISBN (digital): 9798350361674

ISBN (print): 9798350361681

With the continuous growth of data sizes and model complexity, computational workloads have increased rapidly, posing a significant challenge to computer data processing and simulation. Parallel programming on multicore and cluster architectures has therefore become one of the mainstream techniques for improving program execution and numerical computation efficiency. The theoretical foundations of parallel computing have been applied in many engineering applications and theoretical simulations. In this paper, a new parallel PID controller with integral anti-saturation (anti-windup) is designed for a second-order closed-loop control system of unmanned aerial vehicles (UAVs). The paper compares the runtime, execution efficiency, and speedup of the parallel and ordinary serial programs under the same scenario. The experimental results show that parallel computing significantly improves the efficiency of the simulation program for the PID anti-saturation control system under identical scenarios, achieving a high speedup on the existing computing platform. In addition, the study summarizes common issues encountered in MATLAB parallel program design and offers practical guidance for overcoming them.

Keywords: Runtime; Parallel programming; Multicore processing; Process control; Parallel processing; Control systems; Central Processing Unit
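
The abstract does not give the controller or plant equations, so what follows is only a minimal sketch of the kind of simulation being parallelised, under explicit assumptions: a generic second-order plant y'' + 2*zeta*wn*y' + wn^2*y = wn^2*u stands in for the UAV loop, the actuator output is clamped to [u_min, u_max], and integral anti-saturation is implemented by conditional integration (the integrator is frozen while the output is saturated and the error would push it further into saturation). Parameter values and names (`PIDGains`, `simulate`) are hypothetical, and this is not the authors' MATLAB code. Each call to `simulate` is independent of every other, which is what lets a sweep over gains or scenarios be spread across CPU cores; the sketch itself runs sequentially.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PIDGains:
    kp: float
    ki: float
    kd: float


def simulate(gains: PIDGains, anti_windup: bool, setpoint: float = 1.0,
             u_min: float = -1.2, u_max: float = 1.2, wn: float = 2.0,
             zeta: float = 0.5, dt: float = 1e-3,
             t_end: float = 10.0) -> Tuple[List[float], List[float]]:
    """Step response of a hypothetical second-order plant under a saturated PID.

    Returns (outputs y, applied control inputs u), one entry per time step.
    """
    y, ydot, integral = 0.0, 0.0, 0.0
    prev_err = setpoint - y  # avoids a derivative kick on the first step
    ys: List[float] = []
    us: List[float] = []
    for _ in range(int(round(t_end / dt))):
        err = setpoint - y
        deriv = (err - prev_err) / dt
        prev_err = err
        u_unsat = gains.kp * err + gains.ki * integral + gains.kd * deriv
        u = min(max(u_unsat, u_min), u_max)  # actuator saturation
        # Conditional integration: skip the integrator update only when the
        # output is saturated and the error would drive it deeper into saturation.
        pushing_further = (u_unsat > u_max and err > 0) or (u_unsat < u_min and err < 0)
        if not (anti_windup and pushing_further):
            integral += err * dt
        # Semi-implicit Euler step of y'' = wn^2 * (u - y) - 2 * zeta * wn * y'.
        yddot = wn * wn * (u - y) - 2.0 * zeta * wn * ydot
        ydot += yddot * dt
        y += ydot * dt
        ys.append(y)
        us.append(u)
    return ys, us
```

Comparing `simulate(g, True)` with `simulate(g, False)` under a tight actuator limit isolates the effect of windup; with very wide limits the clamp never engages and the two variants produce identical trajectories.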

Accelerating Fortran Codes: A Method for Integrating Coarray Fortran with CUDA Fortran and OpenMP

SSRN, 2024

Authors: McKevitt, James; Vorobyov, Eduard I.; Kulikov, Igor (University of Vienna, Department of Astrophysics, Türkenschanzstrasse 17, Vienna A-1180, Austria; University College London, Mullard Space Science Laboratory, Holmbury St Mary, Dorking, Surrey RH5 6NT, United Kingdom; Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Lavrentieva ave. 6, Novosibirsk 630090, Russia)

Fortran's prominence in scientific computing requires strategies to ensure both that legacy codes are efficient on high-performance computing systems, and that the language remains attractive for the development of new high-performance codes. Coarray Fortran (CAF), an extension of Fortran introduced for parallel programming, facilitates distributed memory parallelism with a syntax familiar to Fortran programmers, simplifying the transition from single-processor to multi-processor coding. This research focuses on innovating and refining a parallel programming methodology that fuses the strengths of Intel Coarray Fortran, Nvidia CUDA Fortran, and OpenMP for distributed memory parallelism, high-speed GPU acceleration and shared memory parallelism respectively. We consider the management of pageable and pinned memory, CPU-GPU affinity in NUMA multiprocessors, and robust compiler interfacing with speed optimisation. We demonstrate our method through its application to a parallelised Poisson solver and compare the methodology, implementation, and scaling performance to that of the Message Passing Interface (MPI), finding CAF offers similar speeds with easier implementation. For new codes, this approach offers a faster route to optimised parallel computing. For legacy codes, it eases the transition to parallel computing, allowing their transformation into scalable, high-performance computing applications without the need for extensive re-design or additional syntax. © 2024, The Authors. All rights reserved.

Keywords: Parallel programming
