ISBN (digital): 9798350386776
ISBN (print): 9798350386783
The popularity of multicore processors and the rise of High Performance Computing as a Service (HPCaaS) have made parallel programming essential to fully utilize the performance of multicore systems. OpenMP, a widely adopted shared-memory parallel programming model, is favored for its ease of use, yet assisting and accelerating the automation of its parallelization remains challenging. Although existing automation tools such as Cetus and DiscoPoP simplify parallelization, they still face limitations when dealing with complex data dependencies and control flows. Inspired by the success of deep learning in Natural Language Processing (NLP), this study adopts a Transformer-based model to tackle the problem of automatic parallelization with OpenMP directives. We propose a novel Transformer-based multimodal model, ParaMP, to improve the accuracy of OpenMP directive classification. ParaMP not only takes into account the sequential features of the code text but also incorporates structural features, enriching the model's input by representing the Abstract Syntax Trees (ASTs) corresponding to the code as binary trees. In addition, we built the BTCode dataset, which contains a large number of C/C++ code snippets and their corresponding simplified AST representations, to provide a basis for model training. Experimental evaluation shows that our model outperforms existing automated tools and models on key performance metrics such as F1 score and recall. By combining the sequential and structural features of code text, this study significantly improves the accuracy of OpenMP directive classification and offers valuable insight into applying deep learning techniques to programming tasks.
ISBN (digital): 9798350364606
ISBN (print): 9798350364613
GPU-based HPC clusters are attracting more scientific application developers due to their extensive parallelism and energy efficiency. In order to achieve portability among a variety of multi/many core architectures, a popular choice for an application developer is to utilize directive-based parallel programming models, such as OpenMP. However, even with OpenMP, the developer must choose from among many strategies for exploiting a GPU or a CPU. This paper introduces a new graph-based program representation for optimization of OpenMP applications. The originality of this work lies in the augmentations of Abstract Syntax Trees (ASTs) and the introduction of edge weights to account for loop and condition information. We evaluate our proposed representation by training a Graph Neural Network (GNN) to predict the runtime of OpenMP code regions across CPUs and GPUs. Various transformations utilizing collapse and data transfer between the CPU and GPU are used to construct the dataset. The trained model is used to determine which transformation provides the best performance. Results indicate that our approach is effective, with normalized RMSE from as low as $4\times 10^{-3}$ to at most $1\times 10^{-2}$ in its runtime predictions.
ISBN (digital): 9798350364606
ISBN (print): 9798350364613
With the development of the GPU, parallel languages are widely used for developing modern parallel applications. Given its low energy cost and programmable hardware, the FPGA emerges as a promising candidate for running GPU applications. Executing applications written in GPU programming languages on FPGAs can therefore offer new opportunities in terms of performance and energy efficiency. However, the gap between GPU programming languages and hardware description languages (HDLs) poses a significant challenge for this transition. To overcome this problem, existing works have attempted to bridge the gap through high-level synthesis (HLS) or soft GPUs. In this paper, we examine how HLS and soft GPUs compile GPU languages for FPGAs by discussing the detailed compilation and execution flow of two representative works: the Intel FPGA SDK for OpenCL and Vortex. We also evaluate the coverage of both approaches and discuss methods for addressing the challenges each one faces, aiming to identify the new problems and opportunities each approach introduces.
ISBN (digital): 9798331531409
ISBN (print): 9798331531416
Not all data sharing patterns benefit from the write-invalidate strategy in multi-core systems. When handling serialized synchronizations, such as locks and barriers, the long delays and large amounts of cache coherence traffic caused by invalidations introduce performance bottlenecks. This paper introduces a privately non-cacheable strategy (PNCS), which favors shared data exhibiting write-once characteristics, such as locks used for synchronization among threads, by caching the corresponding data block only in the shared last-level cache (LLC) rather than in the private caches. Cooperating with the traditional MESI cache coherence protocol, PNCS forwards a shared lock to the next sharer waiting in the request queue at the LLC without incurring another round of LLC lookup. Simulation results show that PNCS accelerates acquisition of the variable by the requesters while cutting down invalidation traffic during a synchronization phase. Experiments with applications involving large-scale thread synchronization under parallel programming directives demonstrate that PNCS scales in multi-core systems. In the scenario of 64-thread lock synchronization, the average latency of contending requests is reduced to about 73% of that under "cache lock", a strict write-invalidate strategy proposed by Intel.
A compiler’s intermediate representation (IR) defines a program’s execution plan by encoding its instructions and their relative order. Compiler optimizations aim to replace a given execution plan (which instructions to execute and when) with a semantically-equivalent one that increases the program’s performance on the target architecture. Alternative representations of an IR, like the Program Dependence Graph (PDG), aid this process by capturing the minimum set of constraints that semantically-equivalent execution plans must satisfy. Parallel programming models like OpenMP extend a sequential execution plan with the possibility of running instructions in parallel, creating a parallel execution plan. Recently introduced parallel IRs, like TAPIR, explicitly encode a parallel execution plan. These new IRs finally make it possible for compilers to change the parallel execution plan expressed by programmers to better fit the target parallel architecture. Unfortunately, parallel IRs do not help compilers identify the set of parallel execution plans that preserve the original semantics. In other words, we still lack an alternative representation of parallel IRs that captures the minimum set of constraints parallel execution plans must satisfy to be semantically-equivalent. The PDG is not an ideal candidate for this task, as it was designed for sequential code; in more detail, this paper shows that the PDG over-constrains the optimization space when used for parallel code. We propose the Parallel Semantics Program Dependence Graph (PS-PDG) to precisely capture the salient program constraints that all semantically-equivalent parallel execution plans (and therefore parallel IRs) must satisfy. This paper defines the PS-PDG, justifies the necessity of each extension to the PDG, and demonstrates the increased optimization power of the PS-PDG over an existing PDG-based automatic-parallelizing compiler. Compilers can now rely on the PS-PDG to select d
ISBN (digital): 9798350364606
ISBN (print): 9798350364613
Strong-motion processing holds paramount importance in earthquake engineering and disaster risk management systems. By leveraging parallel loops and task-parallelism techniques, we address computational challenges posed by large-scale accelerographic datasets. Through experimentation with more than one million data points from six real-world seismic events, our approach achieved speedups of up to 2.9x, demonstrating the effectiveness of parallel programming in accelerating seismic data processing. Our findings highlight the significance of parallel programming techniques in advancing seismological research and enhancing earthquake mitigation strategies.
ISBN (digital): 9798350365610
ISBN (print): 9798350365627
Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a computational goal and abstract the parallel and distributed execution of those tasks on arbitrary hardware. Research into these task executors has accelerated as computational sciences increasingly need to take advantage of parallel compute and/or heterogeneous hardware. However, the lack of evaluation standards makes it challenging to compare and contrast novel systems against existing implementations. Here, we introduce TaPS, the Task Performance Suite, to support continued research in distributed task executor frameworks. TaPS provides (1) a unified, modular interface for writing and evaluating applications using arbitrary execution frameworks and data management systems and (2) an initial set of reference synthetic and real-world science applications. We discuss how the design of TaPS supports the reliable evaluation of frameworks and demonstrate TaPS through a survey of benchmarks using the provided reference applications.
ISBN (digital): 9798350372977
ISBN (print): 9798350372984
The projection of LiDAR 3D point cloud data is one of the crucial steps in computer vision applications, involving several stages to achieve accurate final results, and many current studies leverage the computing capability of GPUs for it. This paper presents a comparative study of the implementation and testing of this process on single-core CPU, multi-core CPU, and GPU architectures. The computational efficiency of each platform is evaluated through a series of benchmarks, including data extraction, segmentation, and transformation tasks. Our analysis reveals the inherent parallelization benefits of GPUs in handling large-scale point cloud data, while also considering the accessibility of multi-core CPUs. A comparison between the NVIDIA RTX 3070 and NVIDIA RTX 4060 is also provided: the RTX 3070 showed roughly an 8x speedup over the RTX 4060, and the multi-core implementation outperformed the single-core one by up to 10x. Overall, these results show the benefits of multi-core and GPU acceleration for this application, with room for further improvement.
ISBN (print): 9780769551173
This paper introduces an aspect-oriented library aimed at supporting efficient execution of Java applications on multi-core systems. The library is coded in AspectJ and provides a set of parallel programming abstractions that mimics the OpenMP standard. It supports the migration of sequential Java code to multi-core machines with minor changes to the base code, intrinsically supports the sequential semantics of OpenMP, and provides improved integration with object-oriented mechanisms. The aspect-oriented nature of the library enables the encapsulation of parallelism-related code into well-defined modules, making the parallelisation and maintenance of large-scale Java applications more manageable. Furthermore, the library can be used with plain Java annotations and can easily be extended with application-specific mechanisms to tune application performance. The library has competitive performance in comparison with traditional parallel programming in Java and enhances programmability, since it allows parallelism-related code to be developed independently.
ISBN (digital): 9798350379945
ISBN (print): 9798350379952
Nowadays, in different areas of knowledge, the amount of information that needs to be processed is increasing, which is why many solutions for high-performance computing have been developed; these solutions depend on many factors, including the different architectures available. This work presents a method for configuring a low-cost HPC-based solution using the OpenMP and OpenMPI libraries. The processes necessary to implement programs that exploit these two parallel programming libraries are described. As a result, the study presents an application of the methodology to file compression, implemented with Huffman's algorithm; the results demonstrate the optimization achieved when working in parallel with the OpenMP and OpenMPI libraries, which makes it possible to use all processors available across different computer architectures. The study indicates the mode of use and application of the described methodology.