Transactional memory has been attracting increasing attention in recent years; it provides optimistic concurrency control schemes for shared-memory parallel programs. The rapid development and wide adoption of transactional memory make this programming paradigm promising for achieving breakthroughs in massively parallel computing. There have been many studies of transactional memory systems aimed at providing relatively simple and intuitive synchronization constructs for shared-memory parallel programs without sacrificing performance. Hardware transactional memory (HTM) has become commercially available in mainstream processors. However, several inherent architectural limitations abort hardware transactions, such as cache overflows, context switches, and hardware or software exceptions, so today's HTM systems are best-effort and require a software fallback path to ensure forward progress. In this paper, we survey state-of-the-art software-side optimizations for best-effort HTM systems, as well as several novel performance-tuning techniques. Research efforts on the joint use of HTM and non-volatile memory (NVM) are also discussed.
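As an illustration of the best-effort pattern described above, here is a minimal sketch of a hardware transaction with a software lock fallback using Intel RTM intrinsics; the retry bound, shared counter, and spin lock are illustrative assumptions, not taken from the paper.

// Minimal best-effort HTM sketch with a software fallback path.
// Compile with RTM intrinsics enabled (e.g. -mrtm) on supporting hardware.
#include <immintrin.h>
#include <atomic>

std::atomic<bool> fallback_lock{false};   // lock taken by the software fallback path
long shared_counter = 0;                  // example shared data (illustrative)

void increment_counter(int max_retries = 5) {
    for (int attempt = 0; attempt < max_retries; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Reading the lock adds it to the transaction's read set, so a
            // concurrent fallback writer will abort this transaction.
            if (fallback_lock.load(std::memory_order_relaxed))
                _xabort(0xff);
            ++shared_counter;             // transactional update
            _xend();
            return;
        }
        // Transaction aborted (capacity overflow, conflict, interrupt, ...): retry.
    }
    // Software fallback: take the lock and update non-transactionally.
    while (fallback_lock.exchange(true, std::memory_order_acquire)) { /* spin */ }
    ++shared_counter;
    fallback_lock.store(false, std::memory_order_release);
}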
ISBN (digital): 9781665421614
ISBN (print): 9781665421621
This work presents the main activities and results of a learning process for parallel programming in CUDA, the language for Graphics Processing Units (GPUs), based on algorithms for processing and generating 2D and 3D images. The proposed learning activities focus on the key points of parallel programming, such as the optimal use of the different types of memory. The learning process is organized as a set of master classes on image theory and parallel programming, along with practical CUDA programming sessions for 2D and 3D image generation and processing. Results show student satisfaction with the proposed learning process and marks similar to those obtained before its application.
Usage of multiprocessor and multicore computers implies parallel programming. Tools for preparing parallel programs include parallel languages and libraries as well as parallelizing compilers and convertors that can p...
The fiducial-marks-based alignment process is one of the most critical steps in printed circuit board (PCB) manufacturing. In the alignment process, a machine vision technique is used to detect the fiducial marks and then adjust the position of the vision system so that it is aligned with the PCB. The present study proposed an embedded PCB alignment system in which a rotation, scale and translation (RST) template-matching algorithm was employed to locate the marks on the PCB surface. The coordinates and angles of the detected marks were then compared with user-defined reference values, and the difference between them was used to adjust the position of the vision system accordingly. To improve the positioning accuracy, the angle and location matching was performed in successive refinement stages. To reduce the matching time, the present study accelerated the rotation matching by eliminating weak features in the scanning process and converting the normalized cross correlation (NCC) formula to a sum of products. Moreover, the scanning time was reduced by implementing the entire RST process in parallel on the threads of a graphics processing unit (GPU) and by applying hash functions to find refined positions in the refinement matching process. The experimental results showed that the resulting matching time was around 32x faster than that achieved on a conventional central processing unit (CPU) for a test image size of 1280 x 960 pixels. Furthermore, the alignment process achieved a tolerance of 36.4 μm.
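For reference, a sketch of the standard zero-mean NCC and its expansion into sums of products; the exact variant used in the paper may differ:

\[
\mathrm{NCC}(u,v)=\frac{\sum_{x,y}\bigl[I(u+x,v+y)-\bar I_{u,v}\bigr]\bigl[T(x,y)-\bar T\bigr]}
{\sqrt{\sum_{x,y}\bigl[I(u+x,v+y)-\bar I_{u,v}\bigr]^{2}}\;\sqrt{\sum_{x,y}\bigl[T(x,y)-\bar T\bigr]^{2}}}
\]

where \(I\) is the search image, \(T\) the template, and \(\bar I_{u,v}\), \(\bar T\) the window and template means. The numerator expands to

\[
\sum_{x,y} I(u+x,v+y)\,T(x,y)\;-\;N\,\bar I_{u,v}\,\bar T ,
\]

with \(N\) the number of template pixels, so each candidate position reduces to running sums of products that GPU threads can accumulate independently.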
Parallel programming within the computer science degree is now mandatory. New hardware platforms, with multiple cores and the execution of concurrent threads, require it. Despite this, teaching parallelism with the usual methods and classical algorithms makes the topic hard for students to understand. On the other hand, teaching complex topics through gamification techniques has already been reliably shown to positively reinforce students when learning complex concepts. In this work we demonstrate a way to teach parallelism to undergraduate students using gamification in microworlds. The results obtained by the students who followed this model, compared to a control group that followed the standard model, show a statistically significant advantage in favor of teaching parallelism with a gamification-with-microworlds model.
ISBN (print): 9781665454452
Derivatives are key to numerous science, engineering, and machine learning applications. While existing tools generate derivatives of programs in a single language, modern parallel applications combine a set of frameworks and languages to leverage available performance and function in an evolving hardware landscape. We propose a scheme for differentiating arbitrary DAG-based parallelism that preserves scalability and efficiency, implemented in the LLVM-based Enzyme automatic differentiation framework. By integrating with a full-fledged compiler backend, Enzyme can differentiate numerous parallel frameworks and directly control code generation. Combined with its ability to differentiate any LLVM-based language, this flexibility permits Enzyme to leverage the compiler toolchain for parallel and differentiation-specific optimizations. We differentiate nine distinct versions of the LULESH and miniBUDE applications, written in different programming languages (C++, Julia) and parallel frameworks (OpenMP, MPI, RAJA, Julia tasks, ***), demonstrating similar scalability to the original program. On benchmarks with 64 threads or nodes, we find a differentiation overhead of 3.4–6.8× on C++ and 5.4–12.5× on Julia.
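As a hedged illustration of how Enzyme is typically invoked from C++ (following the pattern in Enzyme's public documentation; the function names square and dsquare are illustrative, and compilation requires loading the Enzyme LLVM plugin):

// Reverse-mode differentiation of a scalar function with Enzyme.
#include <cstdio>

// Declaration resolved by the Enzyme compiler plugin at compile time.
extern double __enzyme_autodiff(void*, double);

double square(double x) { return x * x; }

// Returns d(square)/dx evaluated at x.
double dsquare(double x) {
    return __enzyme_autodiff((void*)square, x);
}

int main() {
    std::printf("d/dx x^2 at x = 3: %f\n", dsquare(3.0));  // expected: 6.0
    return 0;
}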
Multiple signal classification algorithm (MUSICAL) provides a super-resolution microscopy method. In the previous research, MUSICAL has enabled data-parallelism well on a desktop computer or a Linux-based server. Howe...
This article presents the definition and implementation of a quantum computer architecture to enable creating a new computational device: a quantum computer as an accelerator. A key question addressed is what such a quantum computer is and how it relates to the classical processor that controls the entire execution process. In this article, we present explicitly the idea of a quantum accelerator that contains the full stack of accelerator layers. Such a stack starts at the highest level, describing the target application of the accelerator. The next layer abstracts the quantum logic, outlining the algorithm that is to be executed on the quantum accelerator. In our case, the logic is expressed in the universal quantum-classical hybrid computation language developed in the group, called OpenQL, which views the quantum processor as a computational accelerator. The OpenQL compiler translates the program to a common assembly language, called cQASM, which can be executed on a quantum simulator. The cQASM represents the instruction set that can be executed by the microarchitecture implemented in the quantum accelerator. In a subsequent step, the compiler can convert the cQASM to eQASM, which is executable on a particular experimental device incorporating the platform-specific parameters. This way, we are able to distinguish clearly between the experimental research toward better qubits and the industrial and societal applications that need to be developed and executed on a quantum device. The first case offers experimental physicists a full-stack experimental platform using realistic qubits with decoherence and error rates, whereas the second case offers perfect qubits to the quantum application developer, with neither decoherence nor error rates. We conclude the article by explicitly presenting three examples of full-stack quantum accelerators, for an experimental superconducting processor, for quantum accelerated genome sequencing and for
Persistent homology is perhaps the most popular and useful tool offered by topological data analysis - with point-cloud data being the most common setup. Its older cousin, the Euler characteristic curve (ECC) is less ...
The coarray programming model is an expression of the Single-Program-Multiple-Data (SPMD) programming model through the simple device of adding a codimension to the Fortran language. A data object declared with a codimension is a coarray object. Codimensions express the idea that some objects are located in local memory while others are located in remote memory. Coarray syntax obeys most of the same rules as normal array syntax. It is familiar to the Fortran programmer, so the use of coarray syntax is natural and intuitive. Although the basic idea is quite simple, inserting it into the language definition turned out to be difficult. In addition, the process was complicated by rapidly changing hardware and heated arguments over whether parallelism is best supported as an interface to language-independent libraries, as a set of directives superimposed on languages, or as a set of specific extensions to existing languages. In this paper, we review both the early history of coarrays and their development into a part of Fortran 2008 and eventually into a larger part of Fortran 2018. Coarrays have been used, for example, in weather forecasting and in neural networks and deep learning.