The Controller model is a heterogeneous parallel programming model implemented as a library. It transparently manages the coordination, communication, and kernel-launching details on different heterogeneous computing devices. It exploits native or vendor-specific programming models and compilers, such as OpenMP, CUDA, or OpenCL, thus retaining the performance potential they offer. This work discusses the integration of FPGAs into the Controller model using high-level synthesis tools and OpenCL. A new Controller backend for FPGAs is presented, based on a previous OpenCL backend for GPUs. We discuss new configuration parameters for FPGA kernels and key ideas for adapting the original OpenCL backend while maintaining the portability of the original model. We present an experimental study comparing the performance and development-effort metrics obtained with the Controller model, Intel oneAPI, and reference codes programmed directly in OpenCL. The results show that using the Controller library has advantages and drawbacks compared with Intel oneAPI, while compared with OpenCL it greatly reduces programming effort with negligible performance overhead.
ISBN:
(Print) 9781665462020
With the advent of renewable energy, smart grids, and cutting-edge measurement technologies, modern power systems are becoming more complex. As a result, analyzing modern power systems requires more computational power. High-Performance Computing (HPC) is the most viable option for meeting this demand. In India's power sector, the use of HPC is minimal. Hence, we introduce HPC-based power flow analysis, the most widely used application for analyzing such systems. The paper demonstrates the importance of HPC for power flow analysis. It also discusses a modified Gaussian elimination method that exploits the sparse nature of the Jacobian matrix to speed up the computation. Open Multi-Processing (OpenMP) is used to implement the parallel computing. The parallel power flow analysis is simulated on the Central Processing Unit (CPU) and Graphics Processing Unit (GPU) nodes of C-DAC's PARAM Utkarsh supercomputer for various power system networks. The speedup obtained with HPC for the Polish 9241-bus network is 216.14 times over the sequential computation.
ISBN:
(Print) 9781665464956
Visual impairments are a global health issue with profound socioeconomic ramifications in both the developing and the developed world. Ongoing research projects aim to investigate the influence of light on the perception of low-vision individuals, but as of today there is neither clear knowledge nor extensive data regarding the influence of light in low-vision situations. This research addresses these issues by introducing a methodology and a system to simulate visual impairments. A pipeline based on eye anatomy, coupled with real-time image processing algorithms, dynamically simulates the low-vision-specific characteristics of selected impairments in mixed reality. A new approach based on massively parallelized processing, combined with efficient modeling of eye refractive errors, aims to improve the accuracy of the low-vision simulation.
ISBN:
(Digital) 9781728175683
ISBN:
(Print) 9781728175683
Parallel programming models (e.g., OpenMP) are increasingly used to improve the performance of real-time applications on modern processors. Nevertheless, these processors have complex architectures, making their timing behavior very difficult to understand. The main limitation of most existing works is that they apply static timing analysis to simpler models, or measurement-based analysis using traditional platforms (e.g., single core) or considering only sequential algorithms. How to provide an efficient configuration for allocating a parallel program to the computing units of the processor is still an open challenge. This paper studies the problem of performing timing analysis on complex multi-core platforms, presenting a methodology to understand an application's timing behavior and to guide the configuration of the platform. As an example, the paper uses an OpenMP-based program of the Heat benchmark on an NVIDIA Jetson AGX Xavier. The main objectives are to analyze the execution time of OpenMP tasks, specify the best configuration of OpenMP directives, identify critical tasks, and discuss the predictability of the system/application. A Linux perf-based measurement tool, which has been extended by our team, is applied to measure each task across multiple executions in terms of total CPU cycles, the number of cache accesses, and the number of cache misses at different cache levels, including L1, L2, and L3. The evaluation process uses the performance metrics measured by our tool to study the predictability of the system/application.
ISBN:
(Print) 9783031061561; 9783031061554
Communication is critical to the scalable and efficient performance of scientific simulations on extreme-scale computing systems. Part of the promise of task-based programming models is that they can naturally overlap communication with computation and exploit locality between tasks. Copy-based semantics using eager communication protocols easily enable such asynchrony by relieving the user of the responsibility for buffer management, both on the sender and on the receiver. However, these semantics increase memory allocations and copies, and in turn affect application memory footprint and performance, especially with large message buffers. In this work we describe how so-called "zero copy" messaging semantics can be supported in Converse, the message-driven parallel programming framework used by Charm++, by implementing support for user-owned buffer transfers in its lower-level runtime system, LRTS. These semantics work on user-provided buffers and do not semantically require copies by either the user or the runtime system. We motivate our work by reviewing the existing messaging model in Converse/Charm++, identify its semantic shortcomings, and define new LRTS and Converse APIs to support zero-copy communication based on RDMA capabilities. We demonstrate the utility of our new communication interfaces with benchmarks written in Converse. The result is up to a 91% improvement in message latency, along with improved memory usage. These advances will enable future work on user-facing APIs in Charm++.
ISBN:
(Print) 9781450392044
Programming languages offering functions on collections of values, such as map, reduce, scan, and filter, have been in use for over fifty years. Such collections have proven particularly useful in the context of parallelism because these functions are naturally parallel. However, if implemented naively they generate temporary intermediate collections that can significantly increase memory usage and runtime. To avoid this pitfall, many approaches use "fusion" to combine operations and avoid temporary results. However, most of these approaches involve significant changes to a compiler and are limited to a small set of functions, such as maps and reduces. In this paper we present a library-based approach that fuses widely used operations such as scans, filters, and flattens. In conjunction with existing techniques, this covers most of the common operations on collections. Our approach is based on a novel technique that parallelizes over blocks, with streams within each block. We demonstrate the approach by implementing libraries targeting multicore parallelism in two languages: Parallel ML and C++, which have very different semantics and compilers. To help users understand when to use the approach, we define a cost semantics that indicates when fusion occurs and how it reduces memory allocations. We present experimental results for a dozen benchmarks that demonstrate significant reductions in both time and space. In most cases the approach generates code that is near optimal for the machines it runs on.
Parallel programming skills may require a long time to acquire. "Thinking in parallel" is a skill that requires time, effort, and experience. In this work, we propose to facilitate students' learning of parallel programming by using instant messaging. Our aim was to find out whether students' interaction through instant messaging tools benefits the learning process. To do so, we asked several students of an HPC course in the Master's degree in Computer Science at the University of Leon to develop a specific parallel application, each of them using a different application programming interface: OpenMP, MPI, CUDA, or OpenCL. Even though these APIs are different, there are common points in the design process. We encouraged students to interact with each other using Gitter, an instant messaging tool for GitHub users. Our analysis of the communications and results demonstrates that the direct interaction of students through the Gitter tool has a positive impact on the learning process.
The multiple signal classification algorithm (MUSICAL) is a statistical super-resolution technique for wide-field fluorescence microscopy. Although MUSICAL has several advantages, such as its high resolution, its low computational performance has limited its adoption. This paper analyzes the performance and scalability of MUSICAL with the aim of improving its computational performance. We first optimize MUSICAL for performance analysis using the latest high-performance computing libraries and parallel programming techniques. We then provide insights into MUSICAL's performance bottlenecks. Based on these insights, we develop a new parallel MUSICAL in C++ using Intel Threading Building Blocks and the Intel Math Kernel Library. Our experimental results show that the new parallel MUSICAL achieves a speedup of up to 30.36x on a commodity machine with 32 cores, with an efficiency of 94.88%. The results also show that it outperforms the previous Matlab, Java, and Python versions of MUSICAL by 30.43x, 2.63x, and 1.69x, respectively, on commodity machines.
Descriptive complexity provides intrinsic, i.e. machine-independent, characterizations of the main complexity classes. On the other hand, logic can be useful for designing programs in a natural declarative way. This is especially important for parallel computation models such as cellular automata, since designing parallel programs is considered a difficult task. This paper establishes three logical characterizations of the three classical complexity classes modeling minimal time, called real-time, of one-dimensional cellular automata according to their canonical variations: unidirectional or bidirectional communication, and input word given in a parallel or sequential way. Our three logics are natural restrictions of existential second-order Horn logic with built-in successor and predecessor functions. These logics correspond exactly to the three ways of deciding a language on a square grid circuit of side n according to one of the three natural locations of an input word of length n: along a side of the grid, on the diagonal that contains the output cell (placed on the vertex (n,n) of the square grid), or on the diagonal opposite to the output cell. The key ingredient of our results is a normalization method that transforms a formula from one of our three logics into an equivalent normalized formula that closely mimics a grid circuit. Then, we extend our logics by allowing a limited use of negation on hypotheses, as in Stratified Datalog. By revisiting in detail a number of representative classical problems (recognition of the set of primes by Fisher's algorithm, Dyck language recognition, the Firing Squad Synchronization problem, etc.), we show that this extension makes programming easier, and we prove that it does not change the real-time complexity of our logics. Finally, based on our experience in expressing these representative problems in logic, we argue that our logics are high-level programming languages: they make it possible to express in a natural, c...
ISBN:
(Print) 9781479956180
This paper describes a compiler extension to our prototype extensible C translator that adds new features for parallel execution of matrix operations and shows their application to problems in spatio-temporal data mining. The extension provides new language features for constructing new matrices, mapping functions over elements of a matrix, and accumulating operations that, for example, can sum values in a matrix. It also provides the appropriate semantic analysis to check for errors before translating the constructs down to parallel C code. The extension also provides features that let the programmer indicate how the extension translates these matrix constructs down to C code. Programmers seeking higher levels of performance can specify how the underlying for-loops are structured so that code using, for example, loop-tiling techniques or vector processors, is generated. In general, compiler extensions supported by our approach allow new domain-specific syntax and semantic analyses to be easily added to the host language. Specifications of the host C language and the extensions are composed to create a custom translator that maps extended C programs down to plain (parallel) C code, checking for domain-specific errors and applying high-level domain-specific optimizations in the process.