Search results - Inner Mongolia University Library

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Authors: Matt Martineau; Simon McIntosh-Smith; Wayne Gaudin

Affiliations: HPC Group, University of Bristol, Bristol, United Kingdom; Atomic Weapons Establishment, Aldermaston, United Kingdom

ISBN: (print) 9781509036837

Although the OpenMP 4.0 standard has been available since 2013, support for GPUs has been absent up until very recently, with only a handful of experimental compilers available. In this work we evaluate the performance of Cray's new NVIDIA GPU targeting implementation of OpenMP 4.0, with the mini-apps TeaLeaf, CloverLeaf and BUDE. We successfully port each of the applications, using a simple and consistent design throughout, and achieve performance on an NVIDIA K20X that is comparable to Cray's OpenACC in all cases. BUDE, a compute bound code, required 2.2x the runtime of an equivalently optimised CUDA code, which we believe is caused by an inflated frequency of control flow operations and less efficient arithmetic optimisation. Impressively, both TeaLeaf and CloverLeaf, memory bandwidth bound codes, only required 1.3x the runtime of hand-optimised CUDA implementations. Overall, we find that OpenMP 4.0 is a highly usable open standard capable of performant heterogeneous execution, making it a promising option for scientific application developers.

Keywords: Standards; Graphics processing units; Performance evaluation; Complexity theory; Parallel processing; Parallel programming; Runtime

Parallel and Flexible Dynamic Programming via the Mini-Batch Bellman Operator

IEEE Transactions on Automatic Control, 2024, vol. 69, no. 1, pp. 455-462

Authors: Gargiani, Matilde; Martinelli, Andrea; Martinez, Max Ruts; Lygeros, John

Affiliations: ETH Automat Control Lab, CH-8092 Zurich, Switzerland; Swiss Fed Inst Technol, CH-8006 Zurich, Switzerland

The Bellman operator constitutes the foundation of dynamic programming (DP). An alternative is presented by the Gauss-Seidel operator, whose evaluation, differently from that of the Bellman operator where the states are all processed at once, updates one state at a time while incorporating into the computation the interim results. The provably better convergence rate of DP methods based on the Gauss-Seidel operator comes at the price of an inherent sequentiality, which prevents the exploitation of modern multicore systems. In this work, we propose a new operator for DP, namely, the mini-batch Bellman operator, which aims at realizing the tradeoff between the better convergence rate of the methods based on the Gauss-Seidel operator and the parallelization capability offered by the Bellman operator. After the introduction of the new operator, a theoretical analysis for validating its fundamental properties is conducted. Such properties allow one to successfully deploy the new operator in the main DP schemes, such as value iteration and modified policy iteration. We compare the convergence of the DP algorithm based on the new operator with its earlier counterparts, shedding light on the algorithmic advantages of the new formulation and the impact of the batch-size parameter on the convergence. Finally, an extensive numerical evaluation of the newly introduced operator is conducted. In accordance with the theoretical derivations, the numerical results show the competitive performance of the proposed operator and its superior flexibility, which allows one to adapt the efficiency of its iterations to different structures of MDPs and hardware setups.
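
The trade-off described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' definition of the mini-batch Bellman operator. It assumes a cost-minimizing, discounted MDP, and it assumes that batches are formed from consecutive state indices. The example data `P`, `c` and `gamma`, and the function names, are hypothetical. With `batch_size` equal to the number of states the sweep behaves like the classical Bellman (all states at once) update, and with `batch_size=1` it behaves like a Gauss-Seidel sweep.

```python
def bellman_backup(s, V, P, c, gamma):
    """One-state backup: min over actions of cost plus discounted expected value."""
    return min(
        c[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
        for a in range(len(c[s]))
    )


def minibatch_sweep(V, P, c, gamma, batch_size):
    """One sweep over all states, processed in consecutive batches (assumed layout)."""
    V = list(V)
    n = len(V)
    for start in range(0, n, batch_size):
        batch = range(start, min(start + batch_size, n))
        # Every state in the batch reads the same snapshot of V, so these
        # backups are independent and could be evaluated in parallel.
        new_vals = [bellman_backup(s, V, P, c, gamma) for s in batch]
        # Writing back here lets later batches use the interim results.
        for s, v in zip(batch, new_vals):
            V[s] = v
    return V


def value_iteration(P, c, gamma, batch_size, tol=1e-10, max_sweeps=10_000):
    """Repeat sweeps until the largest change falls below tol; returns (V, sweeps)."""
    V = [0.0] * len(P)
    for sweep in range(1, max_sweeps + 1):
        V_new = minibatch_sweep(V, P, c, gamma, batch_size)
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new, sweep
        V = V_new
    return V, max_sweeps


# Hypothetical 3-state, 2-action MDP: P[s][a] is a distribution over next states.
P = [
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
    [[0.0, 0.9, 0.1], [0.5, 0.0, 0.5]],
    [[0.0, 0.0, 1.0], [0.3, 0.3, 0.4]],
]
c = [[1.0, 2.0], [0.5, 1.5], [0.0, 3.0]]  # c[s][a] is the stage cost
gamma = 0.9

if __name__ == "__main__":
    for bs in (1, 2, 3):
        V, sweeps = value_iteration(P, c, gamma, batch_size=bs)
        print(bs, sweeps, [round(v, 6) for v in V])
```

All batch sizes should reach the same fixed point; what changes is how many sweeps are needed and how much work inside a sweep can run concurrently.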

Keywords: Convergence; Costs; Dynamic programming; Optimal control; Cost function; Standards; Process control; Algorithms; dynamic programming (DP); parallel programming

Finding partial hash collisions by brute force parallel programming

IEEE Princeton Section Sarnoff Symposium

Authors: Vincent Chiriaco; Aubrey Franzen; Rebecca Thayil; Xiaowen Zhang

Affiliations: Dept. of Computer Science, University of North Alabama, Florence, AL, U.S.A.; Dept. of Computer Science, Northern Kentucky University, Highland Heights, KY, U.S.A.; Dept. of Physics, Bryn Mawr College, Bryn Mawr, PA, U.S.A.; Dept. of Computer Science, College of Staten Island / CUNY, Staten Island, NY, U.S.A.

ISBN: (print) 9781509015412

A hash function maps an arbitrary-length (longer) message to a fixed-length shorter string, called the message digest. Inevitably, many different messages hash to the same or a similar digest; we call this a collision or a partial collision. Using multiple processors at the CUNY High Performance Computing Center, we locate partial collisions for MD5 and SHA-1 by brute-force parallel programming in C with the MPI library. The brute-force method of finding a second-preimage collision entails systematically computing all of the permutations, digests, and Hamming distances of the target preimage. We vary the size of the target string and the number of processors allocated, and examine the effect these variables have on finding partial collisions. The results show that, for the same message space, the search time for partial collisions is roughly halved for each doubling of the number of processors, and that longer messages produce better partial collisions.
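
The brute-force search described in this abstract can be sketched in a few lines. The paper used C with MPI; the Python below is only an illustration. It assumes that the candidates are the permutations of the target's bytes, that "closeness" is the bit-level Hamming distance between digests, and that work is split by giving each of `size` workers every `size`-th candidate. The function names are hypothetical, and the ranks are simulated serially rather than run as real MPI processes.

```python
import hashlib
from itertools import permutations


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def search_slice(target: bytes, rank: int, size: int, algo: str = "md5"):
    """Scan candidates rank, rank+size, ... and return the best (distance, message)."""
    target_digest = hashlib.new(algo, target).digest()
    best = None
    for i, perm in enumerate(permutations(target)):
        if i % size != rank:
            continue  # this candidate belongs to another worker
        msg = bytes(perm)
        if msg == target:
            continue  # the target itself is not a second preimage
        d = hamming_distance(hashlib.new(algo, msg).digest(), target_digest)
        if best is None or d < best[0]:
            best = (d, msg)
    return best


def brute_force(target: bytes, size: int, algo: str = "md5"):
    """Serial stand-in for `size` workers followed by a min-reduction."""
    results = [search_slice(target, rank, size, algo) for rank in range(size)]
    return min(r for r in results if r is not None)


if __name__ == "__main__":
    distance, message = brute_force(b"abcdef", size=4)
    print(distance, message)
```

Because each worker scans a disjoint slice of the same candidate list, the best distance found does not depend on `size`; only the work per worker changes, which is consistent with the near-halving of search time per doubling of processors reported in the abstract.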

Keywords: Program processors; Cryptography; Resistance; Force; Parallel programming; Hamming distance

The ForeC Synchronous Deterministic Parallel Programming Language for Multicores

IEEE International Symposium on Embedded Multicore SoCs (MCSoC)

Authors: Eugene Yip; Alain Girault; Partha S. Roop; Morteza Biglari-Abhari

Affiliations: Software Technologies Research Group, University of Bamberg, Germany; Inria, Lab. LIG, Grenoble, France; CNRS, Lab. LIG, Grenoble, France; Department of ECE, The University of Auckland, New Zealand

ISBN: (print) 9781509035328

Cyber-physical systems (CPSs) are embedded systems that are tightly integrated with their physical environment. The correctness of a CPS depends on the output of its computations and on the timeliness of completing the computations. This paper proposes the ForeC language for the deterministic parallel programming of CPS applications on multi-core execution platforms. ForeC's synchronous semantics is designed to greatly simplify the understanding and debugging of parallel programs. ForeC allows programmers to express many forms of parallel patterns while ensuring that programs are amenable to static timing analysis. One of ForeC's main innovations is its shared variable semantics that provides thread isolation and deterministic thread communication. Through benchmarking, we demonstrate that ForeC can achieve better parallel performance than Esterel, a widely used synchronous language for concurrent safety-critical systems, and OpenMP, a popular desktop solution for parallel programming. We demonstrate that the worst-case execution time of ForeC programs can be estimated precisely.

Keywords: Instruction sets; Multicore processing; Semantics; Parallel programming; Embedded systems; Timing; Technological innovation

Automatic parallel programming using the Descartes specification language

International Conference on Information and Communication Systems (ICICS)

Authors: Nina Sakhnini; Venkata N. Inukollu; Joseph E. Urban

Affiliations: Computer Engineering, Jordan Uni. of Science and Tech, Irbid, Jordan; School of Science and Computer, University of Houston Clear Lake, TX, USA; Arizona State University, Tempe, AZ, USA

ISBN: (print) 9781467386159

Automatic programming can be defined as developing software at a high level of abstraction. The definition of automatic programming is not precise, because what is meant by automatic programming changes over time. The goal of automatic programming is for the programmer to set the specifications of a program and for the computer to generate the source code of that program. There exists a group of specification languages that vary in their properties; the Descartes specification language is known to be comprehensible and easily constructible. Descartes represents specifications by defining a system's inputs and outputs, as well as the relationship between these as functions. Descartes has been extended to support concurrent systems. These features made Descartes a good basis on which to build this research effort. This research effort studied automatic programming approaches and created a shortcut between specifications and implementation with all its benefits. This research created a way to transform Descartes specifications into C source code automatically. Automatic programming can apply to all fields of knowledge that can be automated; therefore, the scope of this research project was restricted to a few case studies that involve parallel programming.

Keywords: Automatic programming; Specification languages; Parallel programming; Software; Computers; Communication systems

Performance Evaluations of Different Parallel Programming Paradigms for Pennes Bioheat Equations and Navier-Stokes Equations

International Computer Symposium (ICS)

Authors: Chau-Yi Chou; Kuen-Tsann Chen

Affiliations: Department of Applied Mathematics, National Center for High-performance Computing, Taiwan; Department of Applied Mathematics, National Chung Hsing University, Taiwan

ISBN: (print) 9781509034390

Chip heat dissipation limits further increases in clock speed. Multi-core clusters and heterogeneous platforms that include accelerators have recently become a main trend. Parallel programming paradigms span these diverse platforms: CUDA C, CUDA Fortran, OpenCL, OpenACC, OpenMP, MPI, pthread, MapReduce, and so on. Quantitative performance indexes help give a good picture of parallel programming paradigms for the applications. This study employs two examples, Pennes bioheat equations for simulating local hyperthermia destroying tumor cells and Navier-Stokes equations for simulating driven cavity flow at high Reynolds numbers, using the parallel programming paradigms CUDA C, CUDA Fortran, OpenMP and MPI. Parallel programming in MPI for Pennes bioheat equations shows super-linear speedup on NCHC (National Center for High-performance Computing) ALPS and is significantly faster than the original author's implementation, whereas parallel programming in the CUDA C framework for Navier-Stokes equations achieves around 24 times speedup on an NVIDIA C1060 GPU. We hope these results support useful suggestions.

Keywords: Graphics processing units; Tumors; Mathematical model; Heating; MATLAB; Parallel programming

A Parallel Programming Course Based on an Execution Time-Energy Consumption Optimization Problem

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Authors: Javier Cuenca; Domingo Giménez

Affiliations: Department of Engineering and Technology of Computers, University of Murcia, Murcia, Spain; Department of Computing and Systems, University of Murcia, Murcia, Spain

ISBN: (print) 9781509036837

This paper presents an experience of problem-based learning in a parallel programming course. The course includes the basics of parallel programming, from methodological and technological aspects to the analysis and design of parallel algorithms. The students work with an optimization problem in the field of parallel computing. The execution time and the energy consumption of a simplified master-slave scheme in a simplified heterogeneous system are optimized, which makes it a bi-objective optimization problem; it is addressed with sequential, shared-memory, message-passing and hybrid parallel programming. In this way, the students follow the various parts of the syllabus of the course by working with a problem in which topics studied in previous courses are combined (green computing, computational systems architecture, optimization, heuristics), and this contributes to a deeper understanding of these topics and motivates the introduction of new concepts.
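
A bi-objective problem of this kind can be sketched as follows. The model is an assumption made for illustration and is not taken from the paper: a master distributes identical tasks among workers, worker `w` needs `time_per_task[w]` seconds per task and draws `power[w]` watts while busy, idle power is ignored, execution time is the makespan, and all assignments are enumerated sequentially to extract the Pareto front. All names and numbers are hypothetical.

```python
from itertools import product


def evaluate(counts, time_per_task, power):
    """Return (makespan, energy) for a task count per worker."""
    busy = [n * t for n, t in zip(counts, time_per_task)]
    makespan = max(busy)  # workers run concurrently
    energy = sum(b * p for b, p in zip(busy, power))  # busy time times power
    return makespan, energy


def enumerate_assignments(n_tasks, n_workers):
    """Yield every way to split n_tasks identical tasks among n_workers."""
    for counts in product(range(n_tasks + 1), repeat=n_workers):
        if sum(counts) == n_tasks:
            yield counts


def pareto_front(points):
    """Keep the (time, energy) points that no other point dominates."""
    front = set()
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.add(p)
    return sorted(front)


# Hypothetical system: a fast, power-hungry worker and a slow, frugal one.
time_per_task = [1.0, 2.0]
power = [10.0, 3.0]
points = [
    evaluate(counts, time_per_task, power)
    for counts in enumerate_assignments(4, len(time_per_task))
]

if __name__ == "__main__":
    for makespan, energy in pareto_front(points):
        print(makespan, energy)
```

Each point on the front is a different compromise: moving tasks to the slow worker lowers energy but lengthens the makespan. The exhaustive enumeration is only practical for tiny instances, which is where the heuristics and parallel versions mentioned in the abstract would come in.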

Keywords: Program processors; Parallel programming; Energy consumption; Optimization; Parallel processing; Master-slave

HighP5: Programming using Partitioned Parallel Processing Spaces

Journal of the Brazilian Computer Society, 2024, vol. 30, no. 1, pp. 653-687

Authors: Yanhaona, Muhammad Nur; Grimshaw, Andrew; Mickey, Shahriar Hasan

Affiliations: Brac University, Bangladesh; University of Virginia, United States

HighP5 is a new high-level parallel programming language designed to help software developers to achieve three objectives simultaneously: programmer productivity, program portability, and superior program performance. HighP5 enables this by fostering a new programming paradigm that we call hardware-cognizant parallel programming. The paradigm uses a uniform hardware abstraction and a declarative programming syntax to allow programmers to write hardware feature-sensitive efficient programs without delving into the detail of those feature implementations. This paper is the first comprehensive description of HighP5’s design rationale, language grammar, and core features. It also discusses the runtime behavior of HighP5 programs. In addition, the paper presents preliminary results on program performance from HighP5 compilers on three different architectural platforms: shared-memory multiprocessors, distributed memory multi-computers, and hybrid GPU/multi-computers. © 2024, Brazilian Computing Society. All rights reserved.

Keywords: Parallel programming

SOFTWARE TOOLS FOR AUTOMATION OF PARALLEL PROGRAMMING ON THE BASIS OF ALGEBRA OF ALGORITHMS

Cybernetics and Systems Analysis, 2015, vol. 51, no. 1, pp. 142-149

Authors: Andon, F. I.; Doroshenko, A. E.; Beketov, A. G.; Iovchev, V. A.; Yatsenko, E. A.

Affiliations: Natl Acad Sci Ukraine, Inst Software Syst, Kiev, Ukraine

The development of the algebra-algorithmic methodology and tools for automated design and generation of programs for graphics processing units is proposed. A particular feature of the proposed approach is the use of high-level specifications that are close to natural-language specifications and also the application of a method that ensures the syntactical correctness of algorithms and programs being designed. The approach was implemented in a toolkit destined for interactively designing algorithm schemes and generating programs. The use of this toolkit is illustrated by the development of a parallel program in the field of meteorology.

Keywords: algebra of algorithms; automated program design and generation; graphics processing unit (GPU); parallel programming; algorithm scheme

Research of parallel programming techniques for the hierarchical model based on clusters of SMPs

5th International Conference on Environmental Science and Information Application Technology (ESIAT)

Authors: Zhu, Yong-zhi; Yu, Ji-guo; Cao, Bao-xiang

Affiliations: Qufu Normal Univ, Sch Informat Sci & Engn, Rizhao, Peoples R China

ISBN: (print) 9781315684895; 9781138028142

With the current prevalence of multi-core processors in SMP cluster architectures, mixed-mode programming, using both MPI and OpenMP in the same application, is becoming increasingly important. In this paper we discuss three methods for the parallelization of such algorithms, namely pure MPI parallelization, fine-grain hybrid MPI/OpenMP parallelization, and coarse-grain MPI/OpenMP parallelization. We propose a new hybrid parallel programming method based on the architecture hierarchy of SMP clusters. We design a hierarchical parallel algorithm for the N-body problem and compare its performance with the traditional hybrid parallel algorithm on the Dawning 5000A cluster. The results indicate that the hierarchical hybrid parallel algorithm has better scalability and speed.

Keywords: Parallel programming
