ISBN (Print): 9783030049188; 9783030049171
SHMEM has a long history as a parallel programming model. It has been used extensively since 1993, starting with the Cray T3D systems. Over the past two decades, the SHMEM library implementation on Cray systems has evolved through different generations. The current generation of the SHMEM implementation for Cray XC and XK systems is called Cray SHMEM, a proprietary SHMEM implementation from Cray Inc. In this work, we provide an in-depth analysis of the need for a new SHMEM implementation and then introduce the next evolution of the Cray SHMEM implementation for current and future generations of Cray systems. We call this new implementation Cray OpenSHMEMX. We provide a brief design overview, along with a review of the functional and performance differences of Cray OpenSHMEMX compared with the existing Cray SHMEM implementation.
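For context on the programming model discussed above, the following is a minimal sketch of a one-sided put and barrier using the standard OpenSHMEM C API; it is a generic illustration and is not taken from Cray SHMEM or Cray OpenSHMEMX.

    /* Minimal OpenSHMEM sketch: each PE writes its rank into a symmetric
     * variable on its right-hand neighbour, then all PEs synchronize.
     * Compile with an OpenSHMEM wrapper, e.g. `oshcc ring.c` (illustrative). */
    #include <stdio.h>
    #include <shmem.h>

    int main(void) {
        static int from_left = -1;      /* symmetric: same address on every PE */

        shmem_init();
        int me    = shmem_my_pe();
        int npes  = shmem_n_pes();
        int right = (me + 1) % npes;

        /* One-sided put: deposit my rank into `from_left` on the next PE. */
        shmem_int_put(&from_left, &me, 1, right);

        shmem_barrier_all();            /* complete all puts before reading */
        printf("PE %d of %d received %d from its left neighbour\n",
               me, npes, from_left);

        shmem_finalize();
        return 0;
    }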
"In silico" experimentation allows us to simulate the effect of different therapies by handling model parameters. Although the computational simulation of tumors is currently a well-known technique, it is ho...
详细信息
ISBN (Print): 9783319987026; 9783319987019
"In silico" experimentation allows us to simulate the effect of different therapies by handling model parameters. Although the computational simulation of tumors is currently a well-known technique, it is however possible to contribute to its improvement by parallelizing simulations on computer systems of many and multi-cores. This work presents a proposal to parallelize a tumor growth simulation that is based on cellular automata by partitioning of the data domain and by dynamic load balancing. The initial results of this new approach show that it is possible to successfully accelerate the calculations of a known algorithm for tumor-growth.
The synthesis of electrically large, highly performing reflectarray antennas can be computationally very demanding, both from the analysis and from the optimization points of view. It therefore requires the combined use of numerical and hardware strategies to control the computational complexity and provide the needed acceleration. Recently, we have set up a multi-stage approach in which the first stage employs global optimization with a rough, computationally convenient modeling of the radiation, while the subsequent stages employ local optimization on gradually refined radiation models. The purpose of this paper is to show how reflectarray antenna synthesis can benefit from parallel computing on Graphics Processing Units (GPUs) using the CUDA language. In particular, parallel computing is adopted along two lines. First, the presented approach accelerates the Particle Swarm Optimization procedure exploited in the first stage. Second, it accelerates the computation of the field radiated by the reflectarray using a GPU-implemented Non-Uniform FFT routine that is used by all the stages. The numerical results show how the first stage of the optimization process is crucial to achieving, at an acceptable computational cost, a good starting point.
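As a rough illustration of the first-stage optimizer, the following C sketch shows the core update loop of a basic Particle Swarm Optimization; the objective function, dimensions, and coefficients are placeholders, and the paper's CUDA acceleration and reflectarray radiation model are not reproduced here.

    /* Basic PSO update loop (serial sketch). In a GPU implementation the
     * per-particle cost evaluations are the natural target for offloading. */
    #include <stdlib.h>

    #define N_PART 64
    #define DIM    8

    static double rnd(void) { return (double)rand() / RAND_MAX; }

    static double cost(const double x[DIM]) {      /* placeholder: sphere function */
        double s = 0.0;
        for (int d = 0; d < DIM; d++) s += x[d] * x[d];
        return s;
    }

    void pso(double x[N_PART][DIM], double v[N_PART][DIM],
             double pbest[N_PART][DIM], double pcost[N_PART],
             double gbest[DIM], double *gcost, int iters) {
        const double w = 0.7, c1 = 1.5, c2 = 1.5;   /* typical PSO coefficients */
        for (int t = 0; t < iters; t++) {
            for (int p = 0; p < N_PART; p++) {
                for (int d = 0; d < DIM; d++) {
                    v[p][d] = w * v[p][d]
                            + c1 * rnd() * (pbest[p][d] - x[p][d])
                            + c2 * rnd() * (gbest[d]    - x[p][d]);
                    x[p][d] += v[p][d];
                }
                double c = cost(x[p]);              /* expensive evaluation */
                if (c < pcost[p]) {                 /* update personal best */
                    pcost[p] = c;
                    for (int d = 0; d < DIM; d++) pbest[p][d] = x[p][d];
                }
                if (c < *gcost) {                   /* update global best */
                    *gcost = c;
                    for (int d = 0; d < DIM; d++) gbest[d] = x[p][d];
                }
            }
        }
    }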
ISBN (Print): 9781450368131
Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: how can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution. Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when the abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations. In this paper, we propose a programming model that enables the composition of stateful computations while still exposing opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach that uses monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts the parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and that our programming model resolves shortcomings of the stream-based programming model used by current state-of-the-art big data processing systems.
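For readers unfamiliar with the pattern, this is what a conventional parallel reduction, one of the constructs the paper expresses with smap, looks like in C with OpenMP; it is offered only as a familiar baseline and does not reflect the paper's Haskell programming model.

    /* Conventional OpenMP parallel reduction (baseline for comparison only).
     * Compile with OpenMP enabled, e.g. `cc -fopenmp`. */
    #include <stdio.h>

    double sum_array(const double *a, int n) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    int main(void) {
        double a[1000];
        for (int i = 0; i < 1000; i++) a[i] = 1.0;
        printf("%f\n", sum_array(a, 1000));   /* prints 1000.000000 */
        return 0;
    }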
ISBN (Print): 9781728153049
A many-core implementation of the multilevel fast multipole algorithm (MLFMA), based on the Athread parallel programming model, for computing electromagnetic scattering by a 3-D object on China's homegrown many-core SW26010 CPU is presented. In the proposed many-core implementation of the MLFMA, data-access efficiency is improved by using data structures based on a Structure-of-Arrays (SoA) layout. Adaptive workload-distribution strategies are adopted on different MLFMA tree levels to ensure full utilization of the computing capability and the scratchpad memory (SPM). A double-buffering scheme is specially designed to overlap communication with computation. The resulting Athread-based many-core implementation of the MLFMA is capable of solving real-life problems with over four hundred thousand unknowns with a remarkable speed-up. Numerical results show that, with the proposed parallel scheme, a total speed-up of more than 7x can be achieved compared with execution on the CPU master core.
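Two of the techniques named above, the Structure-of-Arrays layout and double buffering, are generic enough to sketch without the SW26010-specific Athread API. The C sketch below is illustrative only; the real implementation relies on Athread DMA transfers into scratchpad memory, which are not reproduced here.

    /* Structure-of-Arrays layout: each field lives in its own contiguous
     * array, which favours unit-stride and vectorized access compared with
     * an Array-of-Structures layout. */
    #include <string.h>

    typedef struct { float re, im, weight; } CoeffAoS;   /* AoS: one record per unknown */

    typedef struct {                                      /* SoA: one array per field */
        float *re;
        float *im;
        float *weight;
    } CoeffSoA;

    /* Double-buffering pattern: block b is processed while block b+1 is
     * staged into the other buffer. On the SW26010 the fetch would be an
     * asynchronous DMA into scratchpad memory; the plain memcpy here only
     * models the buffering structure, not the actual overlap. */
    #define BLOCK 256

    static void fetch_block(float *dst, const float *src, int b) {
        memcpy(dst, src + (size_t)b * BLOCK, BLOCK * sizeof(float));
    }

    static float process_block(const float *buf) {
        float acc = 0.0f;
        for (int i = 0; i < BLOCK; i++) acc += buf[i];
        return acc;
    }

    float double_buffered_sum(const float *data, int nblocks) {
        float buf[2][BLOCK];
        float total = 0.0f;
        fetch_block(buf[0], data, 0);                        /* prefetch block 0 */
        for (int b = 0; b < nblocks; b++) {
            if (b + 1 < nblocks)
                fetch_block(buf[(b + 1) & 1], data, b + 1);  /* stage next block */
            total += process_block(buf[b & 1]);              /* compute current */
        }
        return total;
    }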
ISBN (Print): 9781728147871
The importance of concurrent and distributed programming is increasing in Computer Science curricula. This exploratory research identifies additional notions required by the official topics of the "Parallel and Concurrent Programming" course taught at the University of Costa Rica. This paper characterizes the previous knowledge that students had about these notions and the extracurricular effort they made to overcome the missing ones. Findings show that students were able to overcome the lack of these notions at the expense of additional extracurricular effort. Exploratory evidence indicates that students' choice of professors in previous courses influenced their performance and extracurricular effort in the parallel programming course.
ISBN (Print): 9783030105495; 9783030105488
Since the computing world has become fully parallel, every software developer today should be familiar with the notion of "parallel algorithm structure." Whereas in recent years students have studied only a basic introduction to algorithms, today parallel algorithm structure must become a vital part of computer science education. In this work we present two years of experience teaching a "Supercomputer Modeling and Technologies" course and running practical assignments at the Computational Mathematics and Cybernetics faculty of Lomonosov Moscow State University, aimed at teaching students a methodology for analyzing the properties of parallel algorithms.
ISBN (Print): 9781728159751
Peachy parallel assignments are high-quality assignments for teaching parallel and distributed computing. They have been successfully used in class and are selected on the basis of their suitability for adoption and for being cool and inspirational for students. Here we present a fire fighting simulation, thread-to-core mapping on NUMA nodes, introductory cloud computing, interesting variations on prefix-sum, searching for a lost PIN, and Big Data analytics.
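One of the assignments listed above builds on prefix-sum; as a generic reference point, here is a minimal two-phase parallel inclusive prefix sum in C with OpenMP. It is a textbook-style sketch, not one of the Peachy assignments themselves.

    /* Two-phase parallel inclusive prefix sum: each thread scans its own
     * block, the block totals are scanned serially, and each thread then
     * adds its block offset. Compile with e.g. `cc -fopenmp`. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    void prefix_sum(long *a, int n) {
        int nt;
        long *offset;
        #pragma omp parallel
        {
            #pragma omp single
            {
                nt = omp_get_num_threads();
                offset = calloc(nt + 1, sizeof(long));
            }
            int t  = omp_get_thread_num();
            int lo = (int)((long long)n * t / nt);
            int hi = (int)((long long)n * (t + 1) / nt);

            for (int i = lo + 1; i < hi; i++)       /* phase 1: scan own block */
                a[i] += a[i - 1];
            if (hi > lo) offset[t + 1] = a[hi - 1]; /* record block total */
            #pragma omp barrier
            #pragma omp single
            for (int k = 1; k <= nt; k++)           /* scan the block totals */
                offset[k] += offset[k - 1];
            for (int i = lo; i < hi; i++)           /* phase 2: add block offset */
                a[i] += offset[t];
        }
        free(offset);
    }

    int main(void) {
        long a[10] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        prefix_sum(a, 10);
        for (int i = 0; i < 10; i++) printf("%ld ", a[i]);  /* 1 2 3 ... 10 */
        printf("\n");
        return 0;
    }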
With the decline of Moore's law and the ever-increasing availability of cheap, massively parallel hardware, it becomes more and more important to embrace parallel programming methods when implementing Agent-Based Simulations (ABS). This has been acknowledged in the field for some time, and a substantial body of research on distributed parallel ABS exists, focusing primarily on parallel Discrete Event Simulation as the underlying mechanism. However, these concepts and tools are inherently difficult to master and apply, and are often overkill when implementers simply want to parallelise their own custom agent-based model implementation. Moreover, with the programming languages established in the field (Python, Java and C++), it is not easy to address the complexities of parallel programming, due to unrestricted side effects and the intricacies of low-level locking semantics. Therefore, in this paper we propose a lock-free approach to parallel ABS using Software Transactional Memory (STM) in conjunction with the pure functional programming language Haskell, a combination which removes some of the problems and complexities of parallel implementations in imperative approaches. We present two case studies in which we compare the performance of lock-based and lock-free STM implementations of two different, well-known agent-based models, investigating both the scaling performance under an increasing number of CPU cores and under an increasing number of agents. We show that the lock-free STM implementations consistently outperform the lock-based ones and scale much better to an increasing number of CPU cores, both on local hardware and on Amazon EC. Further, by utilizing the pure functional language Haskell we gain the benefits of immutable data and a lack of unrestricted side effects guaranteed at compile time, making validation easier and leading to increased confidence in the correctness of an implementation, something of fundamental importance and benefit in parallel programming.
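To make the "intricacies of low-level locking semantics" mentioned above concrete, here is a minimal C/pthreads sketch of a lock-protected shared-environment update of the kind an imperative ABS typically requires; it illustrates the style of code the paper's Haskell/STM approach avoids and is not taken from the paper.

    /* Lock-based update of a shared environment by concurrent agents.
     * Every read-modify-write must be wrapped in explicit lock/unlock calls,
     * and forgetting one, or taking two locks in the wrong order, is a bug
     * the compiler will not catch; this is the hazard STM removes.
     * Compile with `-pthread`. */
    #include <pthread.h>
    #include <stdio.h>

    #define CELLS 64

    static int cell[CELLS];
    static pthread_mutex_t cell_lock[CELLS];

    void agent_move(int from, int to) {
        /* Lock in a fixed (ascending) order to avoid deadlock. */
        int a = from < to ? from : to;
        int b = from < to ? to : from;
        pthread_mutex_lock(&cell_lock[a]);
        if (a != b) pthread_mutex_lock(&cell_lock[b]);

        cell[from]--;                  /* critical section: move one agent */
        cell[to]++;

        if (a != b) pthread_mutex_unlock(&cell_lock[b]);
        pthread_mutex_unlock(&cell_lock[a]);
    }

    int main(void) {
        for (int i = 0; i < CELLS; i++) {
            pthread_mutex_init(&cell_lock[i], NULL);
            cell[i] = 1;
        }
        agent_move(3, 7);
        printf("cell[3]=%d cell[7]=%d\n", cell[3], cell[7]);  /* 0 and 2 */
        return 0;
    }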
ISBN (Digital): 9783030239763
ISBN (Print): 9783030239763; 9783030239756
We propose a stock market software architecture extended by a graphics processing unit (GPU), which employs parallel programming techniques to optimize long-running tasks such as computing daily trends and performing statistical analysis of stock market data in real time. The system uses the ability of Nvidia's CUDA parallel computing application programming interface (API) to integrate with traditional web development frameworks. The web application offers extensive statistics and stock information, which is periodically recomputed through scheduled batch jobs or calculated in real time. To illustrate the advantages of many-core programming, we explore several use cases and evaluate the improvement in performance and the speedup obtained in comparison with the traditional approach of executing long-running jobs on a central processing unit (CPU).
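As a concrete example of a "daily trend" computation of the kind such a system offloads, here is a plain C simple-moving-average routine in which each output value is independent and therefore maps naturally onto GPU threads; the CUDA kernel and the web-framework integration themselves are not sketched here, and all names are illustrative.

    /* Simple moving average over closing prices: out[i] is the mean of the
     * last `window` prices ending at day i. Each out[i] is independent, so
     * a GPU version can assign one thread per output element. */
    #include <stdio.h>

    void moving_average(const double *close, double *out, int days, int window) {
        for (int i = 0; i < days; i++) {
            int start = (i + 1 >= window) ? i + 1 - window : 0;
            double sum = 0.0;
            for (int j = start; j <= i; j++) sum += close[j];
            out[i] = sum / (i - start + 1);
        }
    }

    int main(void) {
        double close[] = {10, 11, 12, 13, 14, 15};
        double sma[6];
        moving_average(close, sma, 6, 3);
        for (int i = 0; i < 6; i++) printf("%.2f ", sma[i]);
        printf("\n");   /* 10.00 10.50 11.00 12.00 13.00 14.00 */
        return 0;
    }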