Purpose: To present the implementation of a new option for parallel processing of the EGSnrc Monte Carlo system using the OpenMP API, as an alternative to the provided method based on the use of a batch queuing system (BQS). Methods: The parallel solution presented, called OMP_EGS, makes use of OpenMP features to control the workload distribution between the compute units. These features were inserted into the original EGSnrc source code through properly defined macros. In order to validate the platform, the possibility of producing results in exact agreement with the serial implementation was assessed. The performance of OMP_EGS was evaluated against the BQS method in terms of parallel speedup and efficiency. Results: As the OpenMP features can be activated or deactivated depending on the compilation options, the implementation of the platform allowed the direct recovery of the original serial implementation. The validation tests showed that OMP_EGS was able to reproduce the exact same results as the serial implementation. The performance and scalability tests showed that OMP_EGS is a better alternative than the EGSnrc BQS parallel implementation, both in terms of runtime and parallel efficiency. Conclusions: The presented solution has several advantages over the BQS-based parallel implementation available for the EGSnrc system. One of the main advantages is that, in contrast to the BQS alternative, it can be implemented using different compilers and operating systems, making it a compact and portable solution that can be used on a wide range of working environments. It does not introduce artifacts into the simulated distributions, as it only handles the distribution of work among the available computing resources, and it proved to have better performance. (C) 2017 American Association of Physicists in Medicine.
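The abstract does not reproduce OMP_EGS's macros, so the following is only a minimal sketch of the general pattern it describes: OpenMP worksharing over independent particle histories, compiled so that disabling OpenMP recovers the serial loop. All identifiers and the toy "physics" are illustrative assumptions, not EGSnrc code.

```cpp
// Minimal sketch of history-level OpenMP parallelism for a Monte Carlo loop;
// all names and the toy "physics" are hypothetical, not taken from EGSnrc.
#include <cmath>
#include <cstdio>

// Stand-in for one particle history: its contribution depends only on its
// seed, never on thread scheduling, which is what makes the parallel and
// serial runs statistically equivalent.
double simulate_history(long seed) {
    double x = std::sin(seed * 0.001);
    return x * x;
}

int main() {
    const long n_histories = 1000000;
    double dose = 0.0;

    // Compiled without OpenMP, the pragma is ignored and the original serial
    // loop is recovered, mirroring the macro-based activation described above.
    #pragma omp parallel for reduction(+ : dose)
    for (long h = 0; h < n_histories; ++h)
        dose += simulate_history(h);

    std::printf("total score: %f\n", dose);
    return 0;
}
```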
Power distribution networks operate in a radial topology, but also include extra tie switches to allow for their reconfiguration in case of scheduled maintenance or unexpected failure. With the implementation of the smart grid and the development of fast high-power switching devices, it is now possible to automate this reconfiguration to also adjust to demand fluctuation and always operate the network in the optimal topology, minimizing power transmission losses. This automation requires the development of highly efficient and powerful optimization algorithms that can compute the optimal configuration with minimum delay. This paper presents a parallel genetic algorithm on graphics processing unit for distribution feeder reconfiguration. By exploiting the massively parallel architecture of graphics processors, the execution time of the solver is reduced by a factor of 66.2, resulting in a very fast solver. Moreover, the metaheuristic uses a unique solution encoding based on the minimum spanning tree to maintain the radial structure of the candidate topologies. This novel encoding drastically improves the effectiveness of the genetic algorithm and allows for the optimal reconfiguration of networks of up to 4400 buses, five times larger than any network in the surveyed references.
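The encoding itself is not detailed in the abstract; a common way to realize an MST-based encoding (a sketch under assumed details, not necessarily the paper's exact scheme) is to let the genome assign a weight to every candidate line and decode it with Kruskal's algorithm, which yields a radial topology by construction:

```cpp
// Sketch of MST-based genome decoding for feeder reconfiguration: the genome
// weights every candidate line; Kruskal's algorithm keeps exactly the tree
// edges, so every decoded topology is radial (cycle-free) by construction.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

struct Edge { int u, v; double gene_weight; };

struct DSU {  // union-find structure for cycle detection
    std::vector<int> p;
    explicit DSU(int n) : p(n) { std::iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        p[a] = b;
        return true;
    }
};

std::vector<Edge> decode(int n_buses, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(), [](const Edge& a, const Edge& b) {
        return a.gene_weight < b.gene_weight;
    });
    DSU dsu(n_buses);
    std::vector<Edge> closed;                  // switches to close
    for (const Edge& e : edges)
        if (dsu.unite(e.u, e.v)) closed.push_back(e);
    return closed;                             // n_buses - 1 edges: radial
}

int main() {
    // 4-bus toy network with one tie switch creating a loop (edge 1-3).
    std::vector<Edge> lines = {{0, 1, 0.2}, {1, 2, 0.9}, {2, 3, 0.4}, {1, 3, 0.7}};
    for (const Edge& e : decode(4, lines))
        std::printf("close switch %d-%d\n", e.u, e.v);
    return 0;
}
```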
Graphs can be used to model many kinds of data, from traditional datasets to social networks or semi-structured datasets. To process large graphs, many systems have been proposed. The Pregel programming model is popular thanks to its scalability. Although Pregel is simple to understand and use, it is low-level, requiring developers to write programs that are hard to maintain and must be carefully optimized. On the other hand, structural recursion is a powerful tool for systematically constructing efficient parallel programs on lists, arrays, and trees, but it has not yet been applied to graphs. In this paper, we propose an efficient method for parallel evaluation of structural recursion on graphs, which is suitable for Pregel. We design and implement a high-level parallel programming framework where a domain-specific language (DSL) is provided to ease the programming task. Specifications written in the DSL are automatically compiled into Pregel programs that are scalable to large graphs. Experimental results show that our framework outperforms the original evaluation of structural recursion, and achieves good scalability and speedup on real datasets.
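The DSL and its compilation scheme are beyond the abstract; as a reference point for the target model, a Pregel computation is a sequence of supersteps in which each vertex updates its value from incoming messages and sends new messages to its neighbours until all vertices halt. A minimal single-process sketch (min-label propagation, i.e., connected components; the toy graph and loop structure are assumptions):

```cpp
// Minimal single-process sketch of the Pregel superstep model: min-label
// propagation, i.e., connected-components labelling.
#include <cstdio>
#include <vector>

int main() {
    // Undirected toy graph as adjacency lists: components {0,1,2} and {3,4}.
    std::vector<std::vector<int>> adj = {{1}, {0, 2}, {1}, {4}, {3}};
    const int n = (int)adj.size();
    std::vector<int> label(n);
    for (int v = 0; v < n; ++v) label[v] = v;

    bool active = true;
    while (active) {                        // one iteration == one superstep
        active = false;
        std::vector<int> incoming = label;  // "messages": neighbours' labels
        for (int v = 0; v < n; ++v)         // every vertex runs the same program
            for (int u : adj[v])
                if (incoming[u] < label[v]) {
                    label[v] = incoming[u]; // vertex program: keep the minimum
                    active = true;          // value changed, stay active
                }
    }
    for (int v = 0; v < n; ++v)
        std::printf("vertex %d -> component %d\n", v, label[v]);
    return 0;
}
```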
Genetic programming (GP) is a computationally intensive technique which also has a high degree of natural parallelism. Parallel computing architectures have become commonplace, especially with regard to Graphics Processing Units (GPUs). Hence, versions of GP have been implemented that utilise these highly parallel computing platforms, enabling significant gains in the computational speed of GP. However, recently a two-dimensional stack approach to GP using a multi-core CPU also demonstrated considerable performance gains, with performance equivalent to or exceeding that achieved by a GPU. This paper demonstrates that a similar two-dimensional stack approach can also be applied to a GPU-based approach to GP to better exploit the underlying technology. Performance gains are achieved over a standard single-dimensional stack approach when utilising a GPU. Overall, a peak computational speed of over 55 billion Genetic Programming Operations per Second is observed, a twofold improvement over the best GPU-based single-dimensional stack approach from the literature.
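The paper's GPU kernel layout is not given in the abstract, but the two-dimensional stack idea can be pictured as a postfix interpreter whose stack entries are whole vectors of fitness-case values, so a single traversal of a program evaluates every case at once; the inner per-case loop is what maps onto GPU threads or SIMD lanes. A serial sketch with an assumed, illustrative instruction set:

```cpp
// Sketch of a two-dimensional stack GP interpreter: each stack slot holds one
// value per fitness case, so one traversal of the program evaluates all cases.
#include <cstdio>
#include <vector>

enum Op { PUSH_X, PUSH_ONE, ADD, MUL };

int main() {
    std::vector<double> x = {1.0, 2.0, 3.0, 4.0};                   // fitness cases
    std::vector<Op> program = {PUSH_X, PUSH_X, MUL, PUSH_ONE, ADD}; // x*x + 1
    const int cases = (int)x.size();

    std::vector<std::vector<double>> stack;  // dim 1: depth, dim 2: cases
    for (Op op : program) {
        switch (op) {
            case PUSH_X:   stack.push_back(x); break;
            case PUSH_ONE: stack.push_back(std::vector<double>(cases, 1.0)); break;
            case ADD:
            case MUL: {
                std::vector<double> b = stack.back();
                stack.pop_back();
                std::vector<double>& a = stack.back();
                // This inner per-case loop is what maps onto GPU threads
                // (or SIMD lanes on a multi-core CPU).
                for (int c = 0; c < cases; ++c)
                    a[c] = (op == ADD) ? a[c] + b[c] : a[c] * b[c];
                break;
            }
        }
    }
    for (int c = 0; c < cases; ++c)
        std::printf("f(%g) = %g\n", x[c], stack.back()[c]);
    return 0;
}
```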
Big Data concerns large-volume, complex, growing data. Given the fast development of data storage and networking, organizations are collecting large, ever-growing datasets that can hold useful information. In order to extract information from these datasets within a useful time frame, it is important to use distributed and parallel algorithms. One common use of big data is machine learning, in which collected data is used to predict future behavior. Deep-learning using Artificial Neural Networks is one of the popular methods for extracting information from complex datasets, and is capable of creating more complex models than traditional probabilistic machine learning techniques. This work presents a step-by-step guide on how to prototype a Deep-Learning application that executes both on GPU and CPU clusters. Python and Redis are the core supporting tools of this guide. This tutorial will allow the reader to understand the basics of building a distributed high-performance GPU application in a few hours. Since we do not depend on any deep-learning application or framework (we use low-level building blocks), this tutorial can be adjusted for any other parallel algorithm the reader might want to prototype on Big Data. Finally, we discuss how to move from a prototype to a full-blown production application. (C) 2017 Elsevier Inc. All rights reserved.
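The guide itself builds on Python and Redis; keeping to C++ like the other sketches in this listing, the same coordination pattern (workers blocking on a shared Redis list for serialized tasks) can be sketched with the hiredis C client. The queue name "tasks" and the payload handling are made up for illustration and are not taken from the tutorial:

```cpp
// Worker side of a Redis-backed task queue, the coordination pattern the
// tutorial builds on; queue name and payload are illustrative assumptions.
// Link with -lhiredis.
#include <cstdio>
#include <hiredis/hiredis.h>

int main() {
    redisContext* c = redisConnect("127.0.0.1", 6379);
    if (c == nullptr || c->err) {
        std::fprintf(stderr, "could not connect to redis\n");
        return 1;
    }
    for (;;) {
        // Block until some producer LPUSHes a serialized task/minibatch.
        redisReply* r = (redisReply*)redisCommand(c, "BRPOP tasks 0");
        if (r == nullptr) break;  // connection dropped
        if (r->type == REDIS_REPLY_ARRAY && r->elements == 2) {
            // element[0] is the list name, element[1] the popped payload.
            std::printf("worker received: %s\n", r->element[1]->str);
            // ... run the GPU/CPU compute step on the payload here ...
        }
        freeReplyObject(r);
    }
    redisFree(c);
    return 0;
}
```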
Current High Performance Computing (HPC) systems are typically built as interconnected clusters of shared-memory multicore computers. Several techniques to automatically generate parallel programs from high-level parallel languages or sequential codes have been proposed. To properly exploit the scalability of HPC clusters, these techniques should take into account the combination of data communication across distributed memory and the exploitation of shared-memory models. In this paper, we present a new communication calculation technique to be applied across different SPMD (Single Program Multiple Data) code blocks containing several uniform data access expressions. We have implemented this technique in Trasgo, a programming model and compilation framework that transforms parallel programs from a high-level parallel specification that deals with parallelism in a unified, abstract, and portable way. The proposed technique computes at runtime exact coarse-grained communications for distributed message-passing processes. Applying this technique at runtime has the advantage of being independent of compile-time decisions, such as the tile size chosen for each process. Our approach allows the automatic generation of pre-compiled multi-level parallel routines, libraries, or programs that can adapt their communication, synchronization, and optimization structures to the target system, even when computing nodes have different capabilities. Our experimental results show that, despite our runtime calculation, our approach can automatically produce efficient programs compared with MPI reference codes and with codes generated by auto-parallelizing compilers. (C) 2017 Elsevier B.V. All rights reserved.
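The paper's actual algorithm is not reproduced in the abstract; the flavor of an exact runtime communication calculation for uniform accesses can be illustrated by intersecting each process's owned block with the index range its access expression touches, yielding precisely the remote elements to exchange once runtime sizes are known. Everything below (1-D array, block distribution, an A[i-1..i+1] stencil) is an assumed toy setting, not Trasgo's representation:

```cpp
// Toy runtime calculation of exact communications for a 1-D block-distributed
// array accessed with the uniform stencil A[i-1..i+1]; purely illustrative.
#include <algorithm>
#include <cstdio>

struct Range { long lo, hi; };  // half-open interval [lo, hi)

// Block of the n-element array owned by rank p out of nprocs.
Range owned(long n, int p, int nprocs) {
    long chunk = (n + nprocs - 1) / nprocs;
    return {p * chunk, std::min(n, (p + 1) * chunk)};
}

int main() {
    const long n = 100;
    const int nprocs = 4;
    for (int p = 0; p < nprocs; ++p) {
        Range own = owned(n, p, nprocs);
        // The access A[i-1..i+1] over owned iterations means this process
        // needs the following index range, known only at runtime:
        Range need = {std::max(0L, own.lo - 1), std::min(n, own.hi + 1)};
        for (int q = 0; q < nprocs; ++q) {
            if (q == p) continue;
            Range other = owned(n, q, nprocs);
            long lo = std::max(need.lo, other.lo);
            long hi = std::min(need.hi, other.hi);
            if (lo < hi)  // exact, coarse-grained message to post
                std::printf("rank %d receives [%ld,%ld) from rank %d\n", p, lo, hi, q);
        }
    }
    return 0;
}
```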
Hadoop on the datacentre is a popular analytical platform for enterprises. Cloud vendors host Hadoop clusters on the datacentre to provide high-performance analytical computing facilities to their customers, who demand a parallel programming model to deal with huge data. Effective cost/time management and efficient resource consumption among concurrent users must be the primary concern; without it, the key aspiration behind high-performance cloud computing would suffer. Workflows portray such high-performance applications in terms of individual jobs and the dependencies between them. Workflows can be scheduled on virtual machines (VMs) in the datacentre to make the best possible use of resources. In the authors' earlier work, a mechanism was proposed to pack and execute customer jobs as workflows on the Hadoop platform, minimising VM cost while executing the workflow jobs within their deadlines. In this work, the authors try to optimise other parameters, such as the load on the cloud, workflow response time, and resource usage effectiveness, by applying soft computing methods. Stochastic hill climbing (SHC) is a soft computing approach used to solve many optimisation problems. In this study, the SHC approach is employed to schedule workflow jobs to VMs and thereby optimise the above-mentioned parameters in the cloud datacentre.
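The abstract does not specify the neighbourhood move or cost function; a generic stochastic hill-climbing loop over job-to-VM assignments (with a makespan proxy standing in for the paper's multi-parameter objective, and all data made up) looks like this:

```cpp
// Generic stochastic hill climbing over job-to-VM assignments; the makespan
// cost below is a placeholder for the paper's multi-parameter objective.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

double cost(const std::vector<int>& assign, const std::vector<double>& job_len, int vms) {
    std::vector<double> load(vms, 0.0);
    for (int j = 0; j < (int)assign.size(); ++j) load[assign[j]] += job_len[j];
    return *std::max_element(load.begin(), load.end());  // makespan proxy
}

int main() {
    const int vms = 3;
    std::vector<double> job_len = {4, 2, 7, 1, 3, 5};
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick_job(0, (int)job_len.size() - 1);
    std::uniform_int_distribution<int> pick_vm(0, vms - 1);

    std::vector<int> assign(job_len.size(), 0);  // start: everything on VM 0
    double best = cost(assign, job_len, vms);
    for (int it = 0; it < 10000; ++it) {
        int j = pick_job(rng);
        int old_vm = assign[j];
        assign[j] = pick_vm(rng);                // random neighbour move
        double c = cost(assign, job_len, vms);
        if (c <= best) best = c;                 // keep improving moves
        else assign[j] = old_vm;                 // otherwise undo the move
    }
    std::printf("best makespan found: %g\n", best);
    return 0;
}
```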
The proliferation of parallel processing in shared-memory applications has encouraged the development of supporting frameworks such as OpenMP. OpenMP has become increasingly prevalent due to the simplicity with which it allows parallelism to be introduced elegantly and incrementally. However, it still lacks some high-level language features that are essential in object-oriented programming. One such mechanism is that of exception handling. In languages such as Java, exception handling has been an integral aspect of the language since its first release. For OpenMP to be truly embraced within this object-oriented community, essential object-oriented concepts such as exception handling need to be given some attention. The official OpenMP standard has little specification on error recovery, as the challenges of supporting exception-based error recovery in OpenMP extend to both the semantic specifications and the related runtime support. This paper proposes a systematic mechanism for exception handling with the co-use of OpenMP directives, based on a Java implementation of OpenMP. The concept of exception handling with OpenMP directives has been formalized and categorized. Hand in hand with this exception handling proposal, a flexible approach to thread cancellation is also proposed (as an extension of OpenMP directives) that supports this exception handling within parallel execution. The runtime support and its implementation are discussed. The evaluation shows that, while no significant overhead is introduced, the new approach provides a more elegant coding style that increases parallel development efficiency and software robustness.
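The paper's mechanism targets a Java implementation of OpenMP and is not reproduced here; for contrast, the baseline pattern it improves upon can be shown in C++ OpenMP, where an exception must not escape a parallel region: capture the first exception in a std::exception_ptr, request cancellation, and rethrow after the region. A minimal sketch:

```cpp
// Baseline C++/OpenMP pattern for the problem addressed above: an exception
// may not escape an OpenMP parallel region, so it is captured inside the
// region, cancellation is requested, and it is rethrown afterwards.
#include <cstdio>
#include <exception>
#include <stdexcept>

int main() {
    std::exception_ptr first_error = nullptr;

    #pragma omp parallel for
    for (int i = 0; i < 100; ++i) {
        try {
            if (i == 42) throw std::runtime_error("failure in iteration 42");
            // ... normal loop body ...
        } catch (...) {
            #pragma omp critical
            if (!first_error) first_error = std::current_exception();
            // Cooperative cancellation; honoured only when the runtime has
            // cancellation enabled (e.g. OMP_CANCELLATION=true).
            #pragma omp cancel for
        }
    }

    try {
        if (first_error) std::rethrow_exception(first_error);
        std::puts("completed without error");
    } catch (const std::exception& e) {
        std::printf("caught after the region: %s\n", e.what());
    }
    return 0;
}
```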
Computing a maximal independent set is an important step in many parallel graph algorithms. This article introduces ECL-MIS, a maximal independent set implementation that works well on GPUs. It includes key optimizations to speed up computation, reduce the memory footprint, and increase the set size. Its CUDA implementation requires fewer than 30 kernel statements, runs asynchronously, and produces a deterministic result. It outperforms the maximal independent set implementations of Pannotia, CUSP, and IrGL on each of the 16 tested graphs of various types and sizes. On a Titan X GPU, ECL-MIS is between 3.9 and 100 times faster (11.5 times, on average). ECL-MIS running on the GPU is also faster than the parallel CPU codes Ligra, Ligra+, and PBBS running on 20 Xeon cores, which it outperforms by 4.1 times, on average. At the same time, ECL-MIS produces maximal independent sets that are up to 52% larger (over 10%, on average) compared to these preexisting CPU and GPU implementations. Whereas these codes produce maximal independent sets that are, on average, about 15% smaller than the largest possible such sets, ECL-MIS sets are less than 6% smaller than the maximum independent sets.
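ECL-MIS's CUDA kernels are not reproduced in the abstract; the underlying priority-based scheme (in the style of Luby's algorithm) is: give every vertex a priority, and in each round any still-undecided vertex whose priority beats all undecided neighbours joins the set and excludes its neighbours. A serial sketch with random priorities (an assumption; the actual prioritization in ECL-MIS may differ):

```cpp
// Serial sketch of priority-based (Luby-style) maximal independent set
// computation, the algorithmic core that ECL-MIS parallelizes on the GPU.
#include <cstdio>
#include <random>
#include <vector>

enum Status { UNDECIDED, IN_SET, EXCLUDED };

int main() {
    // Small undirected toy graph as adjacency lists.
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 2}, {0, 1, 3}, {2}};
    const int n = (int)adj.size();

    std::vector<double> prio(n);
    std::mt19937 rng(1);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (double& p : prio) p = u(rng);  // random priority per vertex

    std::vector<Status> st(n, UNDECIDED);
    bool progress = true;
    while (progress) {  // each sweep corresponds to one parallel round
        progress = false;
        for (int v = 0; v < n; ++v) {
            if (st[v] != UNDECIDED) continue;
            bool local_max = true;  // does v beat all undecided neighbours?
            for (int w : adj[v])
                if (st[w] == UNDECIDED && prio[w] >= prio[v]) {
                    local_max = false;
                    break;
                }
            if (local_max) {
                st[v] = IN_SET;                        // v joins the set and
                for (int w : adj[v]) st[w] = EXCLUDED; // knocks out neighbours
                progress = true;
            }
        }
    }
    for (int v = 0; v < n; ++v)
        if (st[v] == IN_SET) std::printf("vertex %d is in the MIS\n", v);
    return 0;
}
```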
The power flow (PF) analysis provides the steady state of the power system and is key to the simulation of transmission networks. It is a tool commonly used by system operators to visualize the effect of generator settings on the network prior to making a change. In situations involving large networks, hundreds or even thousands of PF analyses may have to be run on the network before finding the optimal power dispatch. This process requires significant computation time and does not allow for rapid control of the network. To address this problem, this paper presents two parallel PF solvers that exploit the massively parallel architecture of graphics processing units (GPUs) in a hybrid GPU-central processing unit (CPU) computing environment, using compute unified device architecture (CUDA) and OpenMP, in order to significantly speed up the concurrent analysis of many instances of a network. Both implementations use sparse matrices, double-precision operations, and enforce the reactive power limit of generators. The parallel Gauss-Seidel (G-S) and Newton-Raphson (N-R) PF algorithms are tested on networks ranging from 4 to 2383 buses. The accuracy is validated using MATPOWER, and the maximum speedup achieved, compared with a sequential execution on CPU, is 45.2x for G-S and 17.8x for N-R.
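For reference, one G-S sweep updates each non-slack bus voltage in place from its scheduled injection and admittance row: V_i <- (1/Y_ii) * ((P_i - jQ_i)/conj(V_i) - sum over k != i of Y_ik * V_k). A dense, serial sketch on a made-up two-bus case (no sparse storage and no reactive-limit enforcement, unlike the paper's solvers):

```cpp
// Serial Gauss-Seidel power-flow sweep on a made-up two-bus system; the
// paper's solvers run many such network instances concurrently on the GPU.
#include <complex>
#include <cstdio>
#include <vector>

using cd = std::complex<double>;

int main() {
    // Bus 0: slack (fixed voltage). Bus 1: PQ bus. One line of admittance y.
    const cd y(1.0, -10.0);
    std::vector<std::vector<cd>> Y = {{y, -y}, {-y, y}};  // bus admittance matrix
    std::vector<cd> V = {cd(1.0, 0.0), cd(1.0, 0.0)};     // flat start
    std::vector<cd> S = {cd(0.0, 0.0), cd(-0.5, -0.2)};   // injected P + jQ per bus

    for (int it = 0; it < 100; ++it) {
        for (int i = 1; i < (int)V.size(); ++i) {  // skip the slack bus
            cd sum(0.0, 0.0);
            for (int k = 0; k < (int)V.size(); ++k)
                if (k != i) sum += Y[i][k] * V[k];
            // V_i <- (1/Y_ii) * ((P_i - jQ_i)/conj(V_i) - sum_{k!=i} Y_ik V_k)
            V[i] = (std::conj(S[i]) / std::conj(V[i]) - sum) / Y[i][i];
        }
    }
    std::printf("V1 = %.4f %+.4fj, |V1| = %.4f\n",
                V[1].real(), V[1].imag(), std::abs(V[1]));
    return 0;
}
```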