Search Results - Inner Mongolia University Library

CL_ARRAY: A new generic library of multidimensional containers for C++ compilers with extension for OpenCL framework

COMPUTER LANGUAGES SYSTEMS & STRUCTURES, 2017, Vol. 50 (Dec.), pp. 53-81

Authors: Zouaoui, Chakib Mustapha Anouar; Taleb, Nasreddine (Univ Djillali Liabes Sidi Bel Abbes Dept Elect RCAM Lab Sidi Bel Abbes Algeria)

This paper presents a new metaprogramming library, CL_ARRAY, that offers multiplatform and generic multidimensional data containers for C++ specifically adapted for parallel programming. The CL_ARRAY containers are built around a new formalism for representing the multidimensional nature of data as well as the semantics of multidimensional pointers and contiguous data structures. We also present OCL_ARRAY VIEW, a concept based on metaprogrammed enveloped objects that supports multidimensional transformations and multidimensional iterators designed to simplify and formalize the interfacing process between OpenCL APIs, standard template library (STL) algorithms and CL_ARRAY containers. Our results demonstrate improved performance and energy savings over the three most popular container libraries available to the developer community for use in the context of multilinear algebraic applications. (C) 2017 Elsevier Ltd. All rights reserved.

Keywords: C++; multidimensional data container; Metaprogramming; Parallel programming

Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications

JOURNAL OF SUPERCOMPUTING, 2017, Vol. 73, No. 12, pp. 5378-5401

Authors: Jarzabek, Lukasz; Czarnul, Pawel (Gdansk Univ Technol Fac Elect Telecommun & Informat Gdansk Poland)

The aim of this paper is to evaluate the performance of two new CUDA mechanisms, unified memory and dynamic parallelism, for real parallel applications compared to standard CUDA API versions. In order to gain insight into the performance of these mechanisms, we decided to implement three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically, the tested applications include verification of Goldbach's conjecture, 2D heat transfer simulation and adaptive numerical integration. We experimented with various ways in which dynamic parallelism can be deployed into an existing implementation and optimized further. Subsequently, we compared the best dynamic parallelism and unified memory versions to their respective standard API counterparts. It was shown that the use of dynamic parallelism resulted in improved performance for heat simulation, performance better than the static version but worse than the iterative version for numerical integration, and worse results for Goldbach's conjecture verification. In most cases, unified memory results in a decrease in performance. On the other hand, both mechanisms can contribute to simpler and more readable code. For dynamic parallelism, this holds for algorithms to which it can be naturally applied. Unified memory generally makes it easier for a programmer to enter the CUDA programming paradigm, as it resembles the traditional memory allocation/usage pattern.

Keywords: CUDA; Dynamic parallelism; Unified memory; Parallel programming
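
The divide-and-conquer scheme behind the adaptive numerical integration test case can be sketched sequentially as follows. This is a minimal illustration only, not the authors' CUDA code: the function names (`simpson`, `adaptive_integrate`), the choice of Simpson's rule, and the tolerance handling are all assumptions made here. Each recursive call is the kind of subproblem that a dynamic-parallelism version might hand to a child kernel.

```python
import math


def simpson(f, a, b):
    """Simpson's rule estimate of the integral of f over [a, b]."""
    m = 0.5 * (a + b)
    return (b - a) / 6.0 * (f(a) + 4.0 * f(m) + f(b))


def adaptive_integrate(f, a, b, tol=1e-9, depth=0, max_depth=50):
    """Adaptive Simpson integration (hypothetical helper, sequential sketch).

    The interval is split in half whenever the coarse estimate and the
    refined estimate disagree by more than the tolerance.
    """
    m = 0.5 * (a + b)
    whole = simpson(f, a, b)
    left = simpson(f, a, m)
    right = simpson(f, m, b)
    error = left + right - whole
    # Accept the refined estimate once it agrees with the coarse one.
    if depth >= max_depth or abs(error) <= 15.0 * tol:
        return left + right + error / 15.0
    # Otherwise divide: the two halves are independent subproblems.
    return (adaptive_integrate(f, a, m, tol / 2.0, depth + 1, max_depth)
            + adaptive_integrate(f, m, b, tol / 2.0, depth + 1, max_depth))


if __name__ == "__main__":
    print(adaptive_integrate(math.sin, 0.0, math.pi))
```

Because the two recursive calls share no state, they could in principle be evaluated concurrently; how that maps onto GPU kernels is exactly what the paper evaluates and is not reproduced here.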

Parallel Transient Stability Simulation Based on Multi-Area Thevenin Equivalents

IEEE TRANSACTIONS ON SMART GRID, 2017, Vol. 8, No. 3, pp. 1366-1377

Authors: Tomim, Marcelo A.; Marti, Jose R.; Passos Filho, Joao A. (Univ Fed Juiz de Fora Dept Elect Engn BR-36036330 Juiz De Fora MG Brazil; Univ British Columbia Dept Elect & Comp Engn Vancouver BC V6T 1Z4 Canada)

In this paper, the diakoptics-based branch-tearing method for solving large electric networks, known as the Multi-Area Thevenin Equivalents (MATE), is combined with the alternating method for parallelizing the transient stability simulations of bulk power systems. In the proposed framework, equations associated with dynamic and static devices and the passive network are distributed among computing processes. The paper discusses an implementation of the parallel transient stability simulator along with results for two power systems with about 4,000 and 15,000 buses, both based on the Brazilian Interconnected Power System. Performance metrics for assessing the effectiveness of the proposed methodology are also presented and discussed. In order to validate the implementation, the results are also compared with those from an industrial-grade transient stability program, ANATEM, developed by CEPEL.

Keywords: Power system transient stability; parallel programming; diakoptics; Thevenin equivalents

A GPU-Accelerated Fourth-Order Runge-Kutta in the Interaction Picture Method for the Simulation of Nonlinear Signal Propagation in Multimode Fibers

JOURNAL OF LIGHTWAVE TECHNOLOGY, 2017, Vol. 35, No. 17, pp. 3622-3628

Authors: Brehler, Marius; Schirwon, Malte; Goeddeke, Dominik; Krummrich, Peter M. (Tech Univ Dortmund D-44227 Dortmund Germany; Univ Stuttgart Inst Appl Anal & Numer Simulat D-70569 Stuttgart Germany)

The nonlinear signal propagation in fibers can be described by the nonlinear Schrodinger equation and the Manakov equation. Most commonly, split-step Fourier methods (SSFM) are applied to solve these nonlinear equations. The numerical simulation of the nonlinear signal propagation is especially challenging for multimode fibers, particularly if the calculation of very small step sizes or a large number of steps is required. Instead of utilizing SSFM, the fourth-order Runge-Kutta in the Interaction Picture (RK4IP) method can be applied. This method has the potential to reduce the numerical error while simultaneously allowing an increased step size. These advantages come at the price of a higher numerical effort compared to the SSFM method for the same step size. Since the simulation of the signal propagation in multimode fibers is already quite challenging, parallelization becomes an even more interesting option. We demonstrate the adaptation of the RK4IP method to simulate the nonlinear signal propagation in multimode fibers, including its parallelization. Besides comparing the performance of a parallelized implementation for multicore CPUs and a GPU-accelerated version, we discuss efficient strategies to implement the RK4IP method on a GPU accelerator with CUDA. In addition, the RK4IP implementation is numerically compared with a conventional SSFM implementation.

Keywords: Graphics processing units; interaction picture; multimode fibers; nonlinear fiber optics; optical fiber communication; parallel programming; space-division multiplexing

FMM/GPU-Accelerated Boundary Element Method for Computational Magnetics and Electrostatics

IEEE TRANSACTIONS ON MAGNETICS, 2017, Vol. 53, No. 12, pp. 1-11

Authors: Adelman, Ross; Gumerov, Nail A.; Duraiswami, Ramani (Army Res Lab Adelphi MD 20783 USA; Univ Maryland Inst Adv Comp Studies College Pk MD 20742 USA; Fantaglo LLC Elkridge MD 21075 USA; Univ Maryland Dept Comp Sci College Pk MD 20742 USA)

A fast multipole method (FMM)/graphics processing unit-accelerated boundary element method (BEM) for computational magnetics and electrostatics via the Laplace equation is presented. The BEM is an integral method, but the FMM is typically designed around monopole and dipole sources. To apply the FMM to the integral expressions in the BEM, the internal data structures and logic of the FMM must be changed. However, this can be difficult. For example, computing the multipole expansions due to the boundary elements requires computing single and double surface integrals over them. Moreover, FMM codes for monopole and dipole sources are widely available and highly optimized. This paper describes a method for applying the FMM unchanged to the integral expressions in the BEM. This method, called the correction factor matrix method, works by approximating the integrals using a quadrature. The quadrature points are treated as monopole and dipole sources, which can be plugged directly into current FMM codes. The FMM is effectively treated as a black box. Inaccuracies from the quadrature are corrected during a correction factor step. The method is derived, and example problems are presented showing accuracy and performance.

Keywords: Boundary element method; boundary integral equations; fast solvers; Galerkin method; integral equations; Laplace equation; method of moments; parallel processing; parallel programming

A Wait-Free Hash Map

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2017, Vol. 45, No. 3, pp. 421-448

Authors: Laborde, Pierre; Feldman, Steven; Dechev, Damian (Univ Cent Florida Orlando FL 32816 USA)

In this work we present the first design and implementation of a wait-free hash map. Our multiprocessor data structure allows a large number of threads to concurrently insert, get, and remove information. Wait-freedom means that all threads make progress in a finite amount of time, an attribute that can be critical in real-time environments. This is opposed to the traditional blocking implementations of shared data structures, which suffer from the negative impact of deadlock and related correctness and performance issues. We only use atomic operations that are provided by the hardware; therefore, our hash map can be utilized by a variety of data-intensive applications, including those within the domains of embedded systems and supercomputers. The challenges of providing this guarantee make the design and implementation of wait-free objects difficult. As such, there are few wait-free data structures described in the literature; in particular, there are no wait-free hash maps. It often becomes necessary to sacrifice performance in order to achieve wait-freedom. However, our experimental evaluation shows that our hash map design is, on average, 7 times faster than a traditional blocking design. Our solution outperforms the best available alternative non-blocking designs in a large majority of cases, typically by a factor of 15 or higher.

Keywords: Lock-free; Wait-free; Non-blocking; Hash map; Data Structures; Parallel programming; Concurrency

Calculating Parallel Programs in Coq Using List Homomorphisms

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2017, Vol. 45, No. 2, pp. 300-319

Authors: Loulergue, Frederic; Bousdira, Wadoud; Tesson, Julien (Univ Paris Diderot CNRS PPS Inria R2 Paris France; Univ Orleans INSA Ctr Val Loire LIFO EA 4022 Orleans France; Univ Paris Est UPEC LACL F-94010 Creteil France)

SyDPaCC is a set of libraries for the Coq proof assistant. It allows one to write naive functional programs (i.e. with high complexity) that are considered as specifications, and to transform them into more efficient versions. These more efficient versions can then be automatically parallelised before being extracted from Coq into source code for the functional language OCaml, together with calls to the Bulk Synchronous Parallel ML library. In this paper we present a new core version of SyDPaCC for the development of correct-by-construction parallel programs using the theory of list homomorphisms and algorithmic skeletons implemented and verified in Coq. The framework is illustrated on the maximum prefix sum problem.

Keywords: Parallel programming; Algorithmic skeletons; Constructive algorithms; Proof assistant
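
The maximum prefix sum problem mentioned in the abstract is a standard illustration of list homomorphisms, and the idea can be sketched outside Coq. The Python below is an illustration under stated assumptions, not SyDPaCC code or its API: all names (`mps_spec`, `leaf`, `combine`, `mps_homomorphic`) are hypothetical, and the empty prefix is assumed to count, so the result is never negative. The maximum prefix sum alone is not a homomorphism, but the pair (maximum prefix sum, total sum) is, which is what lets chunks be processed independently.

```python
from functools import reduce
from itertools import accumulate


def mps_spec(xs):
    """Naive specification: the largest sum over all prefixes (empty prefix included)."""
    return max([0] + list(accumulate(xs)))


def leaf(x):
    """Image of a singleton list: (maximum prefix sum, total sum)."""
    return (max(0, x), x)


def combine(left, right):
    """Associative operator merging the results of two adjacent segments."""
    m1, s1 = left
    m2, s2 = right
    # A best prefix either ends in the left part or spans all of it.
    return (max(m1, s1 + m2), s1 + s2)


IDENTITY = (0, 0)  # result for the empty list


def mps_homomorphic(xs, num_chunks=4):
    """Chunked evaluation; each chunk could be handled by a separate worker."""
    size = max(1, -(-len(xs) // num_chunks))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    partials = [reduce(combine, map(leaf, c), IDENTITY) for c in chunks]
    return reduce(combine, partials, IDENTITY)[0]
```

For any list, `mps_homomorphic` should agree with `mps_spec`; the chunked version only relies on `combine` being associative with `IDENTITY` as its unit.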

Parallel SuperFine - A tool for fast and accurate supertree estimation: Features and limitations

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2017, Vol. 67, pp. 441-454

Authors: Neves, Diogo Telmo; Sobral, Joao Luis (Univ Minho Dept Informat Campus Gualtar Braga Portugal)

Computing evolutionary relationships on data sets containing hundreds to thousands of taxa easily becomes a daunting task. With recent advances in next-generation sequencing technologies, biological data sets are growing at an unprecedented pace. This makes it much harder, in terms of either complexity or scale, to conduct analyses over such large data sets. Therefore, phylogenetics requires new algorithms, methods, and tools to take advantage of parallel hardware and to be able to handle the unprecedented growth of biological data. In this paper, we present Parallel SuperFine, a tool for fast and accurate supertree estimation, and its features. Parallel SuperFine was derived from SuperFine, a state-of-the-art supertree (meta)method. We describe an extension made to SuperFine that significantly improves its performance, and how the EPIC framework is used to boost the overall performance of Parallel SuperFine. Additionally, we pinpoint current limitations that prevent it from attaining even better performance. Our studies reveal that Parallel SuperFine significantly reduces the time required to perform supertree estimation. Moreover, we show that Parallel SuperFine exhibits good scalability, even in the presence of asymmetric biological data sets. Furthermore, the results achieved show that the radical improvement in performance does not impair tree accuracy, which is a key issue in phylogenetic inference. (C) 2016 Elsevier B.V. All rights reserved.

Keywords: Phylogenetics; Supertree estimation; Irregular application; Third-party tools; Cluster; Parallel programming

Optimising loops in dynamic dataflow

IET CIRCUITS DEVICES & SYSTEMS, 2017, Vol. 11, No. 2, pp. 113-122

Authors: Santiago, Leandro; Marzulo, Leandro A. J.; Sena, Alexandre C.; Alves, Tiago A. O.; Franca, Felipe M. G. (Univ Fed Rio de Janeiro PESC COPPE Programa Engn Sistemas & Comp Rio De Janeiro Brazil; Univ Estado Rio de Janeiro IME Rio De Janeiro Brazil)

Dynamic dataflow allows simultaneous execution of instructions in different iterations of a loop, boosting parallelism exploitation. In this model, operands are tagged with their associated instance number, which is incremented as they go through the loop. Instruction execution is triggered when all input operands with the same tag become available. However, this traditional tagging mechanism often requires the generation of several control instructions to manipulate tags and guarantee the correct match. To address this problem, this work presents three dataflow loop optimisation techniques. The stack-tagged dataflow is a tagging mechanism that uses stacks of tags to reduce control overheads in dataflow. On the other hand, as nested loops may increase the overhead of stack-tag comparison, tag resetting can be used to set the tag to zero whenever it is safe, allowing a one-level reduction in the stack depth. Finally, loop skipping further avoids stack comparison overhead in loops when the number of iterations can be determined by the compiler. Experimental results show the overhead, drawbacks and benefits of the three optimisations presented. Moreover, the results suggest that a hybrid compiling approach can be used to get the best performance from each technique.

Keywords: dynamic dataflow; tagging mechanism; dataflow overhead reduction; data flow graphs; data handling; stack tagged dataflow; nested loops; dataflow loop optimisation; parallel programming
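
The traditional tag-matching rule described in the abstract (an instruction fires once all of its input operands carrying the same tag have arrived) can be sketched as follows. This is a toy model under stated assumptions, not the paper's implementation or any of its three optimisations: the class name `TaggedInstruction`, the `receive` method, and the use of a plain integer as the tag are hypothetical choices made for illustration.

```python
from collections import defaultdict


class TaggedInstruction:
    """Toy dataflow instruction that matches operands by iteration tag."""

    def __init__(self, name, arity, op):
        self.name = name
        self.arity = arity          # number of input ports
        self.op = op                # function applied once all inputs match
        self.waiting = defaultdict(dict)  # tag -> {port: value}

    def receive(self, tag, port, value):
        """Store an operand; fire and return (tag, result) when the set is complete."""
        slots = self.waiting[tag]
        slots[port] = value
        if len(slots) < self.arity:
            return None             # still waiting for operands with this tag
        del self.waiting[tag]       # matched set is consumed
        args = [slots[p] for p in range(self.arity)]
        return (tag, self.op(*args))


if __name__ == "__main__":
    add = TaggedInstruction("add", 2, lambda a, b: a + b)
    # Operands of iterations 0 and 1 arrive interleaved and out of order.
    print(add.receive(1, 0, 10))
    print(add.receive(0, 1, 5))
    print(add.receive(0, 0, 2))
    print(add.receive(1, 1, 20))
```

Operands from different loop iterations never mix because they sit in separate per-tag slots, which is what permits several iterations to be in flight at once.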

Multi-ML: Programming Multi-BSP Algorithms in ML

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2017, Vol. 45, No. 2, pp. 340-361

Authors: Allombert, V.; Gava, F.; Tesson, J. (Univ Paris Est UPEC LACL EA 4219 F-94010 Creteil France)

BSP is a bridging model between abstract execution and concrete parallel systems. The structure and abstraction brought by BSP allow portable parallel programs with scalable performance predictions, without dealing with low-level details of architectures. In the past, we designed BSML for programming BSP algorithms in ML. However, the simplicity of the BSP model does not fit the complexity of today's hierarchical architectures, such as clusters of machines with multiple multi-core processors. The Multi-BSP model is an extension of the BSP model which brings a tree-based view of nested components of hierarchical architectures. To program Multi-BSP algorithms in ML, we propose the Multi-ML language as an extension of BSML where a specific kind of recursion is used to go through a hierarchy of computing nodes. We define a formal semantics of the language and present preliminary experiments which show performance improvements with respect to BSML.

Keywords: BSP; MULTI-BSP; ML; Parallel programming
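
The tree-based view of a hierarchical machine, with recursion descending through its levels, can be pictured with a small sketch. This is not Multi-ML syntax or semantics; it is a sequential Python analogy, and the `Node` class and `multi_level_sum` function are hypothetical names introduced only for illustration. Leaves stand for cores holding local data, and inner nodes stand for components that gather results from their sub-components.

```python
class Node:
    """A component of a hierarchical machine: inner node or leaf (core)."""

    def __init__(self, children=None, data=None):
        self.children = children or []  # sub-components, empty for a leaf
        self.data = data or []          # local data, used by leaves only


def multi_level_sum(node):
    """Sum all leaf data by recursing through the hierarchy."""
    if not node.children:
        return sum(node.data)           # leaf: purely local computation
    # Sub-components are independent, so these calls could run in parallel.
    partials = [multi_level_sum(child) for child in node.children]
    return sum(partials)                # gather step at this level


if __name__ == "__main__":
    machine = Node(children=[
        Node(children=[Node(data=[1, 2]), Node(data=[3])]),
        Node(children=[Node(data=[4, 5, 6])]),
    ])
    print(multi_level_sum(machine))
```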
