In this paper the diakoptics-based branch-tearing method for solving large electric networks, known as the Multi-Area Thevenin Equivalents (MATE), is combined with the alternating method for parallelizing the transient stability simulations of bulk power systems. In the proposed framework, equations associated with dynamic and static devices and the passive network are distributed among computing processes. The paper discusses an implementation of the parallel transient stability simulator along with results for two power systems with about 4,000 and 15,000 buses, both based on the Brazilian Interconnected Power System. Performance metrics for assessing the effectiveness of the proposed methodology are also presented and discussed. In order to validate the implementation, the results are also compared with those from ANATEM, an industrial-grade transient stability program developed by CEPEL.
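The MATE idea can be illustrated on a toy two-area network: each area is reduced to a Thevenin equivalent seen from the tie (link) branch, the link current is solved from a single scalar equation, and each area can then be updated independently. A minimal Python sketch under that simplification; all names and numbers are illustrative, not taken from the paper:

```python
# Toy MATE-style solve: two areas joined by one tie branch.
# Each area is reduced to a Thevenin equivalent (E_th, R_th)
# as seen from the tie branch; the link current then follows
# from one scalar equation, after which the areas could be
# updated in parallel.

def thevenin(e_source, r_series):
    """Hypothetical per-area reduction; here each area is already a
    single source behind an impedance, so the equivalent is trivial."""
    return e_source, r_series

def link_current(area_a, area_b, r_link):
    e_a, r_a = area_a
    e_b, r_b = area_b
    return (e_a - e_b) / (r_a + r_b + r_link)

area_1 = thevenin(1.05, 0.1)   # per-unit values, illustrative only
area_2 = thevenin(1.00, 0.2)
i_link = link_current(area_1, area_2, 0.2)
print(i_link)                  # tie-branch current in per unit
```

In the full method each Thevenin reduction is itself a network solve inside one computing process, which is what makes the scheme parallel.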
We present a modular approach to implementing dynamic algorithm switching for parallel scientific simulations. Our approach leverages modern software engineering techniques to implement fine-grained control of algorithmic behavior in scientific simulations, as well as to improve modularity when integrating the algorithm-switching functionality into existing application source code. Through fine-grained control of functional behavior in an application, our approach enables the design and implementation of application-specific dynamic algorithm switching scenarios. To ensure modularity, our approach treats dynamic algorithm switching as a separate concern with regard to a given application and encourages separate development and transparent integration of the switching functionality without directly modifying the original application code. By applying and evaluating our approach with a real-world scientific application to switch its simulation algorithms dynamically, we demonstrate the applicability and effectiveness of our approach to constructing efficient parallel simulations.
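The separation of concerns the abstract describes can be sketched with a small registry pattern: the application calls one entry point, while the switching policy lives entirely outside the application code. A hedged Python illustration; the names and the policy are invented for this sketch and are not the paper's API:

```python
# Dynamic algorithm switching as a separate concern: the
# simulation only calls advance(); which algorithm runs is
# decided by a policy kept apart from the application code.

_algorithms = {}

def register(name):
    def deco(fn):
        _algorithms[name] = fn
        return fn
    return deco

@register("coarse")
def integrate_coarse(state):
    return state + 1.0          # cheap, low-accuracy stand-in

@register("fine")
def integrate_fine(state):
    return state + 0.999        # expensive, high-accuracy stand-in

def pick_algorithm(step):
    # Illustrative switching policy: refine after step 100.
    return "fine" if step > 100 else "coarse"

def advance(state, step):
    # The application sees only this call; the switching concern is
    # confined to pick_algorithm() and the registry above.
    return _algorithms[pick_algorithm(step)](state)

state = 0.0
for step in (1, 50, 101):
    state = advance(state, step)
```

Swapping the policy or adding an algorithm requires no change to the loop that drives the simulation, which is the modularity property the paper targets.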
The increasing size of Big Data is often heralded, but how data are transformed and represented is also profoundly important to knowledge discovery, and this is exemplified in Big Graph analytics. Much attention has been placed on the scale of the input graph, but the product of a graph algorithm can be many times larger than the input. This is true for many graph problems, such as listing all triangles in a graph. Enabling scalable graph exploration for Big Graphs requires new approaches to algorithms, architectures, and visual analytics. A brief tutorial is given to aid the argument for thoughtful representation of data in the context of graph analysis. Then a new algebraic method to reduce the arithmetic operations in counting and listing triangles in graphs is introduced. Additionally, a scalable triangle listing algorithm in the MapReduce model is presented, followed by a description of the experiments with that algorithm that led to the largest and fastest triangle listing benchmarks to date. Finally, a method for identifying triangles in new visual graph exploration technologies is proposed.
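The triangle-listing problem the abstract refers to can be stated compactly: for every edge (u, v), any common neighbor w closes a triangle. A minimal sequential Python sketch of this neighborhood-intersection formulation, not the paper's MapReduce algorithm:

```python
# List all triangles in an undirected graph by intersecting the
# neighbor sets of each edge's endpoints. Enforcing u < v < w
# ensures each triangle is emitted exactly once. Note that the
# output can be far larger than the edge list, which is the
# scaling point the abstract makes.

def triangles(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = []
    for u, v in edges:
        u, v = min(u, v), max(u, v)
        for w in adj[u] & adj[v]:
            if w > v:                  # canonical order u < v < w
                found.append((u, v, w))
    return found

# K4 (the complete graph on 4 vertices) contains 4 triangles.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(triangles(k4_edges)))  # → 4
```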
This paper presents a new metaprogramming library, CL_ARRAY, that offers multiplatform and generic multidimensional data containers for C++ specifically adapted for parallel programming. The CL_ARRAY containers are built around a new formalism for representing the multidimensional nature of data as well as the semantics of multidimensional pointers and contiguous data structures. We also present OCL_ARRAY VIEW, a concept based on metaprogrammed enveloped objects that supports multidimensional transformations and multidimensional iterators designed to simplify and formalize the interfacing process between OpenCL APIs, standard template library (STL) algorithms and CL_ARRAY containers. Our results demonstrate improved performance and energy savings over the three most popular container libraries available to the developer community for use in the context of multilinear algebraic applications. (C) 2017 Elsevier Ltd. All rights reserved.
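The container semantics described above rest on the standard row-major mapping from a multidimensional index to an offset in a contiguous buffer. A toy Python illustration of that layout contract; this is not CL_ARRAY's implementation, only the addressing rule any contiguous multidimensional container must honor:

```python
# Row-major (C-order) addressing: a multidimensional index maps
# to a flat offset through per-dimension strides, computed as
# running products of the trailing extents.

def row_major_strides(shape):
    strides, acc = [], 1
    for extent in reversed(shape):
        strides.append(acc)
        acc *= extent
    return list(reversed(strides))

def flat_offset(index, strides):
    return sum(i * s for i, s in zip(index, strides))

shape = (2, 3, 4)                      # a 2x3x4 array
strides = row_major_strides(shape)     # [12, 4, 1] for this shape
buffer = list(range(2 * 3 * 4))        # contiguous storage 0..23
print(buffer[flat_offset((1, 2, 3), strides)])  # last element → 23
```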
Computing evolutionary relationships on data sets containing hundreds to thousands of taxa easily becomes a daunting task. With recent advances in next-generation sequencing technologies, biological data sets are growing at an unprecedented pace, which makes it much harder, in terms of both complexity and scale, to conduct analyses over such large data sets. Phylogenetics therefore requires new algorithms, methods, and tools that take advantage of parallel hardware and can handle the unprecedented growth of biological data. In this paper, we present parallel SuperFine, a tool for fast and accurate supertree estimation, and its features. The tool was derived from SuperFine, a state-of-the-art supertree (meta)method. We describe an extension made to SuperFine that significantly improves its performance, and how the EPIC framework is used to boost the overall performance of parallel SuperFine. Additionally, we pinpoint current limitations that prevent even better performance. Our studies reveal that parallel SuperFine significantly reduces the time required to perform supertree estimation. Moreover, we show that parallel SuperFine exhibits good scalability, even in the presence of asymmetric biological data sets. Furthermore, the achieved results show that this radical improvement in performance does not impair tree accuracy, which is a key issue in phylogenetic inference. (C) 2016 Elsevier B.V. All rights reserved.
The aim of this paper is to evaluate the performance of two newer CUDA mechanisms, unified memory and dynamic parallelism, for real parallel applications compared to standard CUDA API versions. In order to gain insight into the performance of these mechanisms, we implemented three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically, the tested applications include verification of Goldbach's conjecture, 2D heat transfer simulation and adaptive numerical integration. We experimented with various ways in which dynamic parallelism can be deployed into an existing implementation and optimized further. Subsequently, we compared the best dynamic parallelism and unified memory versions to their respective standard API counterparts. Dynamic parallelism improved performance for the heat simulation, performed better than the static version but worse than an iterative version for numerical integration, and gave worse results for Goldbach's conjecture verification. In most cases, unified memory results in a decrease in performance. On the other hand, both mechanisms can contribute to simpler and more readable code. For dynamic parallelism, this applies to algorithms in which it can be naturally applied. Unified memory generally makes it easier for a programmer to enter the CUDA programming paradigm, as it resembles the traditional memory allocation/usage pattern.
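The CUDA kernels themselves are beyond a short sketch, but the first test application, verification of Goldbach's conjecture, is easy to state in plain Python. A sequential reference for the computation the GPU versions parallelize (the GPU code partitions the even numbers across threads instead of looping):

```python
# Goldbach verification: every even n >= 4 should be expressible
# as the sum of two primes. A sieve supplies primality tests;
# goldbach_holds() then searches for one valid decomposition.

def sieve(limit):
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return is_prime

def goldbach_holds(n, is_prime):
    return any(is_prime[p] and is_prime[n - p]
               for p in range(2, n // 2 + 1))

limit = 1000
primes = sieve(limit)
print(all(goldbach_holds(n, primes)
          for n in range(4, limit + 1, 2)))  # → True
```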
In this work we present the first design and implementation of a wait-free hash map. Our multiprocessor data structure allows a large number of threads to concurrently insert, get, and remove information. Wait-freedom means that all threads make progress in a finite amount of time, an attribute that can be critical in real-time environments. This is opposed to the traditional blocking implementations of shared data structures, which suffer from the negative impact of deadlock and related correctness and performance issues. We only use atomic operations that are provided by the hardware; therefore, our hash map can be utilized by a variety of data-intensive applications, including those within the domains of embedded systems and supercomputers. The challenges of providing this guarantee make the design and implementation of wait-free objects difficult. As such, there are few wait-free data structures described in the literature; in particular, there are no wait-free hash maps. It often becomes necessary to sacrifice performance in order to achieve wait-freedom. However, our experimental evaluation shows that our hash map design is, on average, 7 times faster than a traditional blocking design. Our solution outperforms the best available alternative non-blocking designs in a large majority of cases, typically by a factor of 15 or higher.
Linear spectral unmixing is one of the hottest research topics within the hyperspectral imaging community today, as evidenced by the vast number of papers about this challenging task in the scientific literature. A subset of these works is devoted to accelerating previously published unmixing algorithms for application under tight time constraints. For this purpose, hyperspectral unmixing algorithms are typically implemented on high-performance computing architectures in which the operations involved are executed in parallel, which reduces the time required to unmix a given hyperspectral image with respect to the sequential versions of these algorithms. The speedup factors that can be achieved on these high-performance computing platforms heavily depend on the inherent level of parallelism of the algorithms executed on them. However, the majority of state-of-the-art unmixing algorithms were not originally conceived to be parallelized at a later stage, which clearly restricts the acceleration that can be reached. As advanced hyperspectral sensors attain increasingly high spatial, spectral, and temporal resolutions, it becomes mandatory to follow a new approach: developing a class of highly parallel unmixing solutions that can take full advantage of the characteristics of today's high-performance computing architectures. This paper represents a step in this direction, as it proposes a new parallel algorithm for fully unmixing a hyperspectral image, together with its implementation on two different NVIDIA graphics processing units (GPUs). The results obtained reveal that our proposal is able to unmix hyperspectral images with very different spatial patterns and sizes better and much faster than the best GPU-based unmixing chains published to date, independently of the characteristics of the selected GPU.
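Linear unmixing models each pixel spectrum as a combination of endmember spectra; recovering the abundances is, at its core, a per-pixel least-squares solve, which is what makes the problem so amenable to GPU parallelism. A toy unconstrained two-endmember example via the normal equations; this is only the algebraic core, not the paper's algorithm, and real unmixing chains add nonnegativity and sum-to-one constraints:

```python
# Linear mixing model: pixel = a1 * e1 + a2 * e2. With two
# endmembers, the unconstrained least-squares abundances solve
# the 2x2 normal equations G a = b, where G holds endmember
# inner products and b the endmember-pixel inner products.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def unmix_two(e1, e2, pixel):
    g11, g12, g22 = dot(e1, e1), dot(e1, e2), dot(e2, e2)
    b1, b2 = dot(e1, pixel), dot(e2, pixel)
    det = g11 * g22 - g12 * g12
    a1 = (g22 * b1 - g12 * b2) / det
    a2 = (g11 * b2 - g12 * b1) / det
    return a1, a2

e1 = [1.0, 0.0, 1.0]           # illustrative endmember spectra
e2 = [0.0, 1.0, 1.0]
pixel = [0.3, 0.7, 1.0]        # exactly 0.3 * e1 + 0.7 * e2
print(unmix_two(e1, e2, pixel))
```

Because every pixel is solved independently against the same endmember matrix, the work maps naturally onto one GPU thread (or thread block) per pixel.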
The nonlinear signal propagation in fibers can be described by the nonlinear Schrödinger equation and the Manakov equation. Most commonly, split-step Fourier methods (SSFM) are applied to solve these nonlinear equations. The numerical simulation of the nonlinear signal propagation is especially challenging for multimode fibers, particularly if the calculation of very small step sizes or a large number of steps is required. Instead of utilizing SSFM, the fourth-order Runge-Kutta in the Interaction Picture (RK4IP) method can be applied. This method has the potential to reduce the numerical error while simultaneously allowing an increased step size. These advantages come at the price of a higher numerical effort compared to the SSFM method for the same step size. Since the simulation of the signal propagation in multimode fibers is already quite challenging, parallelization becomes an even more interesting option. We demonstrate the adaptation of the RK4IP method to simulate the nonlinear signal propagation in multimode fibers, including its parallelization. Besides comparing the performance of a parallelized implementation for multicore CPUs and a GPU-accelerated version, we discuss efficient strategies to implement the RK4IP method on a GPU accelerator with CUDA. In addition, the RK4IP implementation is numerically compared with a conventional SSFM implementation.
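RK4IP combines a change of variables into the interaction picture with a classical fourth-order Runge-Kutta step; the interaction-picture transform is fiber-specific, but the RK4 core is standard. A minimal Python reminder of that core on a scalar test ODE (not the fiber model itself), showing the fourth-order accuracy that lets RK4IP take larger steps than SSFM:

```python
import math

# Classical fourth-order Runge-Kutta step, the time-stepping core
# that RK4IP wraps around the interaction-picture transform. Here
# it integrates the scalar test problem y' = -y, y(0) = 1, whose
# exact solution is exp(-t); in RK4IP, f is the (transformed)
# nonlinear propagation operator instead.

def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda t, y: -y
t, y, h = 0.0, 1.0, 0.1
for _ in range(10):                    # integrate to t = 1
    y = rk4_step(f, t, y, h)
    t += h
print(abs(y - math.exp(-1.0)))         # O(h^4)-small global error
```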
bsp is a bridging model between abstract execution and concrete parallel systems. The structure and abstraction brought by bsp allow portable parallel programs with scalable performance predictions, without dealing with the low-level details of architectures. In the past, we designed bsml for programming bsp algorithms in ml. However, the simplicity of the bsp model does not fit the complexity of today's hierarchical architectures, such as clusters of machines with multiple multi-core processors. The multi-bsp model is an extension of the bsp model which brings a tree-based view of the nested components of hierarchical architectures. To program multi-bsp algorithms in ml, we propose the multi-ml language as an extension of bsml, where a specific kind of recursion is used to go through a hierarchy of computing nodes. We define a formal semantics of the language and present preliminary experiments which show performance improvements with respect to bsml.
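A bsp computation is a sequence of supersteps: local computation, then communication, then a global barrier. A small sequential simulation of that structure in Python, computing a sum by pairwise reduction (illustrative only; multi-bsp additionally nests this scheme over a tree of components, and bsml/multi-ml express it in ml):

```python
# Sequential simulation of a BSP reduction: each while-iteration
# is one superstep. In the communication phase the upper half of
# the "processors" send their values to the lower half; the
# barrier is implicit because supersteps run one after another.

def bsp_reduce(values):
    while len(values) > 1:
        half = len(values) // 2
        # communication phase: processor i + half sends to processor i
        received = values[half:2 * half]
        # ---- barrier: all messages delivered before the next superstep
        values = ([values[i] + received[i] for i in range(half)]
                  + values[2 * half:])   # odd leftover carries over
    return values[0]

print(bsp_reduce([1, 2, 3, 4]))  # → 10
```

The number of supersteps is logarithmic in the number of processors, which is the kind of cost a bsp performance model can predict portably.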