Search Results - Inner Mongolia University Library

Application of the method of pyramid for synthesis of parallel algorithm for difference solution of the two-dimensional partial differentials equation

2016 International Conference Information Technology and Nanotechnology, ITNT 2016

Authors: Golovashkin, D.L.; Yablokova, L.V.; Belova, E.V. Affiliations: Samara National Research University, Samara, Russia; Image Processing Systems Institute, Branch of the Federal Scientific Research Centre Crystallography and Photonics, Russian Academy of Sciences, Samara, Russia

The work is devoted to the synthesis and investigation of a parallel algorithm for a finite-difference solution of the Poisson equation using the Jacobi method. For example, two-dimensional case demonstrates the efficac...

Keywords: parallel algorithms
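The Jacobi method named in the abstract above can be illustrated with a minimal serial sketch. It assumes the model problem $-\Delta u = f$ on a square grid with zero boundary values and the standard five-point stencil; the function name `jacobi_poisson` and its parameters are hypothetical, and the sketch does not reproduce the paper's pyramid method or its parallel decomposition.

```python
def jacobi_poisson(f, h, iterations):
    """Run Jacobi sweeps for -laplace(u) = f with u = 0 on the boundary.

    f is a square list of lists holding the right-hand side at grid points,
    h is the grid spacing. Hypothetical helper for illustration only.
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Each sweep reads only the previous iterate, so all interior
        # points can be updated independently of one another.
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    + h * h * f[i][j]
                )
        u = new
    return u
```

Because every update in a sweep depends only on the previous iterate, the interior points are independent within a sweep, which is the property that makes the method a common target for parallelization.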

Extending τ-Lop to model concurrent MPI communications in multicore clusters

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, Vol. 61, pp. 66-82

Authors: Rico-Gallego, Juan-Antonio; Diaz-Martin, Juan-Carlos; Lastovetsky, Alexey L. Affiliations: Univ Extremadura, Avd Univ S-N, Caceres 10003, Spain; Univ Coll Dublin, Dublin 4, Ireland

Achieving optimal performance of MPI applications on current multi-core architectures, composed of multiple shared communication channels and deep memory hierarchies, is not trivial. Formal analysis using parallel performance models allows one to depict the underlying behavior of the algorithms and their communication complexities, with the aims of estimating their cost and improving their performance. The LogGP model was initially conceived to predict the cost of algorithms in mono-processor clusters based on point-to-point transmissions, with parameters based on network latency and bandwidth. It remains the representative model, with multiple extensions for handling high-performance networks and covering particular contention cases, channel hierarchies, or protocol costs. These very specific branches lead LogGP to partially lose its initial abstract modeling purpose. The more recent log(n)P model represents a point-to-point transmission as a sequence of implicit transfers or data movements. Nevertheless, similar to LogGP, it models an algorithm in a parallel architecture as a sequence of message transmissions, an approach that is inefficient for modeling algorithms more advanced than simple tree-based ones, as we will show in this work. In this paper, the τ-Lop model is extended to multi-core clusters and compared to previous models. It demonstrates the ability to predict, with high accuracy, the cost of advanced algorithms and mechanisms used by mainstream MPI implementations such as MPICH or Open MPI. τ-Lop is based on the concept of concurrent transfers, and applies it to meaningfully represent the behavior of parallel algorithms in complex platforms with hierarchical shared communication channels, taking into account the effects of contention and deployment of processes on the processors. In addition, an exhaustive and reproducible methodology for measuring the parameters of the model is described. (C) 2016 Elsevier B.V. All rights reserved.

Keywords: parallel performance models; parallel algorithms; Message passing interface; Performance analysis; Multicore clusters

Creation of Data Mining algorithms as Functional Expression for parallel and Distributed Execution

13th International Conference on parallel Computing Technologies (PaCT)

Authors: Kholod, Ivan; Petukhov, Ilya. Affiliation: St Petersburg Electrotech Univ LETI, St Petersburg, Russia

ISBN: (print) 9783319219097; 9783319219080

The article describes an extension of the lambda calculus for the creation of parallel data mining algorithms. The proposed approach represents the algorithm as a sequence of pure functions with unified interfaces. For parallel execution we use a special function that makes it possible to change the structure of the algorithm and to implement various strategies for processing the data set and the model.

Keywords: parallel algorithms; Data mining; parallel data mining; Distributed data mining; Data mining algorithms

Network-Oblivious algorithms

JOURNAL OF THE ACM, 2016, Vol. 63, No. 1, pp. 1–36

Authors: Bilardi, Gianfranco; Pietracaprina, Andrea; Pucci, Geppino; Scquizzato, Michele; Silvestri, Francesco. Affiliations: Univ Padua, Dept Informat Engn, I-35131 Padua, Italy; Univ Padua, I-35131 Padua, Italy

A framework is proposed for the design and analysis of network-oblivious algorithms, namely algorithms that can run unchanged, yet efficiently, on a variety of machines characterized by different degrees of parallelism and communication capabilities. The framework prescribes that a network-oblivious algorithm be specified on a parallel model of computation where the only parameter is the problem's input size, and then evaluated on a model with two parameters, capturing parallelism granularity and communication latency. It is shown that for a wide class of network-oblivious algorithms, optimality in the latter model implies optimality in the decomposable bulk synchronous parallel model, which is known to effectively describe a wide and significant class of parallel platforms. The proposed framework can be regarded as an attempt to port the notion of obliviousness, well established in the context of cache hierarchies, to the realm of parallel computation. Its effectiveness is illustrated by providing optimal network-oblivious algorithms for a number of key problems. Some limitations of the oblivious approach are also discussed.

Keywords: algorithms; Theory; Communication; models of computation; network locality; oblivious algorithms; parallel algorithms

Parareal in Time for Fast Power System Dynamic Simulations

IEEE TRANSACTIONS ON POWER SYSTEMS, 2016, Vol. 31, No. 3, pp. 1820-1830

Authors: Gurrala, Gurunath; Dimitrovski, Aleksandar; Pannala, Sreekanth; Simunovic, Srdjan; Starke, Michael. Affiliations: Oak Ridge Natl Lab, Elect & Elect Syst Res Div, Oak Ridge, TN 37849, USA; Oak Ridge Natl Lab, Computat Sci & Math Div, Oak Ridge, TN 37849, USA

Recent advancements in high-performance parallel computing platforms and parallel algorithms have significantly enhanced the opportunities for real-time power system protection and control. This paper investigates the application of the Parareal in time algorithm to fast dynamic simulations. The Parareal algorithm belongs to the class of temporal decomposition methods, which divide the time interval into sub-intervals and solve them concurrently. Time-parallel algorithms face the difficulty of providing correct initial conditions for all the sub-intervals, which impacts the convergence rates. Parareal overcomes this difficulty by using an approximate trajectory. It has become popular in recent years for long transient simulations (e.g., molecular dynamics, fusion, reacting flows). This paper presents an approach for reliable implementation of Parareal with detailed models of power systems, including saturation. A windowing approach is proposed for improving the convergence. Parareal is compared with the Newton-based time-parallel method. The effectiveness of the algorithm is analyzed by parallel emulation using extensive case studies on a 10-generator 39-bus system and a 327-generator 2383-bus system for various disturbances. Parareal with simulation windows of 1 s has shown convergence in 1 to 3 iterations for the majority of the simulated cases, irrespective of the size of the system and the nature of the disturbance. All the cases tested have converged with the proposed implementation.

Keywords: High-performance computing; parallel algorithms; Parareal in time; power system dynamics; transient stability
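The temporal decomposition described in the abstract above can be sketched on a scalar ODE. This is a minimal illustration of the textbook Parareal update, assuming forward Euler for both the coarse and the fine propagator; the names `euler` and `parareal` are hypothetical, and the sketch does not include the paper's power system models, windowing approach, or Newton-based comparison.

```python
def euler(f, t0, t1, y, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with forward Euler."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + dt * f(t, y)
        t += dt
    return y


def parareal(f, y0, t_start, t_end, n_intervals, n_iterations, fine_steps=100):
    """Textbook Parareal iteration on a scalar ODE (illustrative sketch)."""
    ts = [t_start + (t_end - t_start) * n / n_intervals
          for n in range(n_intervals + 1)]

    def coarse(n, y):
        # Cheap propagator: a single Euler step across sub-interval n.
        return euler(f, ts[n], ts[n + 1], y, 1)

    def fine(n, y):
        # Accurate propagator: many Euler steps across sub-interval n.
        return euler(f, ts[n], ts[n + 1], y, fine_steps)

    # Approximate trajectory from the coarse propagator alone.
    u = [y0]
    for n in range(n_intervals):
        u.append(coarse(n, u[n]))

    for _ in range(n_iterations):
        # These fine solves are mutually independent; in a parallel
        # implementation each sub-interval would go to its own worker.
        fine_vals = [fine(n, u[n]) for n in range(n_intervals)]
        coarse_old = [coarse(n, u[n]) for n in range(n_intervals)]
        # Sequential correction sweep using the coarse propagator.
        new_u = [y0]
        for n in range(n_intervals):
            new_u.append(coarse(n, new_u[n]) + fine_vals[n] - coarse_old[n])
        u = new_u
    return ts, u
```

In this form, running as many iterations as there are sub-intervals reproduces the serial fine solution up to rounding, so any speedup depends on converging in far fewer iterations, which is the regime the abstract reports (1 to 3 iterations).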

On the efficiency of localized work stealing

INFORMATION PROCESSING LETTERS, 2016, Vol. 116, No. 2, pp. 100-106

Authors: Suksompong, Warut; Leiserson, Charles E.; Schardl, Tao B. Affiliations: Stanford Univ, Dept Comp Sci, Stanford, CA 94305, USA; MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139, USA

This paper investigates a variant of the work-stealing algorithm that we call the localized work-stealing algorithm. The intuition behind this variant is that because of locality, processors can benefit from working on their own work. Consequently, when a processor is free, it makes a steal attempt to get back its own work. We call this type of steal a steal-back. We show that the expected running time of the algorithm is $T_1/P + O(T_\infty P)$, and that under the "even distribution of free agents assumption", the expected running time of the algorithm is $T_1/P + O(T_\infty \lg P)$. In addition, we obtain another running-time bound based on ratios between the sizes of serial tasks in the computation. If $M$ denotes the maximum ratio between the largest and the smallest serial tasks of a processor after removing a total of $O(P)$ serial tasks across all processors from consideration, then the expected running time of the algorithm is $T_1/P + O(T_\infty M)$. (C) 2015 Elsevier B.V. All rights reserved.

Keywords: parallel algorithms; Multithreaded computation; Work stealing; Localization

Spatial sorting: An efficient strategy for approximate nearest neighbor searching

COMPUTERS & GRAPHICS-UK, 2016, Vol. 57, pp. 112-126

Authors: Malheiros, Marcelo de Gomensoro; Walter, Marcelo. Affiliations: UNIVATES, Ctr Exact & Technol Sci, Lajeado, Brazil; Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil

Many graphics and also non-graphics applications need efficient techniques to find the nearest neighbors of a given query point. There are two approaches to address this problem: space-partitioning and data partitioning. We present a data-partitioning error-controlled strategy for solving the nearest neighbor search (NNS) problem using spatial sorting as the basic building block. We improve on the neighborhood grid method by doing an extensive study on novel spatial sorting strategies for bidimensional NNS, providing significant performance and precision gains over previous works. Experiments demonstrate that, for many dense 2D point distributions, our solution is competitive with more complex and traditional techniques, such as k-d trees and index sorting. We also show comparable results for the 3D case. Our primary contribution is a dynamic, simple to implement, memory efficient, and highly parallelizable solution for low-dimensional approximate nearest neighbor search. (C) 2016 Elsevier Ltd. All rights reserved.

Keywords: Spatial sorting; k-nearest neighbors; parallel algorithms; Data structures

A polyphase filter for many-core architectures

ASTRONOMY AND COMPUTING, 2016, Vol. 16, pp. 1-16

Authors: Adamek, K.; Novotny, J.; Armour, W. Affiliations: Silesian Univ Opava, Fac Philosophy & Sci, Inst Phys, Bezrucovo Nam 13, Opava 74601, Czech Republic; Univ Oxford, Oxford Res Ctr E, 7 Keble Rd, Oxford OX1 3QG, England

In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and as such a well-established algorithm. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards (Fermi, Kepler, Maxwell) and on the Intel Xeon CPU and Xeon Phi (Knights Corner) platforms. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this: the first makes use of the L1/Texture cache, and the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems; we also present results in terms of bandwidth (GB/s), compute (GFLOP/s) and type conversions (GTc/s). We include a presentation of our results in terms of the sample rate which can be processed in real time by a chosen platform, which more intuitively describes the expected performance in a signal processing setting. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower-precision input data is limited by type conversions rather than device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation has a performance that is 1.5× to 1.92× greater than our CPU implementation; however, it is insufficient to compete with the performance of GPUs. We conclude with a comparison of our best-performing code to two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data. (C) 2016 Els

Keywords: Graphics processors; parallel architectures; parallel programming languages; parallel computing models; parallel algorithms

Simulation analysis of coverage blind spots detection in sensor networks

EEA - Electrotehnica, Electronica, Automatica, 2017, Vol. 65, No. 2, pp. 147-152

Authors: Yang, Hui; Lu, Chuiwei; Zhang, Guojun; Sun, Sheng. Affiliation: Computer School, Hubei Polytechnic University, Huangshi, Hubei, China

Current methods for detecting coverage blind spots in wireless sensor networks carry a heavy computational burden and cannot handle large-scale networks. To solve these problems, this paper proposes a coverage blind spot detection method for wireless sensor networks based on an improved distributed algorithm. The method forms a perceptual model of the network from sensor nodes with different physical characteristics; defines the perceptual area intensity of the network; identifies all the boundary nodes of the coverage blind spots; merges the distributed algorithm with geometric theory; and computes the boundary arcs and boundary endpoints of the coverage blind spots. Together these steps implement the detection scheme based on the improved distributed algorithm. Simulation experiments show that the proposed method can effectively increase the detection accuracy of blind spots in area coverage. © 2017, ICPE Electra Publishing House. All rights reserved.

Keywords: parallel algorithms

On the benefit of merging suffix array intervals for parallel pattern matching

27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016

Authors: Fischer, Johannes; Köppl, Dominik; Kurpicz, Florian. Affiliation: Dept. of Computer Science, Technische Universität Dortmund, Germany

ISBN: (print) 9783959770125

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with $p$ processors. Given a static text of length $n$, we first show how to compute the suffix array interval of a given pattern of length $m$ in $O(m/p + \lg p + \lg\lg p \cdot \lg\lg n)$ time for $p \le m$. For approximate pattern matching with $k$ differences or mismatches, we show how to compute all occurrences of a given pattern in $O\left(\frac{m^k \sigma^k}{p} \max(k, \lg\lg n) + \left(1 + \frac{m}{p}\right) \lg p \cdot \lg\lg n + \mathrm{occ}\right)$ time, where $\sigma$ is the size of the alphabet and $p \le \sigma^k m^k$. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: given the suffix array intervals for two patterns $P$ and $P'$, we present a data structure for computing the interval of $PP'$ in $O(\lg\lg n)$ sequential time, or in $O(1 + \lg_p \lg n)$ parallel time. All our data structures are of size $O(n)$ bits (in addition to the suffix array). © Johannes Fischer, Dominik Köppl, and Florian Kurpicz.

Keywords: parallel algorithms
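The notion of a suffix array interval used in the abstract above can be illustrated with a minimal sequential sketch: the interval of a pattern is the contiguous range of suffix array positions whose suffixes start with that pattern. The function names are hypothetical, the suffix array is built naively, and the lookup is a plain binary search rather than the paper's CREW-PRAM algorithm or its interval-merging data structure.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_interval(text, sa, pattern):
    """Return (left, right) so that sa[left:right] lists all occurrences.

    Uses two binary searches over length-m prefixes of the sorted suffixes.
    An empty interval (left == right) means the pattern does not occur.
    """
    m = len(pattern)

    # First position whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left = lo

    # First position whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return left, lo
```

For the text `banana`, the pattern `ana` maps to the half-open interval `(1, 3)`, whose suffix array entries are the two starting positions 3 and 1.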
