Search Results - Inner Mongolia University Library

Graphics-Processing-Unit-Based Acceleration of Electromagnetic Transients Simulation

IEEE TRANSACTIONS ON POWER DELIVERY, 2016, Vol. 31, No. 5, pp. 2036-2044

Authors: Debnath, Jayanta K.; Gole, Aniruddha M.; Fung, Wai-Keung. Univ Manitoba, Winnipeg, MB R3T 5V6, Canada; Robert Gordon Univ, Aberdeen AB10 7GJ, Scotland

This paper presents a novel approach to speed up electromagnetic-transients (EMT) simulation, using graphics-processing-unit (GPU)-based computing. This paper extends earlier published works in the area by exploiting additional parallelism inside EMT simulation. A 2D-parallel matrix-vector multiplication is used that is faster than previous 1D methods. Also, this paper implements a GPU-specific sparsity technique to further speed up the simulations, as the available CPU-based sparsity techniques are not suitable for GPUs. In addition, as an extension to previous works, this paper demonstrates modelling a power-electronic subsystem. The efficacy of the approach is demonstrated using two different scalable test systems. The first is a low-granularity system, that is, one in which a large cluster of buses is connected to others by a few transmission lines; the second is a high-granularity system, in which a small cluster of buses is connected to other clusters, thereby requiring more interconnecting transmission lines. Computation times for GPU-based computing are compared with the computation times for sequential implementations on the CPU. This paper shows two surprising differences of GPU simulation in comparison with CPU simulation. First, the inclusion of sparsity yields only minor reductions in the GPU-based simulation time. Second, excessive granularity, even though it appears to increase the number of parallel-computable subsystems, significantly slows down the GPU-based simulation.

Keywords: CUDA-C programming; electromagnetic-transients (EMT) simulation; graphics-processing unit (GPU) computing; parallel algorithms; power system modelling; power systems simulation

Comparison of parallel algorithms for modelling mass-springs systems with several APIs on modern GPUs

Proceedings of the 12th International Conference on Computer Systems and Technologies

Author: Vassilev, Tzvetomir Ivanov. Department of Informatics and Information Technologies, University of Ruse, Bulgaria

ISBN: (print) 9781450309172

The paper proposes and compares two parallel algorithms for GPU simulation of a mass-spring cloth model with an image-based collision detection and response approach. The algorithms are implemented using three different APIs for GPU programming: OpenGL plus GLSL, NVIDIA CUDA and OpenCL. The speed of the two algorithms is measured on each of the APIs and results are presented. Conclusions are drawn at the end of the paper. © 2011 ACM.

Keywords: parallel algorithms

A Parallel Algorithm for Finding All Pairs κ-Mismatch Maximal Common Substrings

2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016

Authors: Chockalingam, Sriram P.; Thankachan, Sharma V.; Aluru, Srinivas. Dept. of Computer Science and Engineering, Indian Institute of Technology, Mumbai, India; School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, United States

ISBN: (print) 9781467388153

We present an efficient parallel algorithm for the following problem: Given an input collection D of n sequences of total length N, a length threshold f and a mismatch threshold κ, report all κ-mismatch maximal common substrings of length at least f over all pairs of strings in D. This problem is motivated by clustering and assembly applications in computational biology, where D is a collection of millions of short DNA sequences. Sequencing errors and massive size of these datasets necessitate efficient parallel approximate sequence matching algorithms. We present a novel distributed memory parallel algorithm that solves this approximate sequence matching problem in $O\left(\left(\frac{N}{p}\log N + \mathrm{occ}\right)\log^{\kappa} N\right)$ expected time and takes only $O(\log^{\kappa+1} N)$ expected rounds of global communications, under some realistic assumptions, where p is the number of processors and occ is the output size. To our knowledge, this is the first provably sub-quadratic time algorithm for solving this problem. We demonstrate the performance and scalability of our algorithm using large high throughput sequencing data sets. © 2016 IEEE.

Keywords: parallel algorithms

A parallel algorithm for determining the communication radius of an automatic light trap based on balltree structure

8th International Conference on Knowledge and Systems Engineering, KSE 2016

Authors: Phuong, Giang Nguyen Thi; Luong, Huong Hoang; Pham, Tai Huu; Huynh, Hiep Xuan. Industrial University of Ho Chi Minh City, Ho Chi Minh City, Viet Nam; CUSC-Cantho University, Viet Nam; CICT-Cantho University, DREAM-CTU/IRD, Viet Nam

ISBN: (print) 9781467389297

The communication radius of an automatic light trap surveillance network characterizes how well an area is monitored or tracked by automatic light traps. Connectivity is an important requirement that shows how nodes in an automatic BPH light trap surveillance network can effectively communicate. In this paper, we propose a new approach to determine the communication radius of an automatic light trap based on the balltree structure. The approach uses a parallel algorithm to implement the balltree structure (CudaBalltree) and to determine the communication radius of an automatic light trap on the NVIDIA CUDA platform. © 2016 IEEE.

Keywords: parallel algorithms

An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2016, Vol. 61, No. 12, pp. 3740-3754

Authors: Feyzmahdavian, Hamid Reza; Aytekin, Arda; Johansson, Mikael. Royal Inst Technol KTH, Dept Automat Control, Sch Elect Engn, SE-10044 Stockholm, Sweden; Royal Inst Technol KTH, ACCESS Linnaeus Ctr, SE-10044 Stockholm, Sweden

Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state-of-the-art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order $O(1/\sqrt{T})$ for general convex regularization functions, and the rate $O(1/T)$ for strongly convex regularization functions, where T is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speed-up in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.

Keywords: Delay systems; minimization methods; optimization methods; parallel algorithms

Communication in parallel algorithms for constraint-based local search

25th IEEE International Parallel and Distributed Processing Symposium, Workshops and PhD Forum, IPDPSW 2011

Authors: Caniou, Yves; Codognet, Philippe. JFLI, CNRS, NII, Tokyo, Japan; JFLI, CNRS / UPMC, University of Tokyo, Tokyo, Japan

ISBN: (print) 9780769543857

We address the issue of parallelizing constraint solvers based on local search methods for massively parallel architectures involving several thousands of CPUs. We present a family of constraint-based local search algorithms and investigate their performance on hardware with several hundreds of processors. The first method is a basic independent multiple-walk algorithm: each processor runs a local search starting from a distinct initial configuration, and the first one to reach a solution notifies the others and stops all computations. These simple methods perform well, and good speedups can be achieved up to a few hundreds of processors. Then we consider 2 versions with communication between processors: 1) every c iterations, each processor sends the current value (cost) of its configuration to the others, and a processor that receives a better cost from another processor can decide to stop its current search with a probability p; 2) the number of iterations corresponding to the cost is also transferred. Both the received cost and the number of iterations have to be better for a processor to decide to draw a probability and restart. Several experiments involving more than 100 processors have been conducted, and different values of p have been tried to consider more or less "autistic" processors. However, results show that it is very difficult to achieve better performance than the initial method without communication. © 2011 IEEE.

Keywords: parallel algorithms

Approximating the solution to mixed packing and covering LPs in parallel $\tilde{O}(\epsilon^{-3})$ time

43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016

Authors: Mahoney, Michael W.; Rao, Satish; Wang, Di; Zhang, Peng. International Computer Science Institute, Department of Statistics, UC Berkeley, Berkeley, United States; Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, United States; Department of Computer Science, Georgia Tech, Atlanta, United States

ISBN: (print) 9783959770132

We study the problem of approximately solving positive linear programs (LPs). This class of LPs models a wide range of fundamental problems in combinatorial optimization and operations research, such as many resource allocation problems, solving non-negative linear systems, computing tomography, single/multi commodity flows on graphs, etc. For the special cases of pure packing or pure covering LPs, a recent result by Allen-Zhu and Orecchia [2] gives an $\tilde{O}(1/\epsilon^{3})$-time parallel algorithm, which breaks the longstanding $\tilde{O}(1/\epsilon^{4})$ running time bound of the seminal work of Luby and Nisan [10]. We present a new parallel algorithm with running time $\tilde{O}(1/\epsilon^{3})$ for the more general mixed packing and covering LPs, which improves upon the $\tilde{O}(1/\epsilon^{4})$-time algorithm of Young [18, 19]. Our work leverages ideas from both the optimization-oriented approach [2, 17] and the more combinatorial approach with phases [18, 19]. In addition, our algorithm, when directly applied to pure packing or pure covering LPs, gives an improved running time of $\tilde{O}(1/\epsilon^{2})$.

Keywords: parallel algorithms

Fast Spectrum Analysis for an OFDR Using the FFT and SCZT Combination Approach

IEEE PHOTONICS TECHNOLOGY LETTERS, 2016, Vol. 28, No. 6, pp. 657-660

Authors: Ma, Cheng; Zhou, Qian; Qin, Jie; Xie, Weilin; Dong, Yi; Hu, Weisheng. Shanghai Jiao Tong Univ, State Key Lab Adv Opt Commun Syst & Networks, Shanghai 200240, Peoples R China

Spectrum analysis is a significant process in many measurement applications and is usually implemented with the fast Fourier transform (FFT). Nevertheless, the FFT is not well suited to big data because of its extra computational burden. Moreover, the FFT fails to provide enough accuracy for signals with a very sparse and broadband spectral distribution. In this letter, we propose a combined approach, called the FFT-segmented chirp-Z transform, that allows a long-duration signal to be analyzed while the data are being received, achieving faster speed and better resolution with only a small memory size, which shows great potential for real-time performance. With the help of this approach, zoom bands are detected, and optimal parameters are established to guarantee that peaks in a broadband spectrum can be found in a short time with high precision. We implement this approach in a high-spatial-resolution optical frequency-domain reflectometry system to realize high-speed and high-precision localization of components in optical fiber. The experimental result shows that 2-mm spatial resolution is achieved at a distance of 54 m and that the processing time is less than 2 s for $10^{7}$ data points.

Keywords: Optical communication; optical fiber measurements; parallel algorithms; signal processing; spectral analysis

Real-time fair resource allocation in distributed software defined networks

arXiv, 2017

Authors: Allybokus, Zaid; Avrachenkov, Konstantin; Leguay, Jérémie; Maggi, Lorenzo. Huawei Technologies; Inria Sophia Antipolis

The performance of computer networks relies on how bandwidth is shared among different flows. Fair resource allocation is a challenging problem, particularly when the flows evolve over time. To address this issue, bandwidth sharing techniques that quickly react to traffic fluctuations are of interest, especially in large-scale settings with hundreds of nodes and thousands of flows. In this context, we propose a distributed algorithm that tackles the fair resource allocation problem in a distributed SDN control architecture. Our algorithm continuously generates a sequence of resource allocation solutions converging to the fair allocation while always remaining feasible, a property that standard primal-dual decomposition methods often lack. Thanks to the distribution of all computationally intensive operations, we demonstrate that we can handle large instances in real time. Copyright © 2017, The Authors. All rights reserved.

Keywords: parallel algorithms

Replicated Computational Results (RCR) Report for "A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization"

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2016, Vol. 42, No. 4, pp. 1-5

Author: Meiser, Dominic. Tech X Corp, 5621 Arapahoe Ave, Boulder, CO 80303, USA

In this report, we replicate a subset of the performance results in the article "A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization."

Keywords: Design; algorithms; Performance; HSS matrices; randomized sampling; ULV factorization; parallel algorithms; distributed-memory
