This paper presents a novel approach to speeding up electromagnetic-transients (EMT) simulation using graphics-processing-unit (GPU)-based computing. It extends earlier published work in the area by exploiting additional parallelism inside EMT simulation. A 2D-parallel matrix-vector multiplication is used that is faster than previous 1D methods. The paper also implements a GPU-specific sparsity technique to further speed up the simulations, as the available CPU-based sparsity techniques are not suitable for GPUs. In addition, as an extension to previous work, the paper demonstrates modelling of a power-electronic subsystem. The efficacy of the approach is demonstrated using two different scalable test systems: a low-granularity system, that is, one in which a large cluster of buses is connected to other clusters by a few transmission lines, and a high-granularity system, in which small clusters of buses are connected to one another, thereby requiring more interconnecting transmission lines. Computation times for GPU-based computing are compared with those for sequential implementations on the CPU. The paper shows two surprising differences between GPU simulation and CPU simulation. First, the inclusion of sparsity yields only minor reductions in the GPU-based simulation time. Second, excessive granularity, even though it appears to increase the number of parallel-computable subsystems, significantly slows down the GPU-based simulation.
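The abstract does not spell out how the 2D-parallel matrix-vector product is organized. One common reading is sketched below under that assumption: a 1D scheme assigns one whole row to each worker, while a 2D scheme splits the matrix into tiles along both rows and columns, computes a partial product per tile, and then sums the partial results for each row. The function names (`tile_partial`, `matvec_2d`, `matvec_1d`) and the block sizes are hypothetical, and Python threads stand in for GPU thread blocks purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_partial(A, x, r0, r1, c0, c1):
    """Partial products of rows r0..r1-1 restricted to columns c0..c1-1."""
    return [sum(A[i][j] * x[j] for j in range(c0, c1)) for i in range(r0, r1)]


def matvec_2d(A, x, row_block=2, col_block=2):
    """Matrix-vector product with work split along rows AND columns."""
    n_rows, n_cols = len(A), len(x)
    tiles = [
        (r0, min(r0 + row_block, n_rows), c0, min(c0 + col_block, n_cols))
        for r0 in range(0, n_rows, row_block)
        for c0 in range(0, n_cols, col_block)
    ]
    # Each tile is an independent task (assumed analogue of one GPU thread block).
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda t: tile_partial(A, x, *t), tiles))
    # Reduction step: add the partial results of all tiles sharing a row.
    y = [0.0] * n_rows
    for (r0, _r1, _c0, _c1), part in zip(tiles, partials):
        for offset, value in enumerate(part):
            y[r0 + offset] += value
    return y


def matvec_1d(A, x):
    """Reference 1D scheme: one full dot product per row."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]
```

The extra reduction step is the price of the 2D split. It pays off only when the additional parallelism across columns outweighs that cost, which is a hardware-dependent trade-off this sketch does not measure.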
The paper proposes and compares two parallel algorithms for GPU simulation of a mass-spring cloth model and image-based collision detection and response approach. The algorithms are implemented using three different A...
We present an efficient parallel algorithm for the following problem: Given an input collection D of n sequences of total length N, a length threshold f and a mismatch threshold κ, report all κ-mismatch maximal comm...
The communicating radius of an automatic light trap surveillance network characterizes how well an area is monitored or tracked by automatic light traps. Connectivity is an important requirement that shows how nodes in an autom...
Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state-of-the-art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order O(1/√T) for general convex regularization functions, and the rate O(1/T) for strongly convex regularization functions, where T is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speed-up in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.
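The effect of asynchrony can be illustrated with a toy, single-process simulation. This is not the authors' algorithm, only a sketch under stated assumptions. The hypothetical problem is to minimize E[(w - a)^2 / 2] + lam * |w| with samples a drawn from a unit-variance Gaussian around `mu`, whose minimizer is the soft-thresholded mean. Each update uses a mini-batch gradient evaluated at an iterate that is up to `max_delay` updates old, mimicking a worker that read the parameters before other workers wrote theirs. The step-size schedule `1 / (t + 10)` and all names are illustrative choices, not values from the paper.

```python
import random


def soft_threshold(v, kappa):
    """Proximal operator of kappa * |w| (the l1 regularizer)."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def async_minibatch(mu=2.0, lam=0.5, steps=3000, batch=8, max_delay=3, seed=0):
    """Proximal mini-batch updates driven by stale (delayed) gradients."""
    rng = random.Random(seed)
    history = [0.0]  # history[t] is the iterate w_t
    for t in range(steps):
        # Simulated asynchrony: the gradient is computed at an old iterate.
        delay = rng.randint(0, min(max_delay, t))
        stale_w = history[t - delay]
        samples = [rng.gauss(mu, 1.0) for _ in range(batch)]
        grad = sum(stale_w - a for a in samples) / batch
        step = 1.0 / (t + 10)  # assumed diminishing step size
        history.append(soft_threshold(history[t] - step * grad, step * lam))
    return history[-1]
```

With the defaults, the returned iterate should settle near `mu - lam` despite the stale gradients. This matches the qualitative claim that bounded delays matter little asymptotically, though a toy run like this is no substitute for the paper's analysis.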
We address the issue of parallelizing constraint solvers based on local search methods for massively parallel architectures, involving several thousands of CPUs. We present a family of a constraint-based local search ...
ISBN: (print) 9783959770132
We study the problem of approximately solving positive linear programs (LPs). This class of LPs models a wide range of fundamental problems in combinatorial optimization and operations research, such as many resource allocation problems, solving non-negative linear systems, computing tomography, single/multi commodity flows on graphs, etc. For the special cases of pure packing or pure covering LPs, a recent result by Allen-Zhu and Orecchia [2] gives an Õ(1/ϵ^3)-time parallel algorithm, which breaks the longstanding Õ(1/ϵ^4) running time bound set by the seminal work of Luby and Nisan [10]. We present a new parallel algorithm with running time Õ(1/ϵ^3) for the more general mixed packing and covering LPs, which improves upon the Õ(1/ϵ^4)-time algorithm of Young [18, 19]. Our work leverages ideas from both the optimization-oriented approach [2, 17] and the more combinatorial approach with phases [18, 19]. In addition, our algorithm, when directly applied to pure packing or pure covering LPs, gives an improved running time of Õ(1/ϵ^2).
Spectrum analysis is an important step in many measurement applications and is usually implemented with the fast Fourier transform (FFT). Nevertheless, the FFT is not well suited to big data because of its extra computational burden. Moreover, the FFT fails to provide enough accuracy for signals with a very sparse and broadband spectral distribution. In this letter, we propose a combined approach, called the FFT-segmented chirp-Z transform, that allows a long-duration signal to be analyzed while the data are being received, achieving faster speed and better resolution with only a small memory footprint, which shows great potential for real-time use. With the help of this approach, zoom bands are detected and optimal parameters are established to guarantee that peaks in a broadband spectrum can be found in a short time with high precision. We implement this approach in high-spatial-resolution optical frequency-domain reflectometry to achieve fast, high-precision localization of components in optical fiber. The experimental results show that a spatial resolution of 2 mm is achieved at a distance of 54 m, and the processing time is less than 2 s for 10^7 data points.
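The zoom idea behind the chirp-Z transform can be sketched as follows: rather than sampling the spectrum on the fixed grid of an N-point FFT (spacing fs/N), the z-transform is evaluated on a short arc of the unit circle covering only the band of interest, with an arbitrarily fine frequency step. The code below is a direct O(N·M) evaluation written for clarity. It is neither the fast Bluestein-style chirp-Z algorithm nor the FFT-segmented scheme proposed in the letter, and the names `zoom_spectrum` and `peak_frequency` are hypothetical.

```python
import cmath
import math


def zoom_spectrum(x, fs, f_start, f_step, n_bins):
    """Evaluate the spectrum of x at f_start + k * f_step for k = 0..n_bins-1.

    Equivalent to sampling the z-transform on an arc of the unit circle,
    which is what a chirp-Z transform computes (here by brute force).
    """
    spectrum = []
    for k in range(n_bins):
        f = f_start + k * f_step
        acc = sum(
            sample * cmath.exp(-2j * math.pi * f * n / fs)
            for n, sample in enumerate(x)
        )
        spectrum.append((f, abs(acc)))
    return spectrum


def peak_frequency(spectrum):
    """Frequency of the largest-magnitude point in a zoomed band."""
    return max(spectrum, key=lambda point: point[1])[0]
```

For example, 200 samples of a 123.4 Hz tone at `fs = 1000.0` give an FFT bin spacing of 5 Hz, whereas `zoom_spectrum(x, 1000.0, 120.0, 0.1, 51)` scans the 120 to 125 Hz band in 0.1 Hz steps. Note that a finer grid improves peak localization only; the ability to separate two nearby tones is still limited by the record length.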
The performance of computer networks relies on how bandwidth is shared among different flows. Fair resource allocation is a challenging problem particularly when the flows evolve over time. To address this issue, band...
In this report, we replicate a subset of the performance results in the article "A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization."