The development of the smart grid and the increasing scale of power systems place great demands on the electromagnetic transient simulation of a power system. The graphics processing unit (GPU), which features massive concurrent threads and excellent floating-point performance, offers a new opportunity for power system simulation. This study introduces a parallel lower-triangular and upper-triangular (LU) decomposition algorithm and a calculation strategy for electromagnetic transient simulation based on the GPU. In this scheme, the GPU performs the computationally intensive part of the simulation in parallel on its built-in processing cores, while the CPU handles history-term updates and flow control of the simulation. Comparison with results from CPU-only implementations verifies the validity and efficiency of the proposed method.
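The CPU/GPU split described above can be sketched as follows. This is a minimal pure-Python illustration (not the paper's code): the recurring linear solve of the nodal equations is split into a one-time LU factorisation and per-step triangular solves, which are the part the paper offloads to GPU threads while the CPU updates history currents. All names are hypothetical.

```python
# Illustrative sketch: LU-based nodal solve for an EMT-style time step.
# In the paper's scheme the triangular solves run on GPU cores; here
# everything runs serially to show the data flow.

def lu_decompose(A):
    """Doolittle LU decomposition (no pivoting; assumes A is well-conditioned)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            f = U[i][k] / U[k][k]
            L[i][k] = f
            for j in range(k, n):
                U[i][j] -= f * U[k][j]
    return L, U

def solve_lu(L, U, b):
    """Forward then backward substitution -- the stage a GPU kernel
    would parallelise across rows/levels of the triangular factors."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# The nodal conductance matrix is fixed over the run, so factor it once...
G = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
L_f, U_f = lu_decompose(G)
# ...then each time step only rebuilds the history-current vector (CPU side)
# and re-solves the two triangular systems (GPU side in the paper's scheme).
v = solve_lu(L_f, U_f, [1.0, 0.0, 0.0])
```

Factoring once and re-solving every step is what makes the triangular solves the dominant, parallelisable cost in this kind of simulation.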
Abstract: The problem of numerical modeling of the process of initiating seismic activity on the shelf and its destructive effect on composite oil pipelines laid along the seabed is considered. To describe the dynamic...
In this article the authors present a new method to construct low-rank approximations of dense, huge matrices. The method develops the mosaic-skeleton method and belongs to the class of kernel-independent methods. Unlike the mosaic-skeleton method, the new one uses the hierarchical structure of the matrix not only to define the matrix block structure but also to compute the factors of the low-rank matrix representation. The new method was applied to the numerical solution of the boundary integral equations that arise from the 3D problem of scattering of a monochromatic electromagnetic wave by ideally conducting bodies. The solution of a model problem is presented as an example of method evaluation.
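The skeleton (cross) idea underlying mosaic-skeleton methods can be shown on a tiny example: an exactly rank-r block is recovered from r of its columns C, r of its rows R, and the inverse of their r × r intersection, A ≈ C · Â⁻¹ · R. The sketch below is illustrative only, not the article's implementation, and all names are made up.

```python
# Skeleton decomposition demo: rebuild an exactly rank-2 matrix from
# 2 columns, 2 rows, and their 2x2 intersection.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

n = 6
# an exactly rank-2 matrix built from two outer products
u1, v1 = [i + 1.0 for i in range(n)], [1.0 / (j + 1) for j in range(n)]
u2, v2 = [(-1.0) ** i for i in range(n)], [float(j) for j in range(n)]
A = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(n)] for i in range(n)]

rows, cols = [0, 3], [1, 4]                     # skeleton rows and columns
C = [[A[i][j] for j in cols] for i in range(n)]  # n x 2
R = [[A[i][j] for j in range(n)] for i in rows]  # 2 x n
A_hat = [[A[i][j] for j in cols] for i in rows]  # 2 x 2 intersection
A_approx = matmul(matmul(C, inv2(A_hat)), R)     # equals A up to round-off
```

For a matrix of exact rank r, the reconstruction is exact whenever the r × r intersection is nonsingular; for numerically low-rank blocks it becomes an approximation, and the hierarchical structure decides where such blocks live.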
This thesis studies the properties of vector-based routing protocols whose underlying algebras are strictly increasing. Strict increasingness has previously been shown to be both a sufficient and a necessary condition for the convergence of path-vector protocols. One of the key contributions of this thesis is to link vector-based routing to a much larger family of asynchronous iterative algorithms. This unlocks a significant body of existing theory and allows asynchronous protocols to be proved correct by purely synchronous reasoning. As well as applying it to routing protocols, this thesis advances the asynchronous theory in two ways. First, it shows that the existing conditions required for convergence may be relaxed. Second, it proposes the first model for "dynamic" asynchronous processes in which both the problem being solved and the set of participants change over time. The thesis' attention then turns to models of routing problems, and presents a new algebraic structure that is simpler and more expressive than the state of the art. In particular, this structure is capable of modelling the routing problems that underlie both distance-vector and path-vector protocols. Consequently these two families of vector-based protocols may be unified for the first time. The new structure is also capable of modelling protocols that use path-dependent conditional policy. Next, the work above is used to construct a model of an abstract vector-based protocol. This is then used in the first proof of correctness for strictly increasing distance-vector protocols and a new proof of correctness for strictly increasing path-vector protocols. The latter improves over previous results in that it i) proves that convergence is deterministic, ii) does not assume reliable communication between nodes, and iii) applies to path-vector protocols with path-dependent conditional policy. The long-standing question of the worst-case rate of convergence for a strictly increasing path-vector proto...
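A small concrete instance of a strictly increasing routing algebra is the (min, +) algebra with strictly positive edge weights: extending a route always strictly worsens it, which is the condition the thesis studies for guaranteed convergence. The sketch below is an illustrative synchronous distance-vector iteration on a made-up three-node topology, not the thesis's formalism.

```python
# Distance-vector iteration over the (min, +) algebra.
# Positive weights => route extension is strictly increasing,
# so the iteration reaches a unique fixed point in at most n rounds.
INF = float("inf")
# adjacency: edges[u] = list of (neighbour, weight > 0)
edges = {0: [(1, 1.0), (2, 5.0)],
         1: [(0, 1.0), (2, 2.0)],
         2: [(0, 5.0), (1, 2.0)]}
dest = 0
x = {u: (0.0 if u == dest else INF) for u in edges}
for _ in range(10):
    new = {u: (0.0 if u == dest else
               min(w + x[v] for v, w in edges[u]))
           for u in edges}
    if new == x:      # fixed point: no node can improve its route
        break
    x = new
# x now holds the shortest-path distances to dest
```

The thesis's asynchronous results say this fixed point is still reached when nodes update at different times with stale information, which is what makes purely synchronous reasoning like the loop above sufficient.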
Parallel algorithms on CPU and GPU are implemented for the Unified Gas-Kinetic Scheme, and their performances are investigated and compared using a two-dimensional channel flow case. The parallel CPU algorithm has a one d...
GPU hardware and the CUDA architecture provide a powerful platform for developing parallel algorithms, yet implementations of heuristic and metaheuristic algorithms on GPUs are limited in the literature, and developing such parallel algorithms has become increasingly important. In this paper, the NP-hard Quadratic Assignment Problem (QAP), one of the classic combinatorial optimization problems, is discussed. A Parallel Multistart Simulated Annealing (PMSA) method is developed with the CUDA architecture to solve the QAP. An efficient method is obtained by combining the multistart technique with cooperation between threads; cooperation occurs between threads both in the same block and in different blocks. This paper focuses on both acceleration and solution quality. Computational experiments were conducted on many Quadratic Assignment Problem Library (QAPLIB) instances. The experimental results show that PMSA runs up to 29x faster than a single-core CPU and attains the best known solution in a short time on many benchmark datasets. (C) 2018 Karabuk University. Publishing services by Elsevier B.V.
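The multistart simulated-annealing core can be sketched as follows. This is a single-process illustration: the paper runs many such chains as cooperating CUDA threads, whereas here the starts are simply looped over. The tiny instance data and all names are made up.

```python
# Multistart simulated annealing for a toy 4-facility QAP instance.
# cost(p) = sum_{i,j} F[i][j] * D[p[i]][p[j]], minimised over permutations p.
import math, random

F = [[0, 3, 1, 2], [3, 0, 4, 1], [1, 4, 0, 2], [2, 1, 2, 0]]  # flows
D = [[0, 2, 4, 5], [2, 0, 1, 3], [4, 1, 0, 2], [5, 3, 2, 0]]  # distances

def qap_cost(p):
    n = len(p)
    return sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))

def anneal(p, steps=300, t0=10.0, alpha=0.97, rng=None):
    """One SA chain: random 2-swaps, Metropolis acceptance, geometric cooling."""
    best, best_c = p[:], qap_cost(p)
    cur, cur_c, t = p[:], best_c, t0
    for _ in range(steps):
        i, j = rng.sample(range(len(p)), 2)
        cand = cur[:]
        cand[i], cand[j] = cand[j], cand[i]
        c = qap_cost(cand)
        if c < cur_c or rng.random() < math.exp((cur_c - c) / t):
            cur, cur_c = cand, c
            if c < best_c:
                best, best_c = cand[:], c
        t *= alpha
    return best, best_c

rng = random.Random(42)
best_perm, best_cost = None, float("inf")
for _ in range(8):  # each start would be one GPU thread in PMSA
    start = list(range(4))
    rng.shuffle(start)
    p, c = anneal(start, rng=rng)
    if c < best_cost:
        best_perm, best_cost = p, c
```

The paper's contribution beyond this baseline is having the chains exchange information (within and across thread blocks) rather than run independently.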
We introduce a method for "sparsifying" distributed algorithms and exhibit how it leads to improvements that go past known barriers in two algorithmic settings of large-scale graph processing: Massively Parallel Computation (MPC), and Local Computation Algorithms (LCA). MPC with strongly sublinear memory: Recently, there has been growing interest in obtaining MPC algorithms that are faster than their classic O(log n)-round parallel (PRAM) counterparts for problems such as Maximal Independent Set (MIS), Maximal Matching, 2-approximation of Minimum Vertex Cover, and (1+ϵ)-approximation of Maximum Matching. Currently, all such MPC algorithms require memory of Ω(n) per machine: Czumaj et al. [STOC'18] were the first to handle Ω(n) memory, running in O((log log n)^2) rounds, improving on the n^{1+Ω(1)} memory requirement of the O(1)-round algorithm of Lattanzi et al. [SPAA'11]. We obtain Õ(√(log Δ))-round MPC algorithms for all four of these problems that work even when each machine has strongly sublinear memory, e.g., n^α for any constant α ∈ (0, 1). Here, Δ denotes the maximum degree. These are the first sublogarithmic-time MPC algorithms for (the general case of) these problems that break the linear memory barrier. LCAs with query complexity below the Parnas-Ron paradigm: Currently, the best known LCA for MIS has query complexity Δ^{O(log Δ)} poly(log n), by Ghaffari [SODA'16], which improved over the Δ^{O(log² Δ)} poly(log n) bound of Levi et al. [Algorithmica'17]. As pointed out by Rubinfeld, obtaining a query complexity of poly(Δ log n) remains a central open question. Ghaffari's bound almost reaches a Δ^{Ω(log Δ / log log Δ)} barrier common to all known MIS LCAs, which simulate a distributed algorithm by learning the full local topology, à la Parnas-Ron [TCS'07]. This barrier exists because the distributed complexity of MIS has a lower bound of Ω(log Δ / log log Δ), by results of Kuhn et al. [JACM'16], which means this methodology cannot go below query complexity Δ^{Ω(log Δ / log log Δ)}. We break this ba...
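For concreteness, the round-based distributed MIS computation that such MPC and LCA algorithms simulate can be illustrated with a classic Luby-style sketch. This is a baseline for intuition only, not the paper's sparsified method; the small graph is made up.

```python
# Luby-style randomised MIS in synchronous rounds: each alive vertex draws
# a random priority, local minima join the MIS, winners and their
# neighbours are removed, and the process repeats.
import random

def luby_mis(adj, rng):
    alive = set(adj)
    mis = set()
    while alive:
        r = {v: rng.random() for v in alive}          # random priorities
        winners = {v for v in alive
                   if all(r[v] < r[u] for u in adj[v] if u in alive)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & alive                  # drop neighbours too
        alive -= removed
    return mis

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}    # triangle + pendant
mis = luby_mis(adj, random.Random(7))
```

The sparsification technique in the paper lets a machine simulate such rounds without holding each vertex's full local topology, which is how it beats the memory and query barriers above.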
Clustering of uncertain objects in large uncertain databases and the problem of mining uncertain data have been well studied. In this paper, clustering of uncertain objects with location uncertainty is studied. Moving objects, such as mobile devices, report their locations periodically, so their locations are uncertain and best described by a probability density function. The number of objects in a database can be large, which makes mining accurate data a challenging and time-consuming task. The authors give an overview of existing clustering methods and present a new approach for data mining and parallel computing of clustering problems. Existing methods use pruning to avoid expected-distance calculations, since the expected distance must be computed by numerical integration, which is time-consuming. Therefore, a new method, called Segmentation of Data Set Area-Parallel, is proposed. In this method, the data-set area is divided into many small segments, and only the clusters and objects in a given segment are considered. The number of segments is calculated from the number and location of the clusters. Because segments are mutually independent, they enable parallel computing: each segment can be processed on a separate core.
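The segmentation idea can be sketched as follows: uncertain objects are grouped by grid segment, each segment is an independent unit of work, and within a segment an object is assigned to the centre with the smallest expected distance, approximated here by averaging over the object's location samples. This is an illustrative toy, not the paper's algorithm; all data and names are made up.

```python
# Segment-wise assignment of uncertain objects to cluster centres.
import math

def expected_dist(samples, c):
    """Monte-Carlo expected Euclidean distance from an uncertain object
    (given by location samples) to a cluster centre c."""
    return sum(math.dist(s, c) for s in samples) / len(samples)

def segment_of(point, cell=5.0):
    """Grid segment containing a point."""
    return (int(point[0] // cell), int(point[1] // cell))

centers = [(1.0, 1.0), (8.0, 8.0)]
# uncertain objects: a few location samples each
objects = [
    [(0.9, 1.2), (1.1, 0.8), (1.0, 1.0)],
    [(7.8, 8.1), (8.2, 7.9), (8.0, 8.0)],
]

# group objects by the segment of their mean sample
groups = {}
for k, samples in enumerate(objects):
    mean = (sum(s[0] for s in samples) / len(samples),
            sum(s[1] for s in samples) / len(samples))
    groups.setdefault(segment_of(mean), []).append(k)

# each segment is independent, so this loop could run in parallel;
# here the segments are processed serially for clarity
assignment = {}
for seg, members in groups.items():
    for k in members:
        assignment[k] = min(range(len(centers)),
                            key=lambda c: expected_dist(objects[k], centers[c]))
```

In the full method, each segment would also restrict the candidate centres to nearby ones, which is where the expected-distance savings come from.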
While adaptive integration by region partitioning is generally effective in low dimensions, quasi-Monte Carlo methods can be used for integral approximations in moderate to high dimensions. Important application areas include high-energy physics, statistics, computational finance, and stochastic geometry, with applications in robotics, tessellations, and imaging from medical data using tetrahedral meshes. Lattice rule integration is a class of quasi-Monte Carlo methods, implemented by an equal-weight cubature formula and suited for fairly smooth functions. Successful methods to construct these rules are the component-by-component (CBC) algorithm of Sloan and Reztsov (2001) and the fast CBC algorithm of Nuyens and Cools (2006). As the ability to invoke a large number of function evaluations is an important factor in high-dimensional integration, we investigate accelerating the CBC construction of large rank-1 lattice rules using the CUDA (cuFFT) Fast Fourier Transform procedure. A major part of this study is the development of high-performance lattice rule algorithms for approximating moderate- to high-dimensional integrals on GPUs. Lattice rules are combined with a periodizing transformation. We show that rank-1 lattice rules on GPUs (possibly with an integral transformation to alleviate the effects of boundary singularities) yield better accuracy and efficiency for various classes of integrals compared to classic Monte Carlo and adaptive methods. The computational power of GPU accelerators also leads to significant improvements in efficiency and accuracy for integration based on embedded (composite) lattices. These methods have been motivated as possible contributions to high-performance computing software such as the ParInt multivariate integration package developed at WMU. We further show an application in Bayesian analysis, leading to a class of problems where the integrand has a dominant peak in the integration domain. We demonstrate a black-box ap...
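The basic object here, a rank-1 lattice rule, is easy to show directly: the cubature points are x_i = frac(i·z/n) for a generating vector z, and the rule is the equal-weight average of the integrand over these points. The sketch below evaluates such a rule (a 2D Fibonacci lattice) on a smooth periodic integrand with known integral 1; it illustrates plain rule evaluation, not the CBC/cuFFT construction studied in the thesis, and the instance is chosen for illustration.

```python
# Rank-1 lattice rule: equal-weight cubature over x_i = frac(i * z / n).
import math

def lattice_rule(f, z, n):
    """Approximate the integral of f over [0,1]^d with generating vector z."""
    d = len(z)
    total = 0.0
    for i in range(n):
        x = tuple((i * z[k] / n) % 1.0 for k in range(d))
        total += f(x)
    return total / n

# smooth periodic test integrand with exact integral 1 on the unit square
def f(x):
    return (1.0 + 0.5 * math.sin(2 * math.pi * x[0])) * \
           (1.0 + 0.5 * math.sin(2 * math.pi * x[1]))

# Fibonacci lattice in 2D: n = 89, z = (1, 55)
approx = lattice_rule(f, (1, 55), 89)
```

The CBC algorithm searches for the generating vector z one component at a time to minimise a worst-case error criterion; the loop over i above is also the part that maps naturally onto GPU threads.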
In this paper, we consider networks of deterministic spiking neurons, firing synchronously at discrete times. We consider the problem of translating temporal information into spatial information in such networks, an i...