Finite control set model predictive control (FCS-MPC) is a salient control method for power conversion systems that has recently enjoyed remarkable popularity. Several studies highlight the performance benefits that long prediction horizons achieve in terms of closed-loop stability, harmonic distortion, and switching losses. However, the practical implementation is not straightforward due to the inherently high computational burden. To overcome this obstacle, the control problem can be formulated as an integer least-squares optimization problem, which is equivalent to the closest point search, or closest vector problem, in lattices. Different techniques have been proposed in the literature to solve it, with the sphere decoding algorithm (SDA) standing out as the most popular choice for long-horizon FCS-MPC. However, the state of the art in this field offers solutions beyond the conventional SDA, which are described in this article alongside future trends and challenges on this topic.
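The sphere decoding idea the abstract refers to can be illustrated with a minimal sketch: minimize ||z - R u||^2 over a finite integer set by depth-first search with radius pruning. The three-level set and the function name are illustrative assumptions (e.g. mimicking a three-level converter's switch positions), not the paper's implementation; `R` is assumed upper triangular, as obtained from a QR factorization of the system matrix.

```python
import numpy as np

def sphere_decode(R, z, levels=(-1, 0, 1)):
    """Minimal sphere decoding sketch: solve min_u ||z - R u||^2 over
    u in levels^n by depth-first search, pruning branches whose partial
    cost already exceeds the best (smallest sphere) found so far.
    R must be upper triangular so row i's cost depends only on u[i:]."""
    n = R.shape[0]
    best = {"cost": np.inf, "u": None}

    def descend(i, u, partial):
        if partial >= best["cost"]:          # prune: outside current sphere
            return
        if i < 0:                            # reached a leaf: full candidate
            best["cost"], best["u"] = partial, u.copy()
            return
        for s in levels:
            u[i] = s
            e = z[i] - R[i, i:] @ u[i:]      # residual of row i
            descend(i - 1, u, partial + e * e)

    descend(n - 1, np.zeros(n), 0.0)
    return best["u"], best["cost"]
```

For a long-horizon FCS-MPC problem, `z` and `R` would come from the condensed prediction model; the pruning is what keeps the exponential search tractable in practice.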
A data parallelization algorithm for the direct simulation Monte Carlo method for rarefied gas flows is considered. The performance scaling of the algorithm's main procedures is analyzed. Satisfactory performance scaling of the parallel particle indexing procedure is shown, and an algorithm for speeding up this procedure is proposed. Using free flow and flow around a cone as examples, an acceptable speedup of the entire algorithm is obtained on a 28-core shared-memory node. The efficiencies of the data parallelization algorithm and the computational domain decomposition algorithm are compared for free flow. Using the developed parallel code, a study of the supersonic rarefied flow around a cone is carried out.
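The particle indexing procedure mentioned above is typically a counting sort of particles by cell. A minimal serial sketch, under the assumption that indexing means building, for each cell, the list of particle indices it contains (the parallel variant would split the counting and scattering passes over threads):

```python
import numpy as np

def index_particles(cell_of, n_cells):
    """Counting-sort particle indexing as used in DSMC codes:
    given cell_of[p] = cell containing particle p, build an array
    `order` so that the particles of cell c occupy the contiguous
    slice order[starts[c] : starts[c] + counts[c]]."""
    counts = np.bincount(cell_of, minlength=n_cells)
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))  # prefix sums
    order = np.empty(len(cell_of), dtype=int)
    cursor = starts.copy()
    for p, c in enumerate(cell_of):          # scatter pass
        order[cursor[c]] = p
        cursor[c] += 1
    return order, starts, counts
```

The prefix-sum step parallelizes well; the scatter pass is the part whose scaling on shared memory the paper analyzes.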
ISBN:
(print) 9783030602451; 9783030602444
Lattice sieving is currently the leading class of algorithms for solving the shortest vector problem over lattices. The computational difficulty of this problem is the basis for constructing secure post-quantum public-key cryptosystems based on lattices. In this paper, we present a novel massively parallel approach for solving the shortest vector problem using lattice sieving and hardware acceleration. We combine previously reported algorithms with a proper caching strategy and develop a hardware architecture. The main advantage of the proposed approach is eliminating the overhead of the data transfer between a CPU and a hardware accelerator. The authors believe that this is the first such architecture reported in the literature to date and predict up to 8 times higher throughput compared to a multi-core high-performance CPU. The presented methods can be adapted to other sieving algorithms that are hard to implement on FPGAs due to communication and memory bottlenecks.
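The inner loop that sieving hardware accelerates is pairwise vector reduction: repeatedly shortening a candidate vector against a list of lattice vectors. A GaussSieve-style sketch (the function name and interface are illustrative, not the paper's architecture):

```python
import numpy as np

def reduce_vector(v, lst):
    """Pairwise reduction at the core of lattice sieving: shorten v
    against every list vector w by subtracting the nearest integer
    multiple of w, repeating until no w shortens v further.  The
    inner-product loop is what an FPGA pipeline parallelizes."""
    changed = True
    while changed:
        changed = False
        for w in lst:
            denom = w @ w
            if denom == 0:
                continue
            m = round((v @ w) / denom)       # nearest-plane coefficient
            if m != 0 and (r := v - m * w) @ r < v @ v:
                v, changed = r, True
    return v
```

In a full sieve, vectors that survive reduction are inserted into the list, and collisions (reductions to zero) signal that the list is saturating around short lattice vectors.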
The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and cos...
We develop a nature-inspired generic programming language for parallel algorithms, one that works for all data structures and control structures. Any parallel algorithm satisfying intuitively-appealing postulates can ...
ISBN:
(print) 9781450391467
Hierarchical agglomerative clustering (HAC) is a popular algorithm for clustering data, but despite its importance, no dynamic algorithms for HAC with good theoretical guarantees exist. In this paper, we study dynamic HAC on edge-weighted graphs. As single-linkage HAC reduces to computing a minimum spanning forest (MSF), our first result is a parallel batch-dynamic algorithm for maintaining MSFs. On a batch of k edge insertions or deletions, our batch-dynamic MSF algorithm runs in O(k log^6 n) expected amortized work and O(log^4 n) span with high probability. It is the first fully dynamic MSF algorithm handling batches of edge updates with polylogarithmic work per update and polylogarithmic span. Using our MSF algorithm, we obtain a parallel batch-dynamic algorithm that can answer queries about single-linkage graph HAC clusters. Our second result is that dynamic graph HAC is significantly harder for other common linkage functions. For example, assuming the strong exponential time hypothesis, dynamic graph HAC requires Ω(n^{1-o(1)}) work per update or query on a graph with n vertices for complete linkage, weighted average linkage, and average linkage. For complete linkage and weighted average linkage, the bound still holds even for incremental or decremental algorithms and even if we allow poly(n)-approximation. For average linkage, the bound weakens to Ω(n^{1/2-o(1)}) for incremental and decremental algorithms, and the bounds still hold when allowing n^{o(1)}-approximation.
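The reduction the paper dynamizes can be shown statically in a few lines: the merges of single-linkage HAC on an edge-weighted graph are exactly the MSF edges taken in increasing weight order, which Kruskal's algorithm with a union-find structure produces directly.

```python
class DSU:
    """Union-find with path halving; drives Kruskal's MSF."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def single_linkage_merges(n, edges):
    """Dendrogram merges (w, u, v) of single-linkage HAC on an
    edge-weighted graph: exactly the MSF edges in increasing weight
    order.  A static sketch of the reduction; the paper maintains
    this structure under batches of edge insertions and deletions."""
    dsu, merges = DSU(n), []
    for w, u, v in sorted(edges):
        if dsu.union(u, v):
            merges.append((w, u, v))
    return merges
```

Answering single-linkage cluster queries then amounts to querying connectivity of the MSF restricted to edges below a weight threshold, which is why a batch-dynamic MSF algorithm suffices.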
Quasi-Monte Carlo methods have become the industry standard in computer graphics. To that end, efficient algorithms for low discrepancy sequences are discussed. In addition, numerical pitfalls encountered in practice are revealed. We then take a look at massively parallel quasi-Monte Carlo integro-approximation for image synthesis by light transport simulation. Beyond superior uniformity, low discrepancy points may be optimized with respect to additional criteria, such as noise characteristics at low sampling rates or the quality of low-dimensional projections.
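The building block of many low discrepancy sequences is the radical inverse: mirroring an index's digits across the radix point. A minimal sketch of the van der Corput radical inverse and the Halton sequence built from it (production renderers use the numerically careful and scrambled variants that the numerical-pitfalls discussion concerns):

```python
def radical_inverse(i, b):
    """Van der Corput radical inverse of index i in base b:
    reverse i's base-b digits across the radix point."""
    inv, f = 0.0, 1.0 / b
    while i > 0:
        i, d = divmod(i, b)
        inv += d * f         # mirror digit d to fractional position
        f /= b
    return inv

def halton(i, bases=(2, 3)):
    """i-th point of the Halton sequence: one radical inverse
    per dimension, using pairwise coprime bases."""
    return tuple(radical_inverse(i, b) for b in bases)
```

Because consecutive indices fill the unit interval progressively, any prefix of the sequence is usable, which is what makes such points convenient for massively parallel, progressive sampling.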
ISBN:
(print) 9783031648809; 9783031648816
Many scientific and numeric computations rely on matrix-matrix multiplication as a fundamental component of their algorithms. It constitutes the building block in many matrix operations used in numeric solvers and graph theory problems. Several algorithms have been proposed and implemented for matrix-matrix multiplication, especially for distributed-memory systems, and these have been studied extensively. In particular, Cannon's algorithm has been implemented for distributed-memory systems, largely because its memory needs remain constant and are not influenced by the number of processors employed. The algorithm, however, involves block shifting of both matrices being multiplied. This paper presents a similar block-oriented parallel algorithm for matrix-matrix multiplication on a 2-dimensional processor grid, but with block shifting restricted to only one of the matrices. We refer to this as the Single Matrix Block Shift (SMBS) algorithm. The algorithm we propose is a variant of Cannon's algorithm on distributed architectures and improves upon the performance complexity of the Cannon and SRUMMA algorithms. We present analytic as well as experimental comparative results of our algorithm with the standard Cannon's algorithm on 2-dimensional processor grids, showing over 4x performance improvement.
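The baseline that SMBS improves on can be sketched as a serial simulation of Cannon's algorithm on a p x p block grid: after the initial skew, processor (i, j) in round k multiplies blocks A(i, i+j+k mod p) and B(i+j+k mod p, j), which is exactly one unit block shift of each matrix per round. The details of SMBS itself are not reproduced here, only the classic two-matrix-shift pattern it modifies.

```python
import numpy as np

def cannon_multiply(A, B, p):
    """Serial simulation of Cannon's algorithm on a p x p block grid
    (assumes p divides the matrix dimension).  Each (i, j) pair plays
    the role of one processor; the index (i + j + k) % p encodes the
    skew plus k unit block shifts of both A and B."""
    n = A.shape[0]
    s = n // p
    blk = lambda M, i, j: M[i * s:(i + 1) * s, j * s:(j + 1) * s]
    C = np.zeros_like(A, dtype=float)
    for i in range(p):
        for j in range(p):
            acc = blk(C, i, j)               # view into C; += is in place
            for k in range(p):
                a = blk(A, i, (i + j + k) % p)
                b = blk(B, (i + j + k) % p, j)
                acc += a @ b
    return C
```

Since (i + j + k) % p ranges over all block indices as k varies, each accumulator sums the full block inner product, recovering C = A B.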
ISBN:
(print) 9781665420556
The problem of minimizing a submodular function (SFM) is a common generalization of several fundamental combinatorial optimization problems, including minimum s-t cuts in graphs and matroid intersection. It is well-known that a submodular function can be minimized with only poly(N) function evaluation queries, where N denotes the universe size. However, all known polynomial query algorithms for SFM are highly adaptive, requiring at least N rounds of adaptivity. A natural question is if SFM can be efficiently solved in a highly parallel manner, namely, with poly(N) queries using only poly-logarithmic rounds of adaptivity. An important step towards understanding the adaptivity needed to solve SFM efficiently was taken in the very recent work of Balkanski and Singer, who showed a lower bound on the rounds of adaptivity required by any SFM algorithm making poly(N) queries. This left open the possibility of efficient SFM algorithms with poly-logarithmic rounds of adaptivity. In this work, we strongly rule out this possibility by showing that any, possibly randomized, algorithm for submodular function minimization making poly(N) queries requires Ω̃(N^{1/3}) rounds of adaptivity. In fact, we show a polynomial lower bound on the number of rounds of adaptivity even for algorithms that make up to 2^{N^{1-δ}} queries, for any constant δ > 0.
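The query model the bounds speak about can be made concrete with a toy example: the graph cut function is a canonical submodular function, and exhaustive minimization issues 2^N evaluation queries that are all non-adaptive (one round), whereas efficient poly(N)-query algorithms must, per the result above, spread their queries over polynomially many adaptive rounds. The helper names below are illustrative.

```python
from itertools import chain, combinations

def cut_value(S, edges):
    """Graph cut function: number of edges crossing the set S.
    A canonical submodular function (min s-t cut is a special
    case of SFM)."""
    return sum((u in S) != (v in S) for u, v in edges)

def brute_force_sfm(universe, f):
    """Minimize a set function by exhaustive evaluation queries:
    2^N queries, but all issued in a single non-adaptive round.
    This is the extreme opposite of the poly(N)-query regime,
    where many adaptive rounds are provably unavoidable."""
    subsets = chain.from_iterable(
        combinations(universe, r) for r in range(len(universe) + 1))
    return min(subsets, key=lambda S: f(set(S)))
```

The trade-off between total queries and rounds of adaptivity is precisely what the Ω̃(N^{1/3}) lower bound pins down for the polynomial-query regime.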
ISBN:
(print) 9798400704161
In a breakthrough result, Spielman and Teng (2004) developed a nearly-linear time solver for Laplacian linear equations, i.e. equations where the coefficient matrix is symmetric with non-negative diagonals and zero row sums. Since the development of the Spielman-Teng solver, there has been substantial progress, simplifying and improving their result, but obtaining a fast, practical, parallel Laplacian solver remains an open problem. We present a framework for obtaining extremely simple, parallel Laplacian linear equation solvers with nearly-linear work and sub-linear depth. Our framework allows us to parallelize any Laplacian solver based on repeated single-vertex approximate Gaussian elimination. We demonstrate this by parallelizing both the algorithm of Kyng and Sachdeva (2016) and the practical variant by Gao, Kyng, and Spielman (2023). Our framework is work-efficient in the sense of matching the sequential work of these algorithms. Our parallelization framework is very simple: We sample a subset of the current low-degree vertices (sparse columns), and in parallel we eliminate all vertices that are isolated in the resulting induced subgraph. This approach can be combined with any parallelizable approximate single-vertex elimination subroutine with sparse output. Given the simplicity of the approach, we believe that using it to parallelize the solver of Gao, Kyng, and Spielman (2023) is the most promising direction for obtaining practical parallel Laplacian solvers. If we additionally use a parallel spectral sparsification routine, our approach can be modified to work in polylogarithmic depth and nearly-linear work.
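The sampling step of the framework can be sketched at the graph level: sample each low-degree vertex independently, then keep exactly the sampled vertices with no sampled neighbor, so that all of them can be eliminated in parallel without interfering. This is only the selection logic under assumed parameter names; the actual solver eliminates sparse columns of the Laplacian with an approximate Gaussian elimination subroutine.

```python
import random

def elimination_round(adj, degree_bound, sample_prob, rng=random):
    """One round of the sampling scheme: adj maps each vertex to its
    set of neighbors.  Sample each low-degree vertex with probability
    sample_prob, then return the sampled vertices that are isolated in
    the induced subgraph on the sample (no sampled neighbor) -- these
    form an independent set and can be eliminated simultaneously."""
    low = [v for v, nbrs in adj.items() if len(nbrs) <= degree_bound]
    sampled = {v for v in low if rng.random() < sample_prob}
    return [v for v in sampled if not (adj[v] & sampled)]
```

Because the kept vertices are pairwise non-adjacent, their eliminations touch disjoint neighborhoods, which is what yields sub-linear depth while preserving the sequential algorithm's total work.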