Direct multisearch (DMS) is a class of derivative-free optimization algorithms suited for computing approximations to the complete Pareto front of a given multiobjective optimization problem. In the DMS class, constraints are addressed with an extreme barrier approach, so only feasible points are evaluated. DMS has a well-supported convergence analysis, and simple implementations show good numerical performance, both on academic test sets and in real applications. Recently, this numerical performance was improved by defining a search step based on the minimization of quadratic polynomial models, which corresponds to the algorithm BoostDMS. In this work, we propose and numerically evaluate strategies to improve the performance of BoostDMS, mainly through parallelization of the search and poll steps. The final parallelized version not only considerably decreases the computational time required to solve a multiobjective optimization problem but also increases the quality of the computed approximation to the Pareto front. Extensive numerical results are reported on an academic test set and on a chemical engineering application.
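DMS-type methods keep a list of feasible, mutually nondominated points as their current approximation to the Pareto front. The sketch below shows only that bookkeeping step, assuming minimization of every objective. The names `dominates` and `update_nondominated` are hypothetical, and the `feasible` flag is a stand-in for the extreme barrier: in DMS the objectives of an infeasible point are never computed at all. BoostDMS's quadratic-model search step and the parallel poll are not modeled here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_nondominated(
    front: List[Point], candidate: Point, feasible: bool = True
) -> Tuple[List[Point], bool]:
    """Try to insert candidate into a list of mutually nondominated points.

    Extreme barrier stand-in: an infeasible candidate is rejected outright.
    Returns the new list and whether the candidate was accepted.
    """
    if not feasible:
        return front, False
    if any(p == candidate or dominates(p, candidate) for p in front):
        return front, False
    # Drop every stored point that the new candidate dominates.
    kept = [p for p in front if not dominates(candidate, p)]
    kept.append(candidate)
    return kept, True


if __name__ == "__main__":
    front: List[Point] = []
    for f in [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (0.5, 5.0), (1.5, 1.5)]:
        front, accepted = update_nondominated(front, f)
        print(f, "accepted" if accepted else "rejected")
    print(sorted(front))
```

An accepted point is what a DMS iteration would count as a success. A parallel poll would evaluate several poll points concurrently and then feed their objective vectors through this same filter.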
Based on two-grid discretizations, this paper proposes and investigates several local and parallel stabilized finite element methods for the Stokes problem. For the finite element discretization, the lowest equal-order finite element pairs are used, and the stabilization circumvents the discrete inf-sup condition. In these algorithms, the low-frequency components of the solution of the Stokes problem are computed on a coarse grid, and the high-frequency components are captured on a fine grid using local and parallel procedures. Optimal error bounds are derived, and numerical experiments are carried out to support the theoretical results.
ISBN: (print) 9798350363074; 9798350363081
3D surface reconstruction is critical for various applications and demands efficient computational approaches. Traditional Radial Basis Function (RBF) methods scale poorly as the number of data points grows, which leads to longer execution times. To address this, our study presents an experimental parallelization effort in Julia, a language well known for high-performance scientific computing. We first developed a sequential RBF algorithm in Julia and then extended it to a parallel model that exploits multithreading to increase execution speed while maintaining accuracy. This initial exploration of Julia's parallel computing capabilities shows marked performance gains in 3D surface reconstruction and points to promising directions for future research. Our findings affirm Julia's potential for computationally intensive tasks, with test results confirming the expected improvements in execution time.
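RBF surface reconstruction usually fits an implicit function to the data and then extracts its zero level set. The function is zero on the surface samples and nonzero at offset points. The sketch below shows only the fit-and-evaluate core, in Python rather than Julia, and rests on several assumptions. The Gaussian kernel, the shape parameter `eps`, and all function names are hypothetical, since the abstract does not name a kernel. Because a Gaussian kernel matrix on distinct points is symmetric positive definite, a Cholesky solve is used. The thread pool only mirrors the structure of multithreaded evaluation, where each query point is independent work. CPython's GIL means it gives no real speedup for pure-Python arithmetic. In Julia, the analogous loop would use `Threads.@threads`.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Vec = Sequence[float]


def gaussian(r: float, eps: float = 1.0) -> float:
    """Gaussian radial basis function phi(r) = exp(-(eps * r)^2)."""
    return math.exp(-((eps * r) ** 2))


def cholesky_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b for symmetric positive definite a via Cholesky (a = L L^T)."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    y = [0.0] * n  # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n  # back substitution: L^T x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_rbf(centers: List[Vec], values: List[float], eps: float = 1.0) -> List[float]:
    """Weights w with sum_j w_j * phi(|c_i - c_j|) = values_i at every center."""
    a = [[gaussian(math.dist(ci, cj), eps) for cj in centers] for ci in centers]
    return cholesky_solve(a, list(values))


def evaluate_rbf(centers: List[Vec], weights: List[float], points: List[Vec],
                 eps: float = 1.0, workers: int = 4) -> List[float]:
    """Evaluate the interpolant at many points; each point is independent work."""
    def one(p: Vec) -> float:
        return sum(w * gaussian(math.dist(p, c), eps) for w, c in zip(weights, centers))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, points))
```

The dense solve costs O(N^3) in the number of centers N. That cost is the growth the abstract refers to, and it is why both the fit and the evaluation are natural targets for parallelization.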
For parallel-in-time simulation of large-scale power systems, this paper proposes a differential transformation based adaptive Parareal method that significantly improves convergence and time performance compared with the traditional Parareal method. The traditional method iterates a sequential, numerical coarse solution over extended time steps to connect parallel fine solutions within the respective time steps. The new method uses the differential transformation to derive a semi-analytical coarse solution of the power system differential-algebraic equations. With this solution, the order and time step can vary adaptively with the system response, as can the window length in a multi-window solution strategy. As a result, the new method reduces divergence and also speeds up the overall simulation. Extensive tests on the IEEE 39-bus system and the Polish 2383-bus system verify the performance of the proposed method.
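The traditional Parareal iteration that the paper improves on can be sketched on a scalar ODE. This is a minimal sketch under loud assumptions. The coarse propagator G is a single explicit Euler step per window, and the fine propagator F is many Euler steps. The paper's differential-transformation coarse solver, its adaptive order and step, and the multi-window strategy are not modeled. The update is U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k). The fine solves in different windows are independent, and only the cheap coarse sweep is sequential.

```python
from typing import Callable, List

Rhs = Callable[[float, float], float]


def euler(f: Rhs, y: float, t0: float, t1: float, steps: int) -> float:
    """Explicit Euler from t0 to t1 with a fixed number of steps."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f: Rhs, y0: float, t_end: float, windows: int,
             fine_steps: int, iterations: int) -> List[float]:
    """Classic Parareal; returns the approximations at the window boundaries."""
    ts = [t_end * n / windows for n in range(windows + 1)]

    def coarse(y: float, n: int) -> float:  # cheap: one Euler step per window
        return euler(f, y, ts[n], ts[n + 1], 1)

    def fine(y: float, n: int) -> float:  # expensive: many Euler steps per window
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    u = [y0]
    for n in range(windows):  # iteration 0: sequential coarse sweep
        u.append(coarse(u[n], n))
    for _ in range(iterations):
        # Fine solves in different windows are independent: the parallel part.
        f_prev = [fine(u[n], n) for n in range(windows)]
        g_prev = [coarse(u[n], n) for n in range(windows)]
        new = [y0]
        for n in range(windows):  # sequential correction sweep
            new.append(coarse(new[n], n) + f_prev[n] - g_prev[n])
        u = new
    return u
```

After k iterations, the first k window values agree with the sequential fine solution up to rounding. A poor coarse solver therefore needs many iterations, which erodes the speedup. This is the weakness that a more accurate, adaptive coarse solution targets.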
The high intensity of research and modeling in mathematics, physics, biology, and chemistry requires new computing resources. Because of the high computational complexity of such tasks, computing time is long and costly. The most effective way to increase efficiency is to adopt parallel principles. The purpose of this paper is to present parallel computing with an emphasis on the analysis of parallel systems and on the impact of communication delays on their efficiency and overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation by the Gaussian elimination method (GEM). The algorithms are designed for shared-memory architectures (Open Multi-Processing, OpenMP), distributed-memory architectures (Message Passing Interface, MPI), and their combination (MPI + OpenMP). The properties of the algorithms were determined analytically and verified experimentally. Conclusions are drawn for both theory and practice.
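The parallel structure of GEM can be illustrated with a thread pool standing in for an OpenMP `parallel for`. This is a sketch under stated assumptions, not the paper's OpenMP or MPI code. At pivot step k, the updates of the rows below the pivot are independent, so they are split among workers with a cyclic row distribution. Waiting for all workers before the next pivot is the per-step synchronization point. In shared memory this is a barrier. With MPI it would be a broadcast of the pivot row, which is where the communication delays discussed in the abstract enter. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def eliminate_rows(m: List[List[float]], k: int, rows: Sequence[int]) -> None:
    """Eliminate column k from the given rows of the augmented matrix m."""
    for i in rows:
        factor = m[i][k] / m[k][k]
        for j in range(k, len(m[k])):
            m[i][j] -= factor * m[k][j]


def gauss_solve(a: List[List[float]], b: List[float], workers: int = 4) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(n - 1):
            p = max(range(k, n), key=lambda i: abs(m[i][k]))  # partial pivoting
            m[k], m[p] = m[p], m[k]
            below = list(range(k + 1, n))
            # Cyclic distribution keeps the shrinking workload balanced.
            chunks = [below[w::workers] for w in range(workers)]
            # list(...) blocks until every chunk is done: the per-step barrier.
            list(pool.map(lambda rows: eliminate_rows(m, k, rows), chunks))
    x = [0.0] * n  # back substitution on the upper-triangular system
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x
```

The loop performs n - 1 synchronizations in total, regardless of the number of workers. For small matrices, this fixed overhead can outweigh the parallel gain.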
ISBN: (print) 9798400703836
In this paper, we present an efficient parallel derandomization method for randomized algorithms that rely on concentration bounds such as the Chernoff bound. This settles a classic problem in parallel derandomization that dates back to the 1980s. Concretely, consider the set balancing problem. Here, $m$ sets $S_1, \dots, S_m$ of size at most $s$ are given in a ground set of size $n$, and we should partition the ground set into two parts such that each set is split evenly up to a small additive (discrepancy) bound. By the Chernoff bound, a random partition achieves a discrepancy of $O(\sqrt{s \log m})$ in each set. We give a deterministic parallel algorithm that matches this bound using near-linear work $\tilde{O}(m + n + \sum_{i=1}^{m} |S_i|)$ and polylogarithmic depth $\mathrm{poly}(\log(mn))$. The previous results were weaker in discrepancy and/or work bounds. Motwani, Naor, and Naor [FOCS'89] and Berger and Rompel [FOCS'89] achieve discrepancy $s^{\epsilon} \cdot O(\sqrt{s \log m})$ with work $\tilde{O}(m + n + \sum_{i=1}^{m} |S_i|) \cdot m^{\Theta(1/\epsilon)}$ and polylogarithmic depth. The discrepancy was optimized to $O(\sqrt{s \log m})$ in later work, e.g., by Harris [Algorithmica'19], but the work bound remained prohibitively high at $\tilde{O}(m^4 n^3)$. Notice that these would require a large polynomial number of processors even to match the near-linear runtime of the sequential algorithm. Ghaffari, Grunau, and Rozhon [FOCS'23] achieve discrepancy $s/\mathrm{poly}(\log(nm)) + O(\sqrt{s \log m})$ with near-linear work and polylogarithmic depth. Notice that this discrepancy is nearly quadratically larger than the desired bound and barely sublinear with respect to the trivial bound of $s$. Our method is different from prior work. It can be viewed as a novel bootstrapping mechanism that uses crude partitioning algorithms as a subroutine and sharpens their discrepancy to the optimal bound. In particular, we solve the problem recursively, by using the crude partition in each iterat...
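The set balancing objective and the randomized baseline that the deterministic algorithm matches can be made concrete in code. The sketch does not implement the paper's derandomization. It only defines the discrepancy of a ±1 coloring and the random coloring, and it computes an explicit threshold from a Hoeffding/Chernoff tail plus a union bound. The function names are hypothetical. Sets are given as lists of indices into a ground set of size n.

```python
import math
import random
from typing import List, Optional, Sequence


def discrepancy(sets: Sequence[Sequence[int]], signs: Sequence[int]) -> int:
    """Largest imbalance |#(+1) - #(-1)| over all sets for a +-1 coloring."""
    return max(abs(sum(signs[e] for e in s)) for s in sets)


def random_partition(n: int, seed: Optional[int] = None) -> List[int]:
    """Assign each ground-set element to side +1 or -1 uniformly at random."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]


def chernoff_threshold(s: int, m: int) -> float:
    """t such that a random coloring has discrepancy <= t with probability >= 1/2.

    For a set of size at most s, P(|sum| >= t) <= 2 exp(-t^2 / (2 s)).
    A union bound over m sets with t = sqrt(2 s ln(4 m)) gives total failure <= 1/2.
    """
    return math.sqrt(2 * s * math.log(4 * m))
```

The threshold is an explicit instance of the $O(\sqrt{s \log m})$ bound. A random coloring meets it with probability at least 1/2, so repeated sampling succeeds quickly. Derandomization asks for a deterministic algorithm that reaches the same bound, here with near-linear work and polylogarithmic depth.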
The digital age came with an extraordinary ability to generate data across organizations, people, and devices, data that needs to be analyzed, processed and stored. A well-known technique for analyzing this kind of da...
ISBN: (print) 9783031695827; 9783031695834
Finding connected components is a fundamental problem in graph and network analysis, and it also serves as a subroutine in other graph problems. Efficient sequential algorithms exist for finding the connected components of a graph, but they can take a long time on large graphs. Parallel algorithms can significantly speed up the computation by using multiple processors. This paper presents a fast shared-memory parallel algorithm named ALZI (Afforest with LinkJump and Zero Implant) for finding connected components in a graph. ALZI improves on Afforest, a recent state-of-the-art parallel algorithm. We propose several non-trivial optimizations that improve runtime and scalability. We performed rigorous experiments on a wide variety of real-world and synthetic graphs to evaluate the performance of ALZI. The experimental results show that ALZI is 1.4 to 2.3 times faster than Afforest on these graphs and scales better than Afforest. ALZI can also handle very large graphs: on a Kronecker graph with 4.2 billion edges, it finds the connected components in just 1.02 s using 128 processors.
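The Afforest structure that ALZI builds on can be sketched sequentially in three phases. First, link only the first few neighbors of every vertex, compressing after each round. Second, guess the largest intermediate component. Third, finish the remaining edges while skipping vertices already in that component. Several simplifications apply. The concurrent compare-and-swap hooking of the parallel algorithm is omitted. The largest component is found by counting every label rather than by sampling. ALZI's LinkJump and Zero Implant optimizations are not modeled. All names are hypothetical, and the input is a symmetric adjacency list of an undirected graph.

```python
from collections import Counter
from typing import List


def find(parent: List[int], x: int) -> int:
    """Root of x's tree, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def link(parent: List[int], u: int, v: int) -> None:
    """Union: hook the larger root under the smaller one."""
    ru, rv = find(parent, u), find(parent, v)
    if ru != rv:
        parent[max(ru, rv)] = min(ru, rv)


def compress(parent: List[int]) -> None:
    """Point every vertex directly at its root."""
    for x in range(len(parent)):
        parent[x] = find(parent, x)


def afforest_like(adj: List[List[int]], neighbor_rounds: int = 2) -> List[int]:
    """Component labels (smallest vertex id per component)."""
    n = len(adj)
    parent = list(range(n))
    # Phase 1 (sampling): link only the first few neighbors of every vertex.
    for r in range(neighbor_rounds):
        for u in range(n):
            if r < len(adj[u]):
                link(parent, u, adj[u][r])
        compress(parent)
    # Phase 2: guess the largest component from the sampled structure.
    big = Counter(parent).most_common(1)[0][0] if n else -1
    in_big = [p == big for p in parent]
    # Phase 3: finish the remaining edges, skipping vertices already in `big`.
    # An edge skipped from both ends has both endpoints in `big` already.
    for u in range(n):
        if not in_big[u]:
            for v in adj[u][neighbor_rounds:]:
                link(parent, u, v)
    compress(parent)
    return parent
```

On graphs with one giant component, phase 3 skips most vertices. This is the main source of Afforest's savings over processing every edge.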
ISBN: (print) 9798400705977
QR decomposition is a numerical method used in many applications, from the High-Performance Computing (HPC) domain to embedded systems. This broad spectrum of applications has drawn academic and commercial attention to the development of many software libraries and domain-specific hardware solutions. In the Internet of Things (IoT) domain, multicore Parallel Ultra-Low-Power (PULP) architectures are emerging as energy-efficient alternatives. They outperform conventional single-core devices by coupling parallel processing with near-threshold computing. To the best of the authors' knowledge, our study introduces the first parallelized and optimized implementation of three distinct QR decomposition methods on GAP9, a commercial embodiment of the PULP architecture: Givens rotations, the Gram-Schmidt process, and Householder transformations. Compared with the single-core GAP9 scenario, parallel execution on the 8-core cluster reduces the total number of cycles by 241% for Givens rotations, 470% for Gram-Schmidt, and 567% for Householder. The three methods consume only 0.013 mJ, 0.012 mJ, and 0.216 mJ, respectively. Compared with traditional single-core ARM-based architectures, we achieve 8x, 24x, and 30x better performance and 36x, 35x, and 30x better energy efficiency, paving the way for broad adoption of complex linear algebra tasks in the IoT domain.
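Of the three methods, the Gram-Schmidt process is the shortest to write down. Below is a minimal floating-point sketch of thin QR by modified Gram-Schmidt in plain Python. It is not the GAP9 implementation, whose data types, memory layout, and core partitioning the abstract does not describe. It assumes an m x n input with m >= n and full column rank. After column j is normalized, the updates of the remaining columns against it are independent. Those updates are the natural work to split across cluster cores.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def mgs_qr(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Thin QR (A = Q R) of an m x n matrix by modified Gram-Schmidt."""
    m, n = len(a), len(a[0])
    q = [[float(a[i][j]) for j in range(n)] for i in range(m)]  # overwritten column by column
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalize column j.
        r[j][j] = math.sqrt(sum(q[i][j] ** 2 for i in range(m)))
        for i in range(m):
            q[i][j] /= r[j][j]
        # Remove the q_j component from every later column; these n - j - 1
        # column updates are independent of one another.
        for k in range(j + 1, n):
            r[j][k] = sum(q[i][j] * q[i][k] for i in range(m))
            for i in range(m):
                q[i][k] -= r[j][k] * q[i][j]
    return q, r
```

Givens and Householder follow the same pattern of one sequential outer step followed by independent inner updates. Their costs and numerical behavior differ, which is consistent with the different per-method cycle and energy figures reported above.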
ISBN: (print) 9798400704161
We introduce PASGAL (Parallel And Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graphs. A particular focus of PASGAL is efficiency on large-diameter graphs. These are a common challenge for many existing parallel graph processing systems because of the high overhead of synchronizing threads when traversing the graph in breadth-first order. The core idea in PASGAL is a technique called vertical granularity control (VGC), which hides synchronization overhead through careful algorithm redesign and new data structures. We compare PASGAL with existing parallel implementations on several fundamental graph problems. PASGAL is always competitive on small-diameter graphs and is significantly faster on large-diameter graphs.
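The problem VGC targets shows up when counting the rounds of a level-synchronous BFS. The sketch below is that baseline traversal, not VGC or any PASGAL code, and its names are hypothetical. In a parallel version, each frontier would be split across threads, and every round would end with a global synchronization. The number of rounds equals the source's eccentricity plus one. A path of n vertices therefore needs n rounds with one-vertex frontiers, while a star needs only two.

```python
from typing import List, Tuple


def bfs_rounds(adj: List[List[int]], source: int) -> Tuple[List[int], int]:
    """Level-synchronous BFS; returns distances and the number of frontier rounds."""
    dist = [-1] * len(adj)
    dist[source] = 0
    frontier = [source]
    rounds = 0
    while frontier:
        nxt = []
        for u in frontier:  # a parallel BFS would split this loop across threads
            for v in adj[u]:
                if dist[v] == -1:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
        rounds += 1  # one global synchronization per level
    return dist, rounds
```

On a large-diameter graph, most rounds do almost no work, so the fixed cost of each synchronization dominates the running time. This is the overhead that, per the abstract, VGC hides through algorithm redesign and new data structures.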