In this article, a two-grid partition of unity finite element method is proposed and investigated for electrically conducting incompressible fluid flows. The algorithm involves solving a much smaller nonlinear problem on a coarse grid, using a partition of unity to decompose the residual problem into a series of independent subproblems on a fine grid, and carrying out a further correction on the coarse grid. Rigorous theoretical analysis is presented, and the convergence results indicate that the method attains optimal convergence orders under a proper scaling between the coarse mesh size H and the fine mesh size h. Finally, some numerical results are reported to verify our theoretical findings.
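The paper treats the nonlinear MHD system, which is not reproduced here; purely as a structural illustration, the toy sketch below applies the same three ingredients (coarse solve, independent partition-of-unity subproblems, further coarse correction) to a 1D Poisson system. The indicator-function partition, mesh sizes, and cycle count are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Toy linear analogue of the two-grid structure: coarse corrections plus
# independent local subproblems from a (trivial, indicator-based)
# partition of unity.  The actual paper handles the nonlinear MHD system.
n = 63                                       # fine-grid unknowns
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian
f = np.ones(n)

# Linear interpolation P from the coarse grid (every other point) and the
# Galerkin coarse operator A_H = P^T A P.
nc = (n - 1) // 2
P = np.zeros((n, nc))
for j in range(nc):
    i = 2 * j + 1                            # fine index of coarse point j
    P[i, j] = 1.0
    P[i - 1, j] += 0.5
    P[i + 1, j] += 0.5
A_H = P.T @ A @ P

# Disjoint index blocks: their indicators form a crude partition of unity.
blocks = np.array_split(np.arange(n), 4)

u = np.zeros(n)
for _ in range(10):
    r = f - A @ u                            # coarse-grid correction
    u += P @ np.linalg.solve(A_H, P.T @ r)
    r = f - A @ u                            # residual decomposed into
    for idx in blocks:                       # independent (parallel) solves
        u[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])

print(np.linalg.norm(f - A @ u))             # residual shrinks across cycles
```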
We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite-dimensional setting, we establish almost sure weak convergence of the iterates and the asymptotic rate o(1/n) for the mean of the function values. We derive linear rates under strong convexity and error bound conditions. Our analysis is based on an abstract convergence principle for stochastic descent algorithms, which allows us to extend and simplify existing results.
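A minimal sketch of the iteration (not the paper's analysis): each step picks a block at random, takes a forward gradient step on it, then applies the proximal (backward) step with a block-specific stepsize. The Lasso instance, the 1/L_i stepsizes, and the uniform selection probabilities are assumptions chosen for illustration.

```python
import numpy as np

# Random block-coordinate forward-backward on a Lasso instance:
# f(x) = 0.5*||Ax - b||^2 (smooth), g(x) = lam*||x||_1 (prox-friendly).
rng = np.random.default_rng(0)
m, n, nblocks, lam = 200, 100, 10, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
blocks = np.array_split(np.arange(n), nblocks)
# Block stepsizes gamma_i = 1/L_i, with L_i the block Lipschitz constant.
gamma = [1.0 / np.linalg.norm(A[:, idx], 2) ** 2 for idx in blocks]
p = np.full(nblocks, 1.0 / nblocks)          # arbitrary selection probabilities

x = np.zeros(n)
for _ in range(5000):
    i = rng.choice(nblocks, p=p)             # random block activation
    idx = blocks[i]
    g = A[:, idx].T @ (A @ x - b)            # forward (gradient) step on block i
    z = x[idx] - gamma[i] * g
    x[idx] = np.sign(z) * np.maximum(np.abs(z) - gamma[i] * lam, 0.0)  # prox step

print(0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum())
```

The paper also covers activating several blocks in parallel per iteration; this serial one-block-at-a-time variant is the simplest member of that family.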
Simulation of high-power microwave source devices generally relies on parallel algorithms to speed up the computation. In recent years, with upgrades in parallel technology, the parallel efficiency of particle simulation software has been further improved. The existing MPI-2-based parallelization of the particle simulation software CHIPIC accesses the local memory of other processes through message passing. The MPI-3 standard introduces a shared-memory feature that allows data in a shared-memory window to be accessed directly by each process, reducing message traffic. In this paper, an electromagnetic particle simulation parallel algorithm and a dynamic load-balancing algorithm, both based on the MPI-3 shared-memory feature, are designed in the particle simulation software. The two algorithms improve parallel efficiency from different aspects. An RKA and a magnetically insulated line oscillator, both high-power microwave devices, are used as test models. The test results show that the electromagnetic particle simulation parallel algorithm based on the MPI-3 shared-memory feature improves the efficiency of the software by up to 44%, and the dynamic load-balancing algorithm based on MPI-3 improves efficiency by up to 38%.
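CHIPIC itself is not public, but the MPI-3 shared-memory mechanism the abstract describes is standard; a minimal mpi4py sketch, with the array size and layout chosen arbitrarily, looks like this:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
# Communicator over ranks that can actually share memory (same node).
node = comm.Split_type(MPI.COMM_TYPE_SHARED)
rank = node.Get_rank()

n = 1_000_000                                # e.g., a field array
itemsize = MPI.DOUBLE.Get_size()
# One rank allocates the window; the others attach with zero bytes.
win = MPI.Win.Allocate_shared(n * itemsize if rank == 0 else 0,
                              itemsize, comm=node)
buf, _ = win.Shared_query(0)                 # base address of rank 0's segment
field = np.ndarray(buffer=buf, dtype='d', shape=(n,))

if rank == 0:
    field[:] = 0.0
node.Barrier()
# All node-local processes now read and write `field` directly, with no
# message passing -- the MPI-3 feature the paper builds on.
```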
Contour trees are used for topological data analysis in scientific visualization. While originally computed with serial algorithms, recent work has introduced a vector-parallel algorithm. However, this algorithm is relatively slow for fully augmented contour trees, which are needed for many practical data analysis tasks. We therefore introduce a representation called the hyperstructure that enables efficient searches through the contour tree, and use it to construct a fully augmented contour tree in a data-parallel fashion, with performance on average 6 times faster than the state-of-the-art parallel algorithm in the TTK topological toolkit.
A new parallel algorithm for the max-flow problem on directed networks with a single source and a single sink is proposed. The algorithm is based on tree sub-networks and on an efficient parallel algorithm to compute max-flows on the tree sub-networks. The latter algorithm is proved to be work-optimal and time-optimal. The parallel implementation of the complete algorithm is more efficient than the best known parallel algorithm for the max-flow problem in terms of time complexity, and the sequential implementation achieves the best known sequential time complexity, without using any complex data structures or complex manipulations on the network.
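The tree primitive the abstract builds on is simple to state sequentially: on a tree there is a unique source-sink path, so the max flow equals the bottleneck capacity along it. A sketch of that sequential primitive (not the paper's work- and time-optimal parallel version):

```python
from collections import deque

def tree_max_flow(n, edges, s, t):
    # On a tree the s-t path is unique, so the max flow is the minimum
    # capacity on that path; BFS recovers the path via parent pointers.
    adj = [[] for _ in range(n)]
    for u, v, cap in edges:
        adj[u].append((v, cap))
        adj[v].append((u, cap))
    parent = {s: (None, float("inf"))}
    q = deque([s])
    while q:
        u = q.popleft()
        for v, cap in adj[u]:
            if v not in parent:
                parent[v] = (u, cap)
                q.append(v)
    flow, v = float("inf"), t
    while parent[v][0] is not None:          # walk back from t to s
        u, cap = parent[v]
        flow = min(flow, cap)
        v = u
    return flow

print(tree_max_flow(5, [(0, 1, 3), (1, 2, 2), (1, 3, 5), (3, 4, 4)], 0, 4))  # -> 3
```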
Decision trees (DTs) are widely used in machine learning and often achieve state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees lack a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DTs depends on Markov chain Monte Carlo (MCMC), which can be computationally slow, especially on high-dimensional data or with expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer, reducing its run-time through multi-core processing while keeping the results statistically identical to the conventional sequential implementation. We also calculate the theoretical and practical reduction in run-time obtainable with our method on multi-processor architectures. Experiments showed a running time up to 18 times faster, with the serial and parallel implementations remaining statistically identical.
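The paper's DT-specific scheme is not reproduced here; as a generic illustration of how a single chain can use several cores without changing its distribution, the sketch below parallelises only the likelihood evaluation (which factorises over data shards) inside a plain Metropolis step, on an assumed Gaussian toy model:

```python
import numpy as np
from multiprocessing import Pool

def shard_loglik(args):
    # Log-likelihood of one data shard under a N(theta, 1) toy model.
    shard, theta = args
    return -0.5 * np.sum((shard - theta) ** 2)

def parallel_loglik(pool, shards, theta):
    # The likelihood factorises over data, so shard terms are computed on
    # separate cores and summed; the chain itself is unchanged, which is
    # why parallel and serial runs are statistically identical.
    return sum(pool.map(shard_loglik, [(s, theta) for s in shards]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(1.0, 1.0, 100_000)
    shards = np.array_split(data, 4)
    theta = 0.0
    with Pool(4) as pool:
        ll = parallel_loglik(pool, shards, theta)
        for _ in range(200):                  # Metropolis random walk
            prop = theta + rng.normal(0.0, 0.05)
            ll_prop = parallel_loglik(pool, shards, prop)
            if np.log(rng.random()) < ll_prop - ll:
                theta, ll = prop, ll_prop
    print(theta)                              # samples concentrate near 1.0
```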
With increasing data sizes and the development of multi-core computers, asynchronous parallel stochastic optimization algorithms such as KroMagnon have gained significant attention. In this paper, we propose a new Sparse approximation and asynchronous parallel Stochastic Variance Reduced Gradient (SSVRG) method for sparse and high-dimensional machine learning problems. Unlike standard SVRG and its asynchronous parallel variant, KroMagnon, the snapshot point of SSVRG is set to the average of all the iterates in the previous epoch, which allows it to take much larger learning rates and also makes it more robust to the choice of learning rates. In particular, we use a sparse approximation of the popular SVRG estimator to perform completely sparse updates at all iterations. Therefore, SSVRG has a much lower per-iteration computational cost than its dense counterpart, SVRG++, and is very friendly to asynchronous parallel implementation. Moreover, we provide convergence guarantees for SSVRG for both strongly convex and non-strongly convex problems, while existing asynchronous algorithms (e.g., KroMagnon and ASAGA) only have convergence guarantees for strongly convex problems. Finally, we extend SSVRG to non-smooth and asynchronous parallel settings. Numerical results demonstrate that SSVRG converges significantly faster than state-of-the-art asynchronous parallel methods, e.g., KroMagnon, and is usually more than three orders of magnitude faster than SVRG++.
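A serial sketch of the distinctive ingredient, on an assumed least-squares toy problem: an SVRG-style variance-reduced update with the snapshot set to the previous epoch's iterate average. The sparse updates and asynchrony described in the abstract are omitted, and the learning rate and epoch length are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])      # per-sample gradient

x = np.zeros(d)
snapshot = x.copy()
lr, m = 0.01, 2 * n
for _ in range(30):
    mu = A.T @ (A @ snapshot - b) / n               # full gradient at snapshot
    avg = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(snapshot, i) + mu  # variance-reduced gradient
        x -= lr * v
        avg += x / m
    snapshot = avg                                   # epoch-average snapshot

print(np.linalg.norm(A.T @ (A @ x - b)) / n)         # gradient norm shrinks
```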
ISBN (Print): 9781450395458
This paper studies parallel algorithms for the longest increasing subsequence (LIS) problem. Let n be the input size and k be the LIS length of the input. Sequentially, LIS is a simple problem that can be solved using dynamic programming (DP) in O(n log n) work. However, parallelizing LIS is a long-standing challenge. We are unaware of any parallel LIS algorithm that has optimal O(n log n) work and non-trivial parallelism (i.e., Õ(k) or o(n) span). This paper proposes a parallel LIS algorithm that costs O(n log k) work, Õ(k) span, and O(n) space, and is much simpler than previous parallel LIS algorithms. We also generalize the algorithm to a weighted version of LIS, which maximizes the weighted sum of all objects in an increasing subsequence. To achieve a better work bound for the weighted LIS algorithm, we design parallel algorithms for the van Emde Boas (vEB) tree, which have the same structure as the sequential vEB tree and support work-efficient parallel batch insertion, deletion, and range queries. We also implemented our parallel LIS algorithms. Our implementation is lightweight, efficient, and scalable. On input size 10^9, our LIS algorithm outperforms a highly optimized sequential algorithm (with O(n log k) cost) on inputs with k ≤ 3 × 10^5. Our algorithm is also much faster than the best existing parallel implementation by Shen et al. (2022) on all input instances.
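For reference, the sequential O(n log k) baseline mentioned in the abstract is the classic patience-sorting DP (parallelizing it is the paper's contribution); a minimal sketch:

```python
from bisect import bisect_left

def lis_length(a):
    # tails[j] holds the smallest possible tail of a strictly increasing
    # subsequence of length j+1; each element costs one O(log k) search.
    tails = []
    for x in a:
        j = bisect_left(tails, x)
        if j == len(tails):
            tails.append(x)       # extend the longest subsequence found
        else:
            tails[j] = x          # improve the tail of a shorter one
    return len(tails)

print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))  # -> 4 (e.g., 1, 4, 5, 9)
```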
Integer linear programs (ILPs) and mixed integer programs (MIPs) often have multiple distinct optimal solutions, yet the widely used Gurobi optimization solver returns certain solutions at disproportionately high frequencies. This behavior is disadvantageous, as, in fields such as biomedicine, the identification and analysis of distinct optima yields valuable domain-specific insights that inform future research directions. In the present work, we introduce MORSE (Multiple Optima via Random Sampling and careful choice of the parameter Epsilon), a randomized, parallelizable algorithm to efficiently generate multiple optima for ILPs. MORSE applies multiplicative perturbations to the coefficients of an instance's objective function, generating a modified instance that retains an optimum of the original problem. We formalize and prove the above claim under practical conditions. Furthermore, we prove that for 0/1 selection problems, MORSE finds each distinct optimum with equal probability. We evaluate MORSE using two measures: the number of distinct optima found in r independent runs, and the diversity of the list (with repetitions) of solutions, as measured by average pairwise Hamming distance and Shannon entropy. Using these metrics, we provide empirical results demonstrating that MORSE outperforms the Gurobi method and unweighted variations of the MORSE method on a set of 20 Mixed Integer Programming Library (MIPLIB) instances and on a combinatorial optimization problem in cancer genomics.
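A toy sketch of the perturb-and-resolve idea on a 0/1 selection problem; the brute-force enumerator stands in for a real ILP solver, and the epsilon value and instance are arbitrary illustrative choices (MORSE's careful choice of epsilon is the paper's contribution):

```python
import itertools, random

def brute_force_01(c, feasible):
    # Hypothetical stand-in for an ILP solver: enumerate 0/1 vectors.
    best, arg = None, None
    for x in itertools.product((0, 1), repeat=len(c)):
        if not feasible(x):
            continue
        val = sum(ci * xi for ci, xi in zip(c, x))
        if best is None or val < best:
            best, arg = val, x
    return arg

def morse_style_sample(c, feasible, eps=1e-3, runs=20, seed=0):
    # Multiplicatively perturb each objective coefficient by a tiny random
    # factor and re-solve; for eps small enough relative to the optimality
    # gap, every perturbed optimum is an optimum of the original problem.
    rng = random.Random(seed)
    optima = set()
    for _ in range(runs):
        c_pert = [ci * (1.0 + eps * rng.uniform(-1.0, 1.0)) for ci in c]
        optima.add(brute_force_01(c_pert, feasible))
    return optima

# Toy 0/1 selection problem: pick exactly two of four items, minimize cost.
print(morse_style_sample([1.0, 1.0, 2.0, 1.0],
                         feasible=lambda x: sum(x) == 2))
# The random tie-breaking typically reveals several distinct optima.
```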
ISBN (Print): 9798350344196
Recently, several medical applications have relied on hyperspectral imaging. This technology enables both automated diagnosis and surgeon guidance. The employed algorithms adopt machine and deep learning methods to classify the images. In particular, Vision Transformers are a recent deep architecture that has been used to classify hyperspectral images of skin cancers, achieving interesting results. However, deep architectures are computationally intensive, and parallel architectures are mandatory to ensure fast classification (for some applications, even in real time). In this paper, we propose a parallel Vision Transformer architecture exploiting a low-power GPU, targeting the development of a portable diagnostic device. The classification time and power consumption of the low-power board are compared with the performance of a desktop GPU. The results clearly highlight the suitability of the low-power GPU for developing a portable diagnostic system based on hyperspectral imaging.
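The authors' trained model and boards are not specified here; as a sketch of the kind of latency measurement the abstract reports, one could time a stock torchvision ViT on whatever GPU is available (the model choice, input shape, and iteration counts below are arbitrary assumptions):

```python
import time, torch
from torchvision.models import vit_b_16

device = "cuda" if torch.cuda.is_available() else "cpu"
model = vit_b_16(weights=None).eval().to(device)   # untrained stand-in model
x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):                            # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()                   # flush queued GPU work
    t0 = time.perf_counter()
    for _ in range(100):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"mean latency: {(time.perf_counter() - t0) / 100 * 1e3:.2f} ms")
```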