We present new numerical algorithms for solving the structural inverse gravimetry problem in the case of multiple surfaces. The inverse problem of finding the multiple surfaces that separate layers of constant density is ill-posed and is described by a nonlinear integral equation of the first kind; solving it requires regularization. We construct new regularized variants of gradient-type methods with weighting factors, namely the steepest descent and conjugate gradient methods, and suggest an empirical rule for choosing the regularization parameters. On the basis of these methods, we develop parallel algorithms and implement them on a multicore CPU using OpenMP. A set of experiments with perturbed data is performed to test the gradient algorithms and to study the performance of the developed code. For test problems with quasi-real data, the new regularized algorithms increase accuracy and speed up computation compared with the unregularized ones. Using an 8-core CPU, we achieve a speedup of 8.
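A rough illustration of the kind of iteration described above: the sketch below runs a Tikhonov-regularized steepest-descent loop whose component updates are parallelized with OpenMP. It uses an assumed toy diagonal least-squares problem; the integral operator, weighting factors, and parameter-choice rule of the actual method are not reproduced, and the step size and regularization parameter are placeholder values.

```cpp
// Regularized steepest descent with the component loop parallelized by OpenMP.
// Toy problem: minimize 0.5*||D x - b||^2 + 0.5*alpha*||x||^2 with D = diag(2).
#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 4096;
    const double alpha = 1e-3;   // regularization parameter (assumed)
    const double gamma = 0.1;    // step size (assumed)
    std::vector<double> x(n, 0.0), b(n, 1.0);

    for (int it = 0; it < 200; ++it) {
        // gradient of 0.5*||D x - b||^2 + 0.5*alpha*||x||^2, one component per iteration
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double grad = 2.0 * (2.0 * x[i] - b[i]) + alpha * x[i];
            x[i] -= gamma * grad;
        }
    }
    std::printf("x[0] = %f (expected about 0.5)\n", x[0]);
    return 0;
}
```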
ISBN (print): 9781611976465
We present a randomized parallel algorithm, in the Exclusive-Read Exclusive-Write (EREW) PRAM model, that computes a Maximal Independent Set (MIS) in O(log n) time and using O(m log² n) work, with high probability. Thus, MIS is in RNC¹. This time complexity is optimal and it improves on the celebrated O(log² n) time algorithms of Luby [STOC'85] and Alon, Babai, and Itai [JALG'86], which had remained the state of the art for the past 35 years.
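For context, the sketch below simulates the classical Luby-style rounds that this result improves upon: each surviving vertex draws a random priority and joins the independent set when it beats all surviving neighbours. The example graph is assumed, the simulation is sequential, and this is not the new EREW PRAM algorithm.

```cpp
// Sequential simulation of Luby-style randomized MIS rounds on a small graph.
#include <cstdio>
#include <random>
#include <vector>

int main() {
    // example graph as an adjacency list (assumed, for illustration only)
    std::vector<std::vector<int>> adj = {
        {1, 2}, {0, 2, 3}, {0, 1, 4}, {1, 4}, {2, 3}};
    int n = (int)adj.size();
    std::vector<bool> alive(n, true), inMIS(n, false);
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    bool anyAlive = true;
    while (anyAlive) {
        // every surviving vertex draws a random priority
        std::vector<double> r(n, 0.0);
        for (int v = 0; v < n; ++v)
            if (alive[v]) r[v] = uni(rng);
        // a vertex wins the round if its priority beats every surviving neighbour
        std::vector<int> winners;
        for (int v = 0; v < n; ++v) {
            if (!alive[v]) continue;
            bool win = true;
            for (int u : adj[v])
                if (alive[u] && r[u] <= r[v]) { win = false; break; }
            if (win) winners.push_back(v);
        }
        // winners join the MIS; they and their neighbours are removed
        for (int v : winners) {
            inMIS[v] = true;
            alive[v] = false;
            for (int u : adj[v]) alive[u] = false;
        }
        anyAlive = false;
        for (int v = 0; v < n; ++v) anyAlive = anyAlive || alive[v];
    }
    for (int v = 0; v < n; ++v)
        if (inMIS[v]) std::printf("MIS vertex: %d\n", v);
    return 0;
}
```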
ISBN (digital): 9783030759339
ISBN (print): 9783030759339; 9783030759322
We present an algorithm for the solution of a simultaneous space-time discretization of linear parabolic evolution equations with a symmetric differential operator in space. Building on earlier work, we recast this discretization into a Schur complement equation whose solution is a quasi-optimal approximation to the weak solution of the equation at hand. Choosing a tensor-product discretization, we arrive at a remarkably simple linear system. Using wavelets in time and standard finite elements in space, we solve the resulting system in linear complexity on a single processor, and in polylogarithmic complexity when parallelized in both space and time. We complement these theoretical findings with large-scale parallel computations showing the effectiveness of the method.
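The Schur-complement recasting mentioned above can be written generically as follows; the block operators A, B, C and the data f, g are placeholders, not the paper's particular space-time operators.

```latex
% Generic Schur-complement reduction of a symmetric 2x2 block (saddle-point) system.
\[
\begin{pmatrix} A & B \\ B^{\top} & -C \end{pmatrix}
\begin{pmatrix} u \\ v \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix}
\quad\Longrightarrow\quad
\bigl(B^{\top} A^{-1} B + C\bigr)\, v = B^{\top} A^{-1} f - g,
\qquad
u = A^{-1}(f - B v).
\]
```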
The Jaya algorithm is a recent heuristic approach for solving optimisation problems. It involves a random search for the global optimum, based on the generation of new individuals using both the best and the worst individuals in the population, thus moving solutions towards the optimum while avoiding the worst current solution. In addition to its optimisation performance, the absence of algorithm-specific control parameters is another significant advantage. However, the number of iterations needed to reach the optimal solution, or one close to it, may be very high, and the computational cost can hamper compliance with time requirements. In this work, a chaotic two-dimensional (2D) map is used to accelerate convergence, and parallel algorithms are developed to alleviate the computational cost. Coarse- and fine-grained parallel algorithms are developed, the former based on multi-populations and the latter working at the level of individuals; both are accelerated by an improved (computational) use of the chaotic map.
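A minimal sketch of the underlying Jaya update, with the population loop parallelized by OpenMP as a simple fine-grained scheme; the objective function, population size, and random factors are assumed, and the chaotic 2D map and multi-population variants from the abstract are not shown.

```cpp
// Standard Jaya update on a toy sphere objective with an OpenMP population loop.
#include <omp.h>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// toy objective (assumed stand-in for the real optimisation problem)
static double sphere(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v * v;
    return s;
}

int main() {
    const int pop = 64, dim = 8, iters = 500;
    std::mt19937 rng(1);
    std::uniform_real_distribution<double> init(-5.0, 5.0), r01(0.0, 1.0);

    std::vector<std::vector<double>> P(pop, std::vector<double>(dim));
    for (auto& ind : P) for (double& v : ind) v = init(rng);

    for (int it = 0; it < iters; ++it) {
        int bi = 0, wi = 0;
        for (int i = 1; i < pop; ++i) {
            if (sphere(P[i]) < sphere(P[bi])) bi = i;
            if (sphere(P[i]) > sphere(P[wi])) wi = i;
        }
        // snapshot best/worst and draw random factors before the parallel region
        std::vector<double> best = P[bi], worst = P[wi], r1(pop), r2(pop);
        for (int i = 0; i < pop; ++i) { r1[i] = r01(rng); r2[i] = r01(rng); }

        #pragma omp parallel for           // fine-grained: one individual per task
        for (int i = 0; i < pop; ++i) {
            std::vector<double> cand(dim);
            for (int j = 0; j < dim; ++j)
                cand[j] = P[i][j] + r1[i] * (best[j]  - std::fabs(P[i][j]))
                                  - r2[i] * (worst[j] - std::fabs(P[i][j]));
            if (sphere(cand) < sphere(P[i])) P[i] = cand;   // greedy replacement
        }
    }
    int bi = 0;
    for (int i = 1; i < pop; ++i) if (sphere(P[i]) < sphere(P[bi])) bi = i;
    std::printf("best objective after %d iterations: %g\n", iters, sphere(P[bi]));
    return 0;
}
```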
ISBN (digital): 9783030888060
ISBN (print): 9783030888060; 9783030888053
Many real-world problems, such as internet routing, are actually graph problems. To develop efficient solutions to such problems, more and more parallel graph algorithms are proposed. This paper discusses the mechanized verification of a commonly used parallel graph algorithm, namely the Bellman-Ford algorithm, which provides an inherently parallel solution to the Single-Source Shortest Path problem. Concretely, we verify an unoptimized GPU version of the Bellman-Ford algorithm using the VerCors verifier. The main challenge that we had to address was to find suitable global invariants for the graph-based properties to enable automated verification. This case study is the first deductive verification of the functional correctness of the parallel Bellman-Ford algorithm. It provides the basis for verifying other, optimized implementations of the algorithm. Moreover, it may also provide a good starting point for verifying other parallel graph-based algorithms.
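For illustration, a race-free parallel Bellman-Ford round can be written in "pull" form, where each vertex scans its incoming edges; the OpenMP loop below stands in for the GPU kernel that the paper actually verifies, and the example graph is assumed.

```cpp
// Parallel Bellman-Ford sketch: every round, each vertex pulls relaxations
// from its incoming edges; per-vertex writes make the loop race free.
#include <omp.h>
#include <cstdio>
#include <limits>
#include <vector>

struct Edge { int from; double w; };

int main() {
    const int n = 5, src = 0;
    // incoming edge lists: in[v] holds edges (u -> v) with weight w (assumed graph)
    std::vector<std::vector<Edge>> in(n);
    in[1] = {{0, 6.0}, {3, -2.0}};
    in[2] = {{1, 5.0}, {4, 7.0}};
    in[3] = {{0, 7.0}};
    in[4] = {{3, 9.0}, {1, -4.0}};

    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> dist(n, INF), next(n);
    dist[src] = 0.0;

    for (int round = 0; round < n - 1; ++round) {
        #pragma omp parallel for
        for (int v = 0; v < n; ++v) {
            double best = dist[v];
            for (const Edge& e : in[v])
                if (dist[e.from] + e.w < best) best = dist[e.from] + e.w;
            next[v] = best;
        }
        dist.swap(next);
    }
    for (int v = 0; v < n; ++v) std::printf("dist[%d] = %g\n", v, dist[v]);
    return 0;
}
```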
ISBN (print): 9789082797060
We consider the problem of nonnegative tensor completion. We adopt the alternating optimization framework and solve each nonnegative matrix completion problem via a stochastic variation of the accelerated gradient algorithm. We experimentally test the effectiveness and the efficiency of our algorithm using both real-world and synthetic data. We develop a shared-memory implementation of our algorithm using the multi-threaded API OpenMP, which attains significant speedup. We believe that our approach is a very competitive candidate for the solution of very large nonnegative tensor completion problems.
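A minimal sketch of one nonnegativity-projected gradient sweep for a matrix-completion factor, with the row loop parallelized via OpenMP; the dimensions, step size, and data are assumed, and the stochastic accelerated variant and the alternating tensor framework of the paper are not reproduced.

```cpp
// Coordinate-wise projected-gradient sweep over the rows of factor W for a
// masked (matrix-completion) least-squares objective, with H held fixed.
#include <omp.h>
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int m = 100, n = 80, r = 4;
    const double step = 0.01;                       // step size (assumed)
    std::vector<double> X(m * n, 1.0);              // "observed" matrix (assumed data)
    std::vector<char>   mask(m * n, 1);             // 1 = entry observed
    std::vector<double> W(m * r, 0.5), H(r * n, 0.5);

    #pragma omp parallel for                        // one row of W per task
    for (int i = 0; i < m; ++i) {
        for (int k = 0; k < r; ++k) {
            double g = 0.0;
            for (int j = 0; j < n; ++j) {
                if (!mask[i * n + j]) continue;     // skip unobserved entries
                double pred = 0.0;
                for (int t = 0; t < r; ++t) pred += W[i * r + t] * H[t * n + j];
                g += (pred - X[i * n + j]) * H[k * n + j];
            }
            // gradient step followed by projection onto the nonnegative orthant
            W[i * r + k] = std::max(0.0, W[i * r + k] - step * g);
        }
    }
    std::printf("W[0] = %f\n", W[0]);
    return 0;
}
```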
ISBN (print): 9781611976465
In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has Õ(nm + (n/d)³) work and Õ(d) depth for any depth parameter d ∈ [1, n]. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP'17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has Õ(nm + n³/d²) work and Õ(d) depth [Ullman and Yannakakis, SIAM J. Comput. '91]. Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. By suitably adjusting the depth parameter d and applying known techniques, we obtain: (1) nearly work-efficient Õ(n^(1/6))-depth parallel algorithms for the real-weighted single-source shortest paths problem and finding a bipartite perfect matching in a planar graph, (2) an Õ(n^(9/8))-time sequential strongly polynomial algorithm for computing a minimum mean cycle or a minimum cost-to-time-ratio cycle of a planar graph, (3) a slightly faster algorithm for computing so-called external dense distance graphs of all pieces of a recursive decomposition of a planar graph. One notable ingredient of our parallel APSP algorithm is a simple deterministic Õ(nm)-work, Õ(d)-depth procedure for computing Õ(n/d)-size hitting sets of shortest d-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called d-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an Õ(nm)-time deterministic
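The hop-limited distance computations that such work/depth trade-offs build on can be illustrated by classical min-plus (distance-product) squaring, where each squaring doubles the number of hops covered; the generic cubic-work routine below is only an illustration on an assumed small graph, not the paper's work-efficient algorithm.

```cpp
// Min-plus (distance product) squaring for all-pairs shortest paths:
// O(log n) squarings suffice because each one doubles the covered hop count.
#include <omp.h>
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int n = 4;
    const double INF = 1e18;
    // weighted adjacency matrix D[i][j] = weight of edge i -> j (assumed graph)
    std::vector<std::vector<double>> D = {
        {0, 3, INF, 7}, {8, 0, 2, INF}, {5, INF, 0, 1}, {2, INF, INF, 0}};

    for (int hops = 1; hops < n; hops *= 2) {        // double covered hops per pass
        std::vector<std::vector<double>> R(n, std::vector<double>(n, INF));
        #pragma omp parallel for                     // one output row per task
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                for (int k = 0; k < n; ++k)
                    R[i][j] = std::min(R[i][j], D[i][k] + D[k][j]);
        D = R;
    }
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            std::printf("d(%d,%d) = %g%s", i, j, D[i][j], j + 1 == n ? "\n" : "  ");
    return 0;
}
```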
ISBN (print): 9781450392686
Genetic programming (GP) has been applied to image classification and achieved promising results. However, most GP-based image classification methods are only applied to small-scale image datasets because of their high computational cost. Efficient acceleration technology is needed when extending GP-based image classification methods to large-scale datasets. Considering that fitness evaluation is the most time-consuming phase of the GP evolution process and is highly parallelizable, this paper proposes a CPU multi-processing and GPU parallel approach to this phase, and thus effectively accelerates GP for image classification. Various experiments show that the highly parallelized approach can significantly accelerate GP-based image classification without performance degradation. The training time of the GP-based image classification method is reduced from several weeks to tens of hours, enabling it to be run on large-scale image datasets.
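Fitness evaluation is embarrassingly parallel across individuals, which is what makes the acceleration above effective; the sketch below scores an assumed toy population (linear threshold classifiers standing in for GP trees) on a synthetic dataset with one OpenMP task per individual, without the GPU path.

```cpp
// Parallel fitness evaluation: each candidate is scored on the dataset independently.
#include <omp.h>
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

struct Sample { double x0, x1; int label; };
struct Individual { double w0, w1, bias; };        // stand-in for a GP tree

int main() {
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> uni(-1.0, 1.0);

    std::vector<Sample> data(10000);
    for (auto& s : data) { s.x0 = uni(rng); s.x1 = uni(rng); s.label = (s.x0 + s.x1 > 0); }

    std::vector<Individual> pop(256);
    for (auto& ind : pop) ind = {uni(rng), uni(rng), uni(rng)};

    std::vector<double> fitness(pop.size());
    #pragma omp parallel for                        // one individual per task
    for (int i = 0; i < (int)pop.size(); ++i) {
        int correct = 0;
        for (const auto& s : data) {
            int pred = (pop[i].w0 * s.x0 + pop[i].w1 * s.x1 + pop[i].bias > 0);
            correct += (pred == s.label);
        }
        fitness[i] = double(correct) / data.size(); // classification accuracy as fitness
    }
    double best = 0.0;
    for (double f : fitness) best = std::max(best, f);
    std::printf("best fitness: %f\n", best);
    return 0;
}
```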
ISBN (print): 9781665440660
Stochastic Gradient Descent (SGD) is an essential element in Machine Learning (ML) algorithms. Asynchronous shared-memory parallel SGD (AsyncSGD), including synchronization-free algorithms such as HOGWILD!, has received interest in certain contexts due to reduced overhead compared to synchronous parallelization. Although these methods induce staleness and inconsistency, they have shown speedups for problems with smooth, strongly convex targets and gradient sparsity. Recent works take important steps towards understanding the potential of parallel SGD for problems not conforming to these strong assumptions, in particular for deep learning (DL). There is however a gap in the current literature in understanding when AsyncSGD algorithms are useful in practice, and in particular how mechanisms for synchronization and consistency play a role. We contribute to closing this gap by studying a spectrum of parallel algorithmic implementations of AsyncSGD, aiming to understand how shared-data synchronization influences the convergence properties in fundamental DL applications. We focus on the impact of consistency-preserving non-blocking synchronization on SGD convergence and on sensitivity to hyperparameter tuning. We propose Leashed-SGD, an extensible algorithmic framework of consistency-preserving implementations of AsyncSGD, employing lock-free synchronization and effectively balancing throughput and latency. Leashed-SGD features a natural contention-regulating mechanism, as well as dynamic memory management, allocating space only when needed. We argue analytically about the dynamics of the algorithms, memory consumption, the threads' progress over time, and the expected contention. We provide a comprehensive empirical evaluation, validating the analytical claims, benchmarking the proposed Leashed-SGD framework, and comparing to baselines for two prominent deep learning (DL) applications: multilayer perceptrons (MLP) and convolutional neural networks (CNN).
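To make the baseline concrete, the sketch below runs HOGWILD!-style asynchronous SGD: worker threads update a shared parameter vector without locks, here through relaxed per-component atomic compare-and-swap adds to keep the C++ code free of undefined behaviour. The model, data, and step size are assumed, and the Leashed-SGD mechanisms are not reproduced.

```cpp
// Lock-free asynchronous SGD on a toy linear-regression problem.
#include <atomic>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

constexpr int DIM = 8;
std::atomic<double> w[DIM];                          // shared model parameters

static void atomicAdd(std::atomic<double>& a, double delta) {
    double cur = a.load(std::memory_order_relaxed);
    while (!a.compare_exchange_weak(cur, cur + delta, std::memory_order_relaxed)) {}
}

int main() {
    const int nSamples = 20000, nThreads = 4, epochs = 3;
    const double lr = 0.01;
    std::mt19937 rng(3);
    std::uniform_real_distribution<double> uni(-1.0, 1.0);

    // synthetic data: y = sum_j (j+1) * x_j + small noise
    std::vector<std::vector<double>> X(nSamples, std::vector<double>(DIM));
    std::vector<double> y(nSamples);
    for (int i = 0; i < nSamples; ++i) {
        for (int j = 0; j < DIM; ++j) { X[i][j] = uni(rng); y[i] += (j + 1) * X[i][j]; }
        y[i] += 0.01 * uni(rng);
    }
    for (auto& wi : w) wi.store(0.0);

    auto worker = [&](int tid) {
        std::mt19937 local(100 + tid);
        std::uniform_int_distribution<int> pick(0, nSamples - 1);
        for (int step = 0; step < epochs * nSamples / nThreads; ++step) {
            int i = pick(local);
            double pred = 0.0;
            for (int j = 0; j < DIM; ++j)
                pred += w[j].load(std::memory_order_relaxed) * X[i][j];
            double err = pred - y[i];
            for (int j = 0; j < DIM; ++j) atomicAdd(w[j], -lr * err * X[i][j]);
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < nThreads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();

    for (int j = 0; j < DIM; ++j) std::printf("w[%d] = %f\n", j, w[j].load());
    return 0;
}
```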
Power consumption is a significant challenge for the sustainability of computational science. The growing energy demands of increasingly complex simulations and algorithms lead to substantial resource use, which conflicts with global sustainability goals. This paper investigates the energy efficiency of different parallel implementations, on modern GPU architectures, of a Cellular Potts model, which describes cellular behavior through the minimization of a Hamiltonian energy. By evaluating alternative solvers, it demonstrates that specific methods can significantly enhance computational efficiency and reduce energy use compared to traditional approaches. The results confirm notable improvements in execution time and energy consumption. In particular, the experiments show a reduction in power consumption of up to 53%, providing a pathway towards more sustainable high-performance computing practices for complex biological simulations.
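As a reference point for the Hamiltonian-minimization dynamics mentioned above, the single-threaded sketch below performs Metropolis copy attempts for a one-cell Cellular Potts lattice with adhesion and area-constraint terms; all parameters are assumed, and the GPU solvers compared in the paper are not shown.

```cpp
// Metropolis copy attempts for a minimal Cellular Potts model: a random site
// tries to copy a neighbour's cell id; dH = adhesion change + area-constraint
// change; the move is accepted with probability min(1, exp(-dH/T)).
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

const int L = 32;
const double J = 1.0, lambda = 0.5, targetArea = 60.0, T = 2.0;   // assumed parameters
std::vector<int> grid(L * L, 0);        // 0 = medium, 1 = the single cell
int cellArea = 0;

int at(int x, int y) { return grid[((x + L) % L) * L + ((y + L) % L)]; }

double localAdhesion(int x, int y, int id) {
    // adhesion energy of site (x,y) against its 4-neighbourhood if it held `id`
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    double e = 0.0;
    for (int k = 0; k < 4; ++k)
        if (at(x + dx[k], y + dy[k]) != id) e += J;
    return e;
}

int main() {
    // seed the cell as an 8x8 square in the centre of the lattice
    for (int x = 12; x < 20; ++x)
        for (int y = 12; y < 20; ++y) { grid[x * L + y] = 1; ++cellArea; }

    std::mt19937 rng(11);
    std::uniform_int_distribution<int> site(0, L - 1), nb(0, 3);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};

    for (long step = 0; step < 200000; ++step) {
        int x = site(rng), y = site(rng), k = nb(rng);
        int oldId = at(x, y);
        int newId = at(x + dx[k], y + dy[k]);
        if (oldId == newId) continue;

        double areaOld = cellArea, areaNew = cellArea + (newId == 1 ? 1 : -1);
        double dH = (localAdhesion(x, y, newId) - localAdhesion(x, y, oldId))
                  + lambda * ((areaNew - targetArea) * (areaNew - targetArea)
                            - (areaOld - targetArea) * (areaOld - targetArea));
        if (dH <= 0.0 || uni(rng) < std::exp(-dH / T)) {
            grid[x * L + y] = newId;
            cellArea = (int)areaNew;
        }
    }
    std::printf("final cell area: %d (target %g)\n", cellArea, targetArea);
    return 0;
}
```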