Recent additions to the C++ standard and ongoing standardization efforts aim to add data-parallel types to the C++ standard library. This enables the use of vectorization techniques in existing C++ codes without havin...
We present a randomized O(m log^2 n) work, O(polylog n) depth parallel algorithm for minimum cut. This algorithm matches the work bounds of a recent sequential algorithm by Gawrychowski, Mozes, and Weimann [ICALP'...
ISBN: 9781479928941 (print)
We propose a decomposition framework for the parallel optimization of the sum of a differentiable function and a (block) separable nonsmooth, convex one. The latter term is typically used to enforce structure in the solution as, for example, in LASSO problems. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (Southwell-type) ones, as well as virtually all possibilities in between (e.g., gradient- or Newton-type methods) with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results show that the new method compares favorably to existing algorithms.
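As a rough illustration of the fully parallel (Jacobi-type) end of this spectrum, the sketch below applies a plain proximal-gradient step to a LASSO problem, 0.5*||Ax - b||^2 + lam*||x||_1. It is not the authors' framework: the function names, the `active` parameter, and the conservative step size (one over the squared Frobenius norm of A, assumed nonzero) are all assumptions made for this example. Every selected coordinate is updated from the same old iterate, so the per-coordinate updates could run in parallel.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def residual(A, x, b):
    """Return A x - b for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]


def objective(A, b, x, lam):
    """LASSO objective: 0.5*||Ax - b||^2 + lam*||x||_1."""
    r = residual(A, x, b)
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(xj) for xj in x)


def jacobi_prox_step(A, b, x, lam, step, active=None):
    """Update the coordinates in `active` (default: all) from the same old x.

    `active` is a hypothetical parameter that mimics updating only a
    subset of variables per iteration.
    """
    r = residual(A, x, b)
    new_x = list(x)
    for j in (range(len(x)) if active is None else active):
        grad_j = sum(A[i][j] * r[i] for i in range(len(A)))
        new_x[j] = soft_threshold(x[j] - step * grad_j, step * lam)
    return new_x


def solve_lasso(A, b, lam, iterations=200):
    """Iterate the Jacobi-type step with a safe (conservative) step size."""
    # The squared Frobenius norm bounds the Lipschitz constant of the gradient.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        x = jacobi_prox_step(A, b, x, lam, step)
    return x
```

With A equal to the 2x2 identity, b = [3.0, 0.5] and lam = 1.0, the minimizer is the soft-thresholded b, and the iteration approaches it geometrically.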
Triangle listing is an important topic in many practical applications. We have observed that this problem has not yet been studied systematically in the context of batch-dynamic graphs. In this paper, we aim to fill t...
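For readers unfamiliar with the underlying problem, the following is a minimal sketch of triangle listing on a static, undirected graph. It does not handle batch updates and is not the method of the paper; the function name and the edge-list input format are assumptions made for this example. Each triangle is reported once, as a tuple with its vertices in increasing order.

```python
def list_triangles(edges):
    """List every triangle (u, v, w) with u < v < w in an undirected graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triangles = []
    for u in sorted(adj):
        # Only look at neighbours larger than u so each triangle appears once.
        higher = {w for w in adj[u] if w > u}
        for v in sorted(higher):
            # Common neighbours of u and v that are larger than v close a triangle.
            for w in sorted(higher & adj[v]):
                if w > v:
                    triangles.append((u, v, w))
    return triangles
```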
The systematic generation of prime numbers has been almost ignored since the 1990s, when most of the IT research resources related to prime numbers migrated to studies on the use of very large primes for cryptography, and little effort was made to further the knowledge regarding techniques like sieving. At present, sieving techniques are mostly used for didactic purposes, and no real advances seem to be made in this domain. This systematic review analyzes the theoretical advances in sieving that have occurred up to the present. The research followed the PRISMA 2020 guidelines and was conducted using three established databases: Web of Science, IEEE Xplore and Scopus. Our methodical review aims to provide an extensive overview of the progress in prime sieving. Unfortunately, no significant advancements in this field were identified in the last 20 years.
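For context, the classical technique that this line of work builds on is the Sieve of Eratosthenes. The sketch below is a textbook version, not an algorithm taken from the review; the function name is an assumption made for this example.

```python
def primes_up_to(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes,
            # so start at p*p and clear every p-th entry.
            count = len(range(p * p, n + 1, p))
            is_prime[p * p :: p] = bytearray(count)
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]
```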
We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation and in the high-dimensional regime. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show that it is not possible to obtain a polylogarithmic (in the sequential complexity of the problem) number of parallel rounds with a polynomial (in the dimension) number of queries per round. In the majority of these settings and when the dimension of the space is polynomial in the inverse target accuracy, our lower bounds match the oracle complexity of sequential convex optimization, up to at most a logarithmic factor in the dimension, which makes them (nearly) tight. Another conceptual contribution of our work is in providing a general and streamlined framework for proving lower bounds in the setting of parallel convex optimization. Prior to our work, lower bounds for parallel convex optimization algorithms were only known in a small fraction of the settings considered in this paper, mainly applying to Euclidean (ℓ_2) and ℓ_∞ spaces.
The paper introduces a novel model of parallel metaheuristic optimization algorithms. The hierarchical graph model of a parallel optimization algorithm is proposed. It consists of the model for a parallel optimization algorithm at the top level of the hierarchy and the model for a sequential optimization algorithm at the bottom level. The unified representation of a metaheuristic optimization algorithm, which allows representing a class of metaheuristic algorithms, is used. The extension of the proposed model to the parametric hierarchical model is proposed. Graph model transformations for a parallel algorithm analysis and synthesis are introduced. The representation of several metaheuristic algorithms with the proposed model is discussed. (C) 2019 The Authors. Published by Elsevier B.V.
Making full use of a sequential Delaunay-AFT mesher, a parallel method for the generation of large-scale tetrahedral meshes on distributed-memory machines is developed. To generate meshes with the required and the preserved properties, a Delaunay-AFT based domain decomposition (DD) technique is employed. Starting from the Delaunay triangulation (DT) covering the problem domain, this technique creates a layer of elements dividing the domain into several zones. The initially coarsely meshed domain is partitioned into DTs of subdomains which can be meshed in parallel. When the size of a subdomain is smaller than a user-specified threshold, it will be meshed with the standard Delaunay-AFT mesher. A two-level DD strategy is designed to improve the parallel efficiency of this algorithm. A dynamic load balancing scheme is also implemented using the Message Passing Interface (MPI). Out-of-core meshing is introduced to accommodate excessively large meshes that cannot be handled by the available memory of the computer (RAM). Numerical tests are performed for various complex geometries with thousands of surface patches. Ultra-large-scale meshes with more than ten billion tetrahedral elements have been created. Moreover, the meshes generated with different numbers of DD operations are nearly identical in quality, showing the consistency and the stability of the automatic decomposition algorithm. (C) 2019 Elsevier Ltd. All rights reserved.
This paper solves the Black-Scholes equation for European options using a time-parallel algorithm combined with the Kansa method. Firstly, the partial differential equation of the price of deri...
In this work we formally derive and prove the correctness of the algorithms and data structures in a parallel, distributed-memory, generic finite element framework that supports h-adaptivity on computational domains represented as forest-of-trees. The framework is grounded on a rich representation of the adaptive mesh suitable for generic finite elements that is built on top of a low-level, light-weight forest-of-trees data structure handled by a specialized, highly parallel adaptive meshing engine, for which we have identified the requirements it must fulfill to be coupled into our framework. Atop this two-layered mesh representation, we build the rest of the data structures required for the numerical integration and assembly of the discrete system of linear equations. We consider algorithms that are suitable for both subassembled and fully assembled distributed data layouts of linear system matrices. The proposed framework has been implemented within the FEMPAR scientific software library, using p4est as a practical forest-of-octrees demonstrator. A strong scaling study of this implementation when applied to Poisson and Maxwell problems reveals remarkable scalability up to 32.2K CPU cores and 482.2M degrees of freedom. In addition, a comparative performance study of FEMPAR and the state-of-the-art deal.II finite element software shows at least comparable performance, and at most a factor of 2-3 improvement in the h-adaptive approximation of a Poisson problem with first- and second-order Lagrangian finite elements, respectively.