In this paper, we present an algorithm that uses a GPGPU to compute interval solutions for the isolated real zeros of multivariate polynomial functions within given ranges. To overcome the state space explosion in th...
Motivated by large-scale optimization problems arising in machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they can generally make more efficient use of computational resources than synchronous methods, and they are not sensitive to issues such as stragglers (i.e., slow nodes) and unreliable communication links. Mathematical models of asynchronous methods must properly account for information delays, which makes their analysis challenging. This article reviews recent developments in the design and analysis of asynchronous optimization methods, covering both centralized methods, where all processors update a master copy of the optimization variables, and decentralized methods, where each processor maintains a local copy of the variables. The analysis provides insights into how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.
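To make "accounting for information delays" concrete, the sketch below simulates the bounded-delay model commonly used in analyses of centralized asynchronous methods, on a one-dimensional quadratic. Each update applies a gradient computed at a stale iterate. The function name, the uniform delay distribution, and the step size are illustrative assumptions, not taken from the article.

```python
import random

def delayed_gradient_descent(grad, x0, step, max_delay, iters, seed=0):
    """Toy model of a centralized asynchronous method (illustrative assumption).

    Each update uses a gradient evaluated at a stale iterate:
        x_{k+1} = x_k - step * grad(x_{k - tau_k}),
    where the delay tau_k is drawn uniformly from {0, ..., min(max_delay, k)}.
    """
    rng = random.Random(seed)
    history = [x0]              # all past iterates, so stale copies can be read back
    x = x0
    for k in range(iters):
        tau = rng.randint(0, min(max_delay, k))
        stale = history[-1 - tau]   # the copy a slow worker might have read
        x = x - step * grad(stale)
        history.append(x)
    return x

# f(x) = 0.5 * (x - 3)^2 is minimized at x = 3.
grad = lambda x: x - 3.0
x_async = delayed_gradient_descent(grad, x0=10.0, step=0.05, max_delay=5, iters=2000)
x_sync = delayed_gradient_descent(grad, x0=10.0, step=0.05, max_delay=0, iters=2000)
print(x_async, x_sync)
```

With `max_delay=0` the loop reduces to ordinary gradient descent. In this toy setting, larger delays require smaller step sizes for the iteration to stay stable, which is the kind of asynchrony versus convergence trade-off that the reviewed analyses quantify.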
Watershed Transform is a widely used image segmentation technique that is known to be very data intensive and time consuming. The M-border Kernel Algorithm computes watersheds in the framework of Edge-Weighted Graphs and preserves the topology of the initial map. Parallelization is an effective way to accelerate it; however, this remains challenging due to the nature of the technique. In this paper, we address this problem. We start by analyzing the data dependency issues that this algorithm raises under parallel execution. Based on that analysis, we propose a parallelization strategy that scans the vertices of the graph instead of its edges while preserving the thinning paradigm on which the M-border Kernel Algorithm is based. We show that this strategy overcomes the problem of simultaneously lowering two adjacent M-border edges, which may occur when edges are scanned. An implementation of the proposed algorithm on a shared-memory multicore architecture demonstrates its effectiveness in terms of speedup: using eight cores, it achieves speedup factors of 5.55 for 2048x2048 images, 5.11 for 1024x1024 images, and 4.45 for 512x512 images over the sequential algorithm running on a single processor of the same architecture. The gain in execution time, and thus the speedup, is therefore maintained across image sizes.
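The abstract reports speedups but not efficiencies. As a small worked example, the snippet below derives parallel efficiency E = S/p and the Karp-Flatt experimentally determined serial fraction from the three reported eight-core speedups. The choice of these two metrics is ours, not the paper's.

```python
def efficiency(speedup, p):
    """Parallel efficiency E = S / p."""
    return speedup / p

def karp_flatt(speedup, p):
    """Karp-Flatt experimentally determined serial fraction: (1/S - 1/p) / (1 - 1/p)."""
    return (1 / speedup - 1 / p) / (1 - 1 / p)

# Speedups on eight cores, as reported in the abstract.
reported = {"2048x2048": 5.55, "1024x1024": 5.11, "512x512": 4.45}
for size, s in reported.items():
    print(f"{size}: E = {efficiency(s, 8):.3f}, serial fraction = {karp_flatt(s, 8):.3f}")
```

For the reported figures this gives efficiencies of roughly 0.69, 0.64, and 0.56, and serial fractions of roughly 0.063, 0.081, and 0.114, for the 2048x2048, 1024x1024, and 512x512 images respectively.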
An edge switch is an operation on a network (graph) in which two edges are selected at random and one end vertex of each is exchanged with the other's. Usually, a sequence of these operations is performed to generate network perturbations that have the same degree sequence as the original network. Edge switch operations have important applications in graph theory and network analysis, such as generating random networks with a given degree sequence, modeling and analyzing dynamic networks (e.g., peer-to-peer networks), and studying dynamic phenomena over a network (e.g., disease dynamics over a social contact network). The growing size of real-world networks motivates the need for efficient parallel algorithms that perform a large sequence of edge switch operations. The dependencies among successive edge switch operations, and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as edges are switched, make designing a parallel algorithm significantly challenging. Addressing these challenges requires complex synchronization and communication among the processors. In this paper, we present a distributed-memory parallel algorithm for switching edges in massive networks (networks with billions of edges) that achieves a speedup factor of 85 with 1024 processors. One step of our edge switch algorithm requires computing multinomial random variables in parallel; we present the first non-trivial parallel algorithm for this problem, which achieves a speedup of 925 using 1024 processors.
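As an illustration of the operation itself and of the simplicity constraint, the sketch below performs edge switches sequentially on an undirected edge list, rejecting any switch that would create a self-loop or a parallel edge. It is a sequential sketch only; the function name, the rejection-and-retry policy, and the `max_tries` guard are assumptions, and none of the paper's distributed-memory machinery is shown.

```python
import random

def edge_switch(edges, num_switches, seed=0, max_tries=100_000):
    """Apply random edge switches to a simple undirected graph, keeping it simple.

    A switch picks two edges (u, v) and (x, y) and rewires them to (u, y) and (x, v),
    which leaves every vertex degree unchanged. Switches that would create a self-loop
    or a parallel edge are rejected and a new pair is drawn.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}   # O(1) parallel-edge checks
    done = tries = 0
    while done < num_switches:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("no valid switch found; graph may be too dense")
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            x, y = y, x                        # choose which endpoints get exchanged
        new1, new2 = frozenset((u, y)), frozenset((x, v))
        if u == y or x == v or new1 in present or new2 in present:
            continue                           # would break simplicity: reject
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {new1, new2}
        edges[i], edges[j] = (u, y), (x, v)
        done += 1
    return edges

# An 8-cycle plus four chords: every vertex has degree 3.
g = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (1, 5), (2, 6), (3, 7)]
g2 = edge_switch(g, 50)
print(g2)
```

Each accepted switch depends on the current edge set, so successive switches cannot simply be applied independently. This is the dependency that makes the parallel version in the paper non-trivial.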
Herein, we investigate a parallel OpenMP implementation of the Image Block Representation (IBR) for binary images. The IBR is a region-based image representation scheme that represents a binary image as a set of non-overlapping rectangular areas of object-level pixels, called blocks. The IBR permits operations to be executed on image areas instead of image points and therefore substantially reduces the required computational complexity. Experimental and analytically derived results for the parallel OpenMP implementation on a multicore computer show that very good overall performance can be achieved. (C) 2019 Elsevier Inc. All rights reserved.
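A minimal sketch of how such blocks can be extracted: each row is scanned for runs of object pixels, and a run whose column span exactly matches a block that ended in the previous row extends that block. Otherwise it starts a new block. This is a sequential illustration with hypothetical function names. We assume, without confirmation from the paper, that the per-row run detection (independent across rows) is the natural candidate for an OpenMP `parallel for`.

```python
def image_block_representation(img):
    """Decompose a binary image (rows of 0/1) into non-overlapping rectangular blocks.

    Blocks are returned as (row_start, row_end, col_start, col_end), all inclusive.
    """
    blocks = []
    open_blocks = {}  # (col_start, col_end) -> index of a block ending at the previous row
    for r, row in enumerate(img):
        # 1) Find maximal runs of object (1) pixels in this row.
        runs, c, w = [], 0, len(row)
        while c < w:
            if row[c]:
                start = c
                while c < w and row[c]:
                    c += 1
                runs.append((start, c - 1))
            else:
                c += 1
        # 2) Extend a block with an identical column span, or open a new block.
        next_open = {}
        for span in runs:
            if span in open_blocks:
                idx = open_blocks[span]
                r0, _, c0, c1 = blocks[idx]
                blocks[idx] = (r0, r, c0, c1)
            else:
                idx = len(blocks)
                blocks.append((r, r, span[0], span[1]))
            next_open[span] = idx
        open_blocks = next_open
    return blocks

def area(blocks):
    """Object area computed per block (O(#blocks)) rather than per pixel."""
    return sum((r1 - r0 + 1) * (c1 - c0 + 1) for r0, r1, c0, c1 in blocks)

img = [[1, 1, 0, 0],
       [1, 1, 0, 1],
       [0, 1, 1, 1]]
blocks = image_block_representation(img)
print(blocks)         # [(0, 1, 0, 1), (1, 1, 3, 3), (2, 2, 1, 3)]
print(area(blocks))   # 8
```

The `area` function shows the point made above: once the blocks exist, an operation touches one record per block instead of one per pixel.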
Existing parallel algorithms for wavelet tree construction have a work complexity of O(n log σ). This paper presents parallel algorithms for the problem with improved work complexity. Our first algorithm is based on parallel integer sorting and has either O(n log log n ⌈log σ / √(log n log log n)⌉) work and polylogarithmic depth, or O(n ⌈log σ / √(log n)⌉) work and sub-linear depth. We also describe another algorithm that has O(n ⌈log σ / √(log n)⌉) work and O(σ + log n) depth. We then show how to use similar ideas to construct variants of wavelet trees (arbitrary-shaped binary trees and multiary trees) as well as wavelet matrices in parallel with lower work complexity than prior algorithms. Finally, we show that the rank and select structures on binary sequences and multiary sequences, which are stored on wavelet tree nodes, can be constructed in parallel with improved work bounds, matching those of the best existing sequential algorithms for constructing rank and select structures.
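For reference, the sketch below is the simple O(n log σ)-work baseline that the abstract improves on, not the paper's algorithm. Each node stores one bit per element (lower or upper half of its alphabet range) and stably partitions the sequence between its children. A `rank` query then descends one level per bit. The class name and the prefix-sum rank structure are illustrative choices.

```python
class WaveletTree:
    """Pointer-based wavelet tree over integer symbols in [lo, hi).

    Each node stores one bit per element (0 = symbol in the lower half of the
    alphabet range, 1 = upper half) and stably partitions the sequence between
    its two children. Total work is O(n log sigma).
    """

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.bits = None
        if hi - lo <= 1 or not seq:
            return                          # leaf: a single symbol (or empty)
        mid = (lo + hi) // 2
        self.bits = [1 if c >= mid else 0 for c in seq]
        self.ones = [0]                     # prefix sums give O(1) rank on the bitvector
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        self.left = WaveletTree([c for c in seq if c < mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c >= mid], mid, hi)

    def rank(self, c, i):
        """Number of occurrences of symbol c (lo <= c < hi) in seq[:i]."""
        if self.bits is None:
            return i
        mid = (self.lo + self.hi) // 2
        ones = self.ones[i]
        if c < mid:
            return self.left.rank(c, i - ones)   # zeros before position i
        return self.right.rank(c, ones)

seq = [3, 1, 4, 1, 5, 2, 6, 5, 3, 5]
wt = WaveletTree(seq, 0, 8)
print(wt.rank(5, len(seq)))  # 3
```

The two child constructions at each node are independent, so even this baseline parallelizes naturally across subtrees. Its work, however, remains O(n log σ) because every element is touched once per level.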
Finding the strongly connected components (SCCs) of a directed graph is a fundamental graph-theoretic problem. Tarjan's algorithm is an efficient serial algorithm to find SCCs, but relies on the hard-to-parallelize depth-first search (DFS). We observe that implementations of several parallel SCC detection algorithms show poor parallel performance on modern multicore platforms and large-scale networks. This paper introduces the Multistep method, a new approach that avoids work inefficiencies seen in prior SCC approaches. It does not rely on DFS, but instead uses a combination of breadth-first search (BFS) and a parallel graph coloring routine. We show that the Multistep method scales well on several real-world graphs, with performance fairly independent of topological properties such as the size of the largest SCC and the total number of SCCs. On a 16-core Intel Xeon platform, our algorithm achieves a 20X speedup over the serial approach on a 2 billion edge graph, fully decomposing it in under two seconds. For our collection of test networks, we observe that the Multistep method is 1.92X faster (mean speedup) than the state-of-the-art Hong et al. SCC method. In addition, we modify the Multistep method to find connected and weakly connected components, as well as introduce a novel algorithm for determining articulation vertices of biconnected components. These approaches all utilize the same underlying BFS and coloring routines.
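The Multistep method is described as combining BFS with a parallel graph coloring routine. The sketch below is a sequential Python illustration of those two building blocks under our own simplifications. First, one forward-backward BFS step from a pivot peels off a large SCC; the pivot is chosen here by the product of in- and out-degree, which is an assumption. Second, iterative color propagation labels the remainder. Function names are hypothetical. All parallelism and any further refinements of the actual method are omitted.

```python
from collections import deque

def reach(adj, src, allowed):
    """BFS from src over adjacency lists, restricted to vertices in `allowed`."""
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def scc_fwbw_then_coloring(n, edges):
    """Label each vertex with an SCC id (the id of one vertex in that SCC)."""
    if n == 0:
        return []
    out = [[] for _ in range(n)]
    inn = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
        inn[v].append(u)
    comp = [-1] * n
    remaining = set(range(n))

    # Step 1: forward-backward BFS from a high-degree pivot; FW and BW sets intersect in its SCC.
    pivot = max(range(n), key=lambda v: len(out[v]) * len(inn[v]))
    giant = reach(out, pivot, remaining) & reach(inn, pivot, remaining)
    for v in giant:
        comp[v] = pivot
    remaining -= giant

    # Step 2: color propagation on what is left, repeated until every vertex is labeled.
    while remaining:
        color = {v: v for v in remaining}
        changed = True
        while changed:                      # push the largest id forward along edges
            changed = False
            for u in remaining:
                for v in out[u]:
                    if v in remaining and color[u] > color[v]:
                        color[v] = color[u]
                        changed = True
        classes = {}
        for v in remaining:
            classes.setdefault(color[v], set()).add(v)
        for root, members in classes.items():
            # Vertices of this color that can reach the root form the root's SCC.
            for v in reach(inn, root, members):
                comp[v] = root
                remaining.discard(v)
    return comp

comp = scc_fwbw_then_coloring(6, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3), (4, 5)])
print(comp)
```

Both steps are built from BFS traversals and a per-edge relaxation loop. Neither requires DFS, which is the property the abstract emphasizes.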
In data envelopment analysis, methods for constructing sections of the frontier have recently been proposed to visualize the production possibility set. The aim of this paper is to develop, prove, and test methods for visualizing production possibility sets using parallel computations. We propose a general scheme of algorithms for constructing sections (visualizations) of the production possibility set. The algorithm breaks the original large-scale problem into parallel threads that work independently, and then combines the piecewise solutions into a global solution. An algorithm for constructing a generalized production function is described in detail.
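A minimal sketch of the "split into independent threads, then combine" scheme, under strong simplifying assumptions that are ours, not the paper's. It uses a single input, a single output, and a variable-returns-to-scale (BCC) technology. In that case the frontier value at input x, max Σλ_j y_j subject to Σλ_j x_j ≤ x, Σλ_j = 1, λ ≥ 0, can be found by enumerating the LP's basic solutions: single units, or pairs whose weighted input equals x. The evaluation grid is split into chunks that are evaluated independently and then concatenated. The data and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def frontier_value(x, data):
    """Max output attainable with input x in a 1-input/1-output VRS DEA technology.

    data: list of (input, output) pairs for observed units. Returns None when x is
    below every observed input (x lies outside the production possibility set).
    """
    best = None
    for xi, yi in data:                      # basic solutions with one unit
        if xi <= x and (best is None or yi > best):
            best = yi
    for xi, yi in data:                      # basic solutions with two units, input binding
        for xj, yj in data:
            if xi < xj and xi <= x <= xj:
                theta = (xj - x) / (xj - xi)  # theta*xi + (1-theta)*xj == x
                value = theta * yi + (1 - theta) * yj
                if best is None or value > best:
                    best = value
    return best

def frontier_section(grid, data, workers=4):
    """Split the grid into chunks, evaluate them independently, and concatenate."""
    size = max(1, -(-len(grid) // workers))  # ceiling division
    chunks = [grid[i:i + size] for i in range(0, len(grid), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: [frontier_value(x, data) for x in chunk], chunks))
    return [v for part in parts for v in part]

data = [(1, 1), (2, 3), (4, 4)]   # hypothetical (input, output) observations
print(frontier_section([0.5, 1, 2, 3, 4, 5], data, workers=3))
```

For this data the section passes through 3.5 at input 3, halfway between the units (2, 3) and (4, 4). Because chunks share no state, the combined result does not depend on how the grid is split. In CPython, threads illustrate the decomposition rather than deliver CPU speedup.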
This paper proposes an event-triggered framework to solve network congestion caused by microgrids (MGs) in regional distributed networks. The framework includes two processes: a congestion validation process an...
In this work, we propose and analyze parallel training algorithms for the Optimum-Path Forest (OPF) classifier. We start with a naive parallelization approach that follows the traditional sequential training of the supervised OPF, in which a priority queue stores the best samples at each learning iteration. The proposed approach replaces the priority queue with an array and a linear search, a more parallel-friendly data structure. We show that this approach reduces contention among threads, yielding better temporal and spatial locality. Additionally, we show how vectorizing the distance calculations affects the overall speedup and provide guidance on the situations in which one can benefit from it. The experiments are carried out on five public datasets with different numbers of samples and features, on architectures with distinct levels of parallelism. On average, the proposed approach provides speedups of up to 11.8x and 26x on 24-core Intel and 64-core AMD processors, respectively. (C) 2019 Elsevier B.V. All rights reserved.
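The sketch below is a minimal sequential version of supervised OPF training in the array-plus-linear-search form described above. Prototypes are taken as endpoints of minimum-spanning-tree edges that join different classes; this is the standard supervised OPF rule, stated here as our assumption about the baseline. Each iteration then selects the unfinished sample of minimum path cost by a linear scan instead of popping a priority queue. The f_max path cost, Euclidean distance, and function names are illustrative. The sketch assumes at least two classes and shows none of the threading or vectorization studied in the paper.

```python
import math

def find_prototypes(X, y):
    """Prim's MST on the complete graph; endpoints of edges joining different classes become prototypes."""
    n = len(X)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    protos = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1 and y[parent[u]] != y[u]:
            protos.update((u, parent[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(X[u], X[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return protos

def train_opf(X, y):
    """OPF training with an array of costs and a linear search instead of a priority queue."""
    n = len(X)
    protos = find_prototypes(X, y)
    cost = [0.0 if i in protos else math.inf for i in range(n)]
    label = list(y)
    done = [False] * n
    for _ in range(n):
        # Linear search for the cheapest unfinished sample (replaces the priority-queue pop).
        s = min((i for i in range(n) if not done[i]), key=lambda i: cost[i])
        done[s] = True
        for q in range(n):
            if not done[q]:
                tmp = max(cost[s], math.dist(X[s], X[q]))  # f_max path cost through s
                if tmp < cost[q]:
                    cost[q], label[q] = tmp, label[s]
    return cost, label

def classify(X, cost, label, t):
    """Assign t the label of the training sample offering the cheapest path to it."""
    s = min(range(len(X)), key=lambda i: max(cost[i], math.dist(X[i], t)))
    return label[s]

X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y = [0, 0, 0, 1, 1, 1]
cost, label = train_opf(X, y)
print(classify(X, cost, label, (0.5, 0.5)), classify(X, cost, label, (10.5, 10.2)))  # 0 1
```

Both the argmin scan and the inner update loop over `q` are flat loops over a plain array. In a parallel version they are natural candidates to split across threads, which is harder to do with a shared heap.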