Herein, a parallel implementation in OpenMP of the Image Block Representation (IBR) for binary images is investigated. The IBR is a region-based image representation scheme that represents the binary image as a set of non-overlapping rectangular areas with object level, called blocks. The IBR permits the execution of operations on image areas instead of image points and therefore leads to a substantial reduction of the required computational complexity. Experimental and analytically derived results for the OpenMP implementation on a multicore computer show that very good overall performance can be achieved. (C) 2019 Elsevier Inc. All rights reserved.
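As an illustration of what a block representation looks like, the following is a minimal sequential sketch, not the paper's OpenMP implementation. It assumes one simple extraction rule: each row is split into runs of object pixels, and a run extends an open block from the previous row only when it has exactly the same horizontal extent. The function name `image_blocks` and the tuple layout `(x1, x2, y1, y2)` are hypothetical.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int, int, int]  # (x1, x2, y1, y2), all bounds inclusive


def image_blocks(img: List[List[int]]) -> List[Block]:
    """Decompose a binary image into non-overlapping rectangles of object pixels."""
    blocks: List[Block] = []
    open_blocks: Dict[Tuple[int, int], int] = {}  # (x1, x2) -> starting row y1
    for y, row in enumerate(img):
        # Find the maximal runs of 1-pixels in this row.
        runs = set()
        x, width = 0, len(row)
        while x < width:
            if row[x]:
                start = x
                while x < width and row[x]:
                    x += 1
                runs.add((start, x - 1))
            else:
                x += 1
        # Close every open block that is not continued by an identical run.
        for key in list(open_blocks):
            if key not in runs:
                blocks.append((key[0], key[1], open_blocks.pop(key), y - 1))
        # Open a new block for every run that does not continue an open block.
        for run in runs:
            open_blocks.setdefault(run, y)
    # Close the blocks that reach the last row.
    for (x1, x2), y1 in open_blocks.items():
        blocks.append((x1, x2, y1, len(img) - 1))
    return sorted(blocks)


image = [[1, 1, 0],
         [1, 1, 0],
         [0, 1, 1]]
print(image_blocks(image))  # [(0, 1, 0, 1), (1, 2, 2, 2)]
```

The six object pixels are covered by two blocks, so an operation that works per block touches two items instead of six.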
Existing parallel algorithms for wavelet tree construction have a work complexity of O(n log σ). This paper presents parallel algorithms for the problem with improved work complexity. Our first algorithm is based on parallel integer sorting and has either O(n log log n ⌈log σ / √(log n log log n)⌉) work and polylogarithmic depth, or O(n ⌈log σ / √(log n)⌉) work and sub-linear depth. We also describe another algorithm that has O(n ⌈log σ / √(log n)⌉) work and O(σ + log n) depth. We then show how to use similar ideas to construct variants of wavelet trees (arbitrary-shaped binary trees and multiary trees) as well as wavelet matrices in parallel with lower work complexity than prior algorithms. Finally, we show that the rank and select structures on binary sequences and multiary sequences, which are stored on wavelet tree nodes, can be constructed in parallel with improved work bounds, matching those of the best existing sequential algorithms for constructing rank and select structures.
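For orientation, the baseline that these bounds improve on can be sketched as a plain sequential, level-by-level construction that touches every symbol once per level, i.e. O(n log σ) work. This is not one of the paper's algorithms; the function name `build_wavelet_tree` and the level-wise bitmap layout are assumptions made for the example.

```python
from typing import List


def build_wavelet_tree(seq: List[int], sigma: int) -> List[List[int]]:
    """Return one bitmap per level, with the nodes of a level stored left to right.

    Symbols are integers in [0, sigma). Level 0 uses the most significant bit.
    """
    levels = max(1, (sigma - 1).bit_length())
    bitmaps: List[List[int]] = []
    nodes = [seq]  # the root node holds the whole sequence
    for level in range(levels):
        shift = levels - 1 - level
        bits: List[int] = []
        next_nodes: List[List[int]] = []
        for node in nodes:
            # The node's bitmap is the current bit of each of its symbols.
            bits.extend((s >> shift) & 1 for s in node)
            # Stable partition: 0-bit symbols go left, 1-bit symbols go right.
            next_nodes.append([s for s in node if not (s >> shift) & 1])
            next_nodes.append([s for s in node if (s >> shift) & 1])
        bitmaps.append(bits)
        nodes = next_nodes
    return bitmaps


print(build_wavelet_tree([3, 0, 2, 1, 3, 0], sigma=4))
# [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]]
```

Each level is a stable partition by one bit, which is why integer sorting is a natural building block for constructing several levels at once.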
Finding the strongly connected components (SCCs) of a directed graph is a fundamental graph-theoretic problem. Tarjan's algorithm is an efficient serial algorithm to find SCCs, but relies on the hard-to-parallelize depth-first search (DFS). We observe that implementations of several parallel SCC detection algorithms show poor parallel performance on modern multicore platforms and large-scale networks. This paper introduces the Multistep method, a new approach that avoids work inefficiencies seen in prior SCC approaches. It does not rely on DFS, but instead uses a combination of breadth-first search (BFS) and a parallel graph coloring routine. We show that the Multistep method scales well on several real-world graphs, with performance fairly independent of topological properties such as the size of the largest SCC and the total number of SCCs. On a 16-core Intel Xeon platform, our algorithm achieves a 20X speedup over the serial approach on a 2 billion edge graph, fully decomposing it in under two seconds. For our collection of test networks, we observe that the Multistep method is 1.92X faster (mean speedup) than the state-of-the-art Hong et al. SCC method. In addition, we modify the Multistep method to find connected and weakly connected components, as well as introduce a novel algorithm for determining articulation vertices of biconnected components. These approaches all utilize the same underlying BFS and coloring routines.
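The BFS-based idea that replaces DFS can be illustrated with the classic forward-backward step: the SCC containing a pivot vertex is the intersection of the vertices reachable from the pivot and the vertices that can reach it. The sketch below is sequential and covers only this one step, not the full Multistep method (no trimming, no coloring); the names `bfs_reach` and `scc_of` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def bfs_reach(adj: Dict[int, List[int]], start: int) -> Set[int]:
    """Return all vertices reachable from start by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def scc_of(edges: Iterable[Tuple[int, int]], pivot: int) -> Set[int]:
    """SCC of pivot = (forward-reachable set) intersected with (backward-reachable set)."""
    forward: Dict[int, List[int]] = {}
    backward: Dict[int, List[int]] = {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
    return bfs_reach(forward, pivot) & bfs_reach(backward, pivot)


graph = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
print(scc_of(graph, 0))  # {0, 1, 2}
print(scc_of(graph, 3))  # {3, 4}
```

Both searches are level-synchronous, so each BFS frontier can be expanded in parallel, which is harder to do with a DFS.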
An edge switch is an operation on a network (graph) in which two edges are selected at random and one end vertex of each is swapped with that of the other. Usually, a sequence of these operations is performed to generate network perturbations that have the same degree sequence as the original network. Edge switch operations have important applications in graph theory and network analysis, such as generating random networks with a given degree sequence, modeling and analyzing dynamic networks (e.g., peer-to-peer networks), and studying various dynamic phenomena over a network (e.g., disease dynamics over a social contact network). The growth of real-world networks motivates the need to develop efficient parallel algorithms for performing a large sequence of edge switch operations. The dependencies among successive edge switch operations and the requirement of keeping the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors. In this paper, we present a distributed-memory parallel algorithm for switching edges in massive networks (networks with billions of edges) and achieve a speedup factor of 85 with 1024 processors. One of the steps in our edge switch algorithm requires the computation of multinomial random variables in parallel. The paper presents the first non-trivial parallel algorithm for this problem. The algorithm achieves a speedup of 925 using 1024 processors.
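To make the operation concrete, here is a minimal sequential sketch of edge switching on an undirected simple graph. It is not the paper's distributed algorithm. A proposed switch is rejected when it would create a self-loop or a parallel edge, and the names `edge_switch` and `degree_sequence`, the `seed` parameter, and the attempt limit are assumptions made for the example.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def degree_sequence(edges: List[Edge]) -> Dict[int, int]:
    """Return the degree of every vertex that appears in the edge list."""
    degrees: Counter = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return dict(degrees)


def edge_switch(edges: List[Edge], num_switches: int, seed: int = 0) -> List[Edge]:
    """Perform up to num_switches switches while keeping the graph simple."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]  # input is assumed to be simple
    edge_set = set(edge_list)
    done, attempts = 0, 0
    max_attempts = 100 * num_switches  # arbitrary cap so the loop always ends
    while done < num_switches and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # choose which end vertices are exchanged
            c, d = d, c
        if a == d or c == b:  # would create a self-loop
            continue
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue  # would create a parallel edge
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


ring = [(i, (i + 1) % 8) for i in range(8)]
switched = edge_switch(ring, num_switches=5, seed=1)
print(degree_sequence(switched) == degree_sequence(ring))  # True
```

Every accepted switch replaces (a, b) and (c, d) with (a, d) and (c, b), so each of the four vertices loses one neighbor and gains one, which is why the degree sequence is preserved. The shared `edge_set` lookup is the kind of dependency between successive switches that makes a parallel version difficult.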
In data envelopment analysis, methods for constructing sections of the frontier have recently been proposed to visualize the production possibility set. The aim of this paper is to develop, prove and test methods for the visualization of production possibility sets using parallel computations. A general scheme of the algorithms for constructing sections (visualization) of the production possibility set is proposed. The algorithm breaks the original large-scale problem into parallel threads that work independently; the piecewise solutions are then combined into a global solution. An algorithm for constructing a generalized production function is described in detail.
This paper proposed an event-triggered framework to solve network congestions caused by microgrids (MGs) in regional distributed networks. Two processes are included in this framework: congestion validation process an...
This article gives a short overview of my dissertation, where new algorithms are given for two fundamental graph problems. We develop novel ways of using linear programming formulations, even exponential-sized ones, to extract structure from problem instances and to guide algorithms in making progress. The first part of the dissertation addresses a benchmark problem in combinatorial optimization: the asymmetric traveling salesman problem (ATSP). It consists in finding the shortest tour that visits all vertices of a given edge-weighted directed graph. A ρ-approximation algorithm for ATSP is one that runs in polynomial time and always produces a tour at most ρ times longer than the shortest tour. Finding such an algorithm with constant ρ had been a long-standing open problem. Here we give such an algorithm. The second part of the dissertation addresses the perfect matching problem. We have known since the 1980s that it has efficient parallel algorithms if the use of randomness is allowed. However, we do not know if randomness is necessary, that is, whether the matching problem is in the class NC. We show that it is in the class quasi-NC. That is, we give a deterministic parallel algorithm that runs in poly-logarithmic time on quasi-polynomially many processors.
In this work, we propose and analyze parallel training algorithms for the Optimum-Path Forest (OPF) classifier. We start with a naive parallelization approach where, following traditional sequential training that considers the supervised OPF, a priority queue is used to store the best samples at each learning iteration. The proposed approach replaces the priority queue with an array and a linear search, aiming at a parallel-friendly data structure. We show that this approach leads to less competition among threads, thus yielding better temporal and spatial locality. Additionally, we show how the use of vectorization in distance calculations affects the overall speedup and provide guidance on the situations in which one can benefit from it. The experiments are carried out on five public datasets with different numbers of samples and features, on architectures with distinct levels of parallelism. On average, the proposed approach provides speedups of up to 11.8x and 26x on a 24-core Intel and a 64-core AMD processor, respectively. (C) 2019 Elsevier B.V. All rights reserved.
A recent work shows how we can optimize a tree-based mode of operation for a hash function where the sizes of input message blocks and digest are the same, subject to the constraint that the involved tree structure has all its leaves at the same depth. In this work, we show that we can further optimize the running time of such a mode by using a tree having leaves at all its levels. We make the assumption that the input message block has a size that is a multiple of that of the digest and denote by d the ratio of block size to digest size. The running time is evaluated in terms of the number of operations performed by the hash function, i.e. the number of calls to its underlying function. It turns out that a digest can be computed in ⌈log_{d+1}(l/2)⌉ + 2 evaluations of the underlying function using ⌈l/2⌉ processors, where l is the number of blocks of the message. Other results of interest are discussed, such as the optimization of the parallel running time for a tree of restricted height. (C) 2019 Elsevier Inc. All rights reserved.
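The stated bound can be evaluated for concrete message sizes with a few lines of integer arithmetic. The sketch below reads the bound as a ceiling of a logarithm to base d + 1 of l/2, plus 2, with ⌈l/2⌉ processors; this reading of the garbled formula, the function name `hash_tree_cost`, and the restriction to l ≥ 2 are assumptions.

```python
from typing import Tuple


def hash_tree_cost(l: int, d: int) -> Tuple[int, int]:
    """Return (evaluations, processors) for a message of l blocks and ratio d.

    evaluations = ceil(log_{d+1}(l / 2)) + 2, processors = ceil(l / 2).
    """
    if l < 2 or d < 1:
        raise ValueError("this sketch assumes l >= 2 and d >= 1")
    # Smallest k with (d + 1)**k >= l / 2, computed without floating point.
    k, power = 0, 1
    while 2 * power < l:
        power *= d + 1
        k += 1
    return k + 2, (l + 1) // 2


print(hash_tree_cost(1024, 1))  # (11, 512)
print(hash_tree_cost(1024, 3))  # (7, 512)
```

For a 1024-block message, raising d from 1 to 3 lowers the count from 11 to 7 evaluations under this reading, while the processor count stays at 512.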
ISBN: (print) 9781479942763
The high intensity of research and modeling in the fields of mathematics, physics, biology and chemistry requires new computing resources. Because of the high computational complexity of such tasks, computing time is long and costly. The most effective way to increase efficiency is to adopt parallel principles. The purpose of this paper is to present the issue of parallel computing, with emphasis on the analysis of parallel systems and on the impact of communication delays on their efficiency and on overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation by the Gauss elimination method (GEM). Algorithms are designed for architectures with shared memory (OpenMP), distributed memory (MPI) and their combination (MPI+OpenMP). The properties of the algorithms were determined analytically and verified experimentally. Conclusions are drawn for theory and practice.
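As a reference point for the method being parallelized, a sequential Gauss elimination with partial pivoting can be sketched as follows. This is not the paper's OpenMP or MPI code; the function name `gauss_solve` and the singularity tolerance are assumptions. The comment marks the row-update loop, whose iterations are independent for a fixed pivot and are therefore the natural candidates for distribution across threads or processes.

```python
from typing import List


def gauss_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for k in range(n):
        # Partial pivoting: move the row with the largest |entry| in column k up.
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if abs(m[pivot][k]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[k], m[pivot] = m[pivot], m[k]
        # Row updates below the pivot are independent of each other.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


solution = gauss_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print([round(v, 6) for v in solution])  # [2.0, 3.0, -1.0]
```

The pivot row must be available to every worker before the updates for step k can start. In a distributed-memory setting that means one communication per elimination step, which is where communication delays enter the execution time.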