A new parallel algorithm for the max-flow problem on directed networks with single-source and single -sink is proposed. The algorithm is based on tree sub-networks and on efficient parallel algorithm to compute max-fl...
详细信息
A new parallel algorithm for the max-flow problem on directed networks with single-source and single -sink is proposed. The algorithm is based on tree sub-networks and on efficient parallel algorithm to compute max-flows on the tree sub-networks. The latter algorithm is proved to be work-optimal and time-optimal. The parallel implementation of the complete algorithm is more efficient than the best known parallel algorithm for the max-flow problem in terms of time-complexity and the sequential implementation of the algorithm achieves the best known sequential time-complexity, without using any complex data-structures or complex manipulations on the network. (C) 2022 Elsevier Inc. All rights reserved.
Decision trees (DT) are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic versi...
详细信息
Decision trees (DT) are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DT depends on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer that enables us to reduce its run-time through multi-core processing while the results are statistically identical to conventional sequential implementation. We also calculate the theoretical and practical reduction in run time, which can be obtained utilising our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.
The article describes the method of construction of association rules retrieval algorithms out from function blocks having a unified interface and purely functional properties. The usage of function blocks to build as...
详细信息
ISBN:
(纸本)9783319209104;9783319209098
The article describes the method of construction of association rules retrieval algorithms out from function blocks having a unified interface and purely functional properties. The usage of function blocks to build association rules algorithms allows modifying the existing algorithms and building new algorithms with minimum effort. Besides, the function block properties allow to transform the algorithms into parallel form, thus improving their efficiency.
We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partitionconstrained determinantal point processes, as well as planar perfect matchings. For thes...
详细信息
ISBN:
(纸本)9781450395458
We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partitionconstrained determinantal point processes, as well as planar perfect matchings. For these distributions, the partition function, a.k.a. the count, can be obtained via matrix determinants, a highly parallelizable computation;Csanky proved it is in NC. However, parallel counting does not automatically translate to parallel sampling, as classic reductions between the two are inherently sequential. We show that a nearly quadratic parallel speedup over sequential sampling can be achieved for all the aforementioned distributions. If the distribution is supported on subsets of size k of a ground set, we show how to approximately produce a sample in (O) over tilde (k(1/2+c)) time with polynomially many processors for any constant c > 0. In the two special cases of symmetric determinantal point processes and planar perfect matchings, our bound improves to (O) over tilde(root k) and we show how to sample exactly in these cases. As our main technical contribution, we fully characterize the limits of batching for the steps of sampling-to-counting reductions. We observe that only O(1) steps can be batched together if we strive for exact sampling, even in the case of nonsymmetric determinantal point processes. However, we show that for approximate sampling, (O) over tilde (k(1/2-c)) steps can be batched together, for any entropically independent distribution, which includes all mentioned classes of determinantal point processes. Entropic independence and related notions have been the source of breakthroughs in Markov chain analysis in recent years, so we expect our framework to prove useful for distributions beyond those studied in this work.
Data center networks may comprise tens or hundreds of thousands of nodes,and,naturally,suffer from frequent software and hardware failures as well as link *** are routed along the shortest paths with sufficient resour...
详细信息
Data center networks may comprise tens or hundreds of thousands of nodes,and,naturally,suffer from frequent software and hardware failures as well as link *** are routed along the shortest paths with sufficient resources to facilitate efficient network utilization and minimize *** such dynamic networks,links frequently fail or get congested,making the recalculation of the shortest paths a computationally intensive *** routing protocols were proposed to overcome this problem by focusing on network utilization rather than ***,the design of fast shortest-path algorithms for data centers was largely neglected,though they are universal components of routing ***,parallelization techniques were mostly deployed for random network topologies,and not for regular topologies that are often found in data *** aim of this paper is to improve scalability and reduce the time required for the shortest-path calculation in data center networks by parallelization on general-purpose *** propose a novel algorithm that parallelizes edge relaxations as a faster and more scalable solution for popular data center topologies.
Integer linear programs (ILPs) and mixed integer programs (MIPs) often have multiple distinct optimal solutions, yet the widely used Gurobi optimization solver returns certain solutions at disproportionately high freq...
详细信息
Integer linear programs (ILPs) and mixed integer programs (MIPs) often have multiple distinct optimal solutions, yet the widely used Gurobi optimization solver returns certain solutions at disproportionately high frequencies. This behavior is disadvantageous, as, in fields such as biomedicine, the identification and analysis of distinct optima yields valuable domain-specific insights that inform future research directions. In the present work, we introduce MORSE (Multiple Optima via Random Sampling and careful choice of the parameter Epsilon), a randomized, parallelizable algorithm to efficiently generate multiple optima for ILPs. MORSE maps multiplicative perturbations to the coefficients in an instance's objective function, generating a modified instance that retains an optimum of the original problem. We formalize and prove the above claim in some practical conditions. Furthermore, we prove that for 0/1 selection problems, MORSE finds each distinct optimum with equal probability. We evaluate MORSE using two measures;the number of distinct optima found in r independent runs, and the diversity of the list (with repetitions) of solutions by average pairwise Hamming distance and Shannon entropy. Using these metrics, we provide empirical results demonstrating that MORSE outperforms the Gurobi method and unweighted variations of the MORSE method on a set of 20 Mixed Integer Programming Library (MIPLIB) instances and on a combinatorial optimization problem in cancer genomics.
Finding cohesive subgraphs is a crucial graph analysis kernelwidely used for social and biological networks (graphs). There exist various approaches for discovering insightful substructures in a network, such as findi...
详细信息
ISBN:
(纸本)9798400708435
Finding cohesive subgraphs is a crucial graph analysis kernelwidely used for social and biological networks (graphs). There exist various approaches for discovering insightful substructures in a network, such as finding cliques, community discovery, and truss decomposition. Finding cliques is a computationally intractable problem, making it difficult to identify cohesive subgraphs in large graphs. One possible solution is k-truss decomposition, which is a relaxed form of finding cliques that can be solved in polynomial time. Further, unlike global community detection-which focuses on breaking down the entire graph into disjoint communities-a local or goaloriented community search aims at finding the community of an entity of interest. In this work, we identify a k-truss-induced community discovery technique that can detect local communities in polynomial time. However, most previous studies have explored k-truss-induced local community formation in a serial setting, making them unsuitable for large graphs. In this paper, we design a parallel k-truss-induced local community construction method using multi-core parallelism. To the best of our knowledge, this is the first attempt to parallelize this algorithmic approach with extensive performance analysis. Our experiments demonstrate a significant performance improvement, with speedups from 19x to 55x for graphs with hundreds of millions to billions of edges, using NERSC Perlmutter compute nodes.
Long read technologies are continuing to evolve at a rapid pace, with the latest of the high fidelity technologies delivering reads over 10Kbp with high accuracy (99.9%). However, there also exist partially constructe...
详细信息
ISBN:
(纸本)9798350311990
Long read technologies are continuing to evolve at a rapid pace, with the latest of the high fidelity technologies delivering reads over 10Kbp with high accuracy (99.9%). However, there also exist partially constructed assemblies using short read data. Hybrid assembly workflows provide a way to combine the information in both these data sources and generate highly improved and near complete assemblies and genomic scaffolds. In this paper, we address the problem of mapping long reads to contigs (representing prior constructed partial assemblies). This is a many-to-many comparison application. However, brute force comparisons of all pairs is not practical. Therefore, in this paper, we present a parallel, alignment-free sketching-based algorithm that efficiently maps long reads to contigs. More specifically, our approach uses a minimizer-based Jaccard estimator (or JEM), a variant of the classical MinHashing technique, as its sketch. Experimental evaluation shows that our parallel algorithm is highly effective in producing a high quality mapping while improving significantly the time to solution compared to state-of-the-art mapping tools. For instance, for a large genome Betta splendens (approximate to 350Mbp genome) with 429K HiFi long reads and 98K contigs, our JEM approach produces a mapping with 99.31% precision and 96.18% recall, while yielding 7.13x speedup over a state-of-the-art mapper (Mashmap).
Radionavigation systems that operate in the low and medium frequency band require an accurate prediction of the electromagnetic ground wave propagation delay that is caused by the finite conductivity and permittivity ...
详细信息
ISBN:
(纸本)9798350301052
Radionavigation systems that operate in the low and medium frequency band require an accurate prediction of the electromagnetic ground wave propagation delay that is caused by the finite conductivity and permittivity of the earth's surface. To consider the complex pattern of the mixed land-sea propagation path in the application of the terrestrial MF Ranging-Mode (R-Mode) system, a parallel processing software framework for the calculation of the ground wave propagation delay and the attenuation was developed. It combines existing approaches for the calculation of electromagnetic ground-wave propagation with algorithms for the calculation of ground conductivity and permittivity following recommendation ITU-R P.527-6. Beside R-Mode, the framework enables the fast computation of propagation parameters for ground-waves in the high, medium and low frequency band on a 2D grid.
State-of-the-art parallel sorting algorithms for distributed-memory architectures are based on computing a balanced partitioning via sampling and histogramming. By finding samples that partition the sorted keys into e...
详细信息
ISBN:
(纸本)9781450395458
State-of-the-art parallel sorting algorithms for distributed-memory architectures are based on computing a balanced partitioning via sampling and histogramming. By finding samples that partition the sorted keys into evenly-sized chunks, these algorithms minimize the number of communication rounds required. Histogramming (computing positions of samples) guides sampling, enabling a decrease in the overall number of samples collected. We derive lower and upper bounds on the number of sampling/histogramming rounds required to compute a balanced partitioning. We improve on prior results to demonstrate that when using p processors, O(log* p) rounds with O(p/log* p) samples per round suffice. We match that with a lower bound that shows that any algorithm with O(p) samples per round requires at least Omega(log* p) rounds. Additionally, we prove the Omega(p log p) samples lower bound for one round, thus proving that existing one round algorithms: sample sort, AMS sort [2] and HSS [16] have optimal sample size complexity. To derive the lower bound, we propose a hard randomized input distribution and apply classical results from the distribution theory of runs.
暂无评论