Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem that asks to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks in a directed graph $G$ that together cover all nodes, or all edges, of $G$. Approach: We address this problem with the "safe and complete" framework of Tomescu and Medvedev (Research in Computational Molecular Biology, 20th Annual Conference, RECOMB, vol. 9649: 152-163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as a subwalk in all metagenomic assembly solutions for $G$. A safe algorithm is called complete if it returns all safe walks of $G$. Results: We give graph-theoretic characterizations of the safe walks of $G$, and a safe and complete algorithm that finds all safe walks of $G$. In the node-covering case, our algorithm runs in time $O(m^2 + n^3)$, and in the edge-covering case it runs in time $O(m^2 n)$, where $n$ and $m$ denote the number of nodes and edges of $G$, respectively. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation.
ISBN (print): 9783319623924; 9783319623917
Several heuristics for bandwidth and profile reduction have been proposed since the 1960s. Systematic reviews have identified 133 heuristics applied to these problems. The reported results of these heuristics were analyzed. Thirteen of them were selected because no simulation or comparison in the publications analyzed showed them to be outperformed by any other algorithm in bandwidth or profile reduction, with the computational cost of the heuristics also taken into account. These 13 heuristics were therefore selected as the most promising low-cost methods for these problems. Based on this experience, this article reports that, in certain cases, no heuristic for bandwidth or profile reduction can reduce the computational cost of the Jacobi-preconditioned Conjugate Gradient Method when high-precision numerical computations are used.
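The following minimal Python sketch makes the two objectives concrete. It measures the bandwidth and profile of a symmetric sparse matrix under a given row/column ordering. Reduction heuristics search for an ordering that makes these numbers small. The sketch assumes the matrix is given only by its off-diagonal nonzero pattern, as a list of index pairs. The function names are hypothetical, and the sketch does not implement any of the 13 heuristics discussed.

```python
def bandwidth(pattern, order):
    """Largest |pos[i] - pos[j]| over off-diagonal nonzeros (i, j) after reordering.

    pattern: list of (i, j) pairs, each symmetric nonzero listed once.
    order: permutation of the row indices; order[k] is the original row placed at k.
    """
    pos = {v: k for k, v in enumerate(order)}  # new position of each original index
    return max((abs(pos[i] - pos[j]) for i, j in pattern), default=0)


def profile(pattern, order):
    """Sum over rows r of (r - f_r), where f_r is the column of the first nonzero
    in row r of the lower triangle (the diagonal counts as nonzero)."""
    pos = {v: k for k, v in enumerate(order)}
    first = {k: k for k in range(len(order))}  # start at the diagonal
    for i, j in pattern:
        r, c = max(pos[i], pos[j]), min(pos[i], pos[j])  # lower-triangle entry (r, c)
        first[r] = min(first[r], c)
    return sum(r - c for r, c in first.items())
```

Consider a tridiagonal 4x4 pattern `[(0, 1), (1, 2), (2, 3)]`. It has bandwidth 1 and profile 3 in its natural order. Swapping the two middle rows and columns (order `[0, 2, 1, 3]`) raises these to 2 and 4, which is the kind of growth a reordering heuristic tries to undo.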
ISBN (print): 9783319530062; 9783319530079
For a graph $G = (V, E)$, a set $M \subseteq E$ is called a matching in $G$ if no two edges in $M$ share a common vertex. A matching $M$ in $G$ is called an induced matching in $G$ if $G[M]$, the subgraph of $G$ induced by $M$, is the same as $G[S]$, the subgraph of $G$ induced by $S = \{v \in V \mid v \text{ is incident on an edge of } M\}$. The Maximum Induced Matching problem is to find an induced matching of maximum cardinality. Given a graph $G$ and a positive integer $k$, the Induced Matching Decision problem is to decide whether $G$ has an induced matching of cardinality at least $k$. The Induced Matching Decision problem is NP-complete on bipartite graphs but solvable in polynomial time for convex bipartite graphs. In this paper, we show that the Induced Matching Decision problem is NP-complete for star-convex bipartite graphs and perfect elimination bipartite graphs. On the positive side, we propose polynomial-time algorithms that solve the Maximum Induced Matching problem in circular-convex bipartite graphs and triad-convex bipartite graphs. These algorithms work by polynomial reductions from the Maximum Induced Matching problem in these graph classes to the Maximum Induced Matching problem in convex bipartite graphs.
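The definition can be checked directly. The following Python sketch tests whether an edge set is an induced matching. For very small graphs only, it also finds a maximum induced matching by exhaustive search. The function names are hypothetical. This is a brute-force reference checker, not the polynomial-time reduction to convex bipartite graphs described above.

```python
from itertools import combinations


def is_induced_matching(edges, matching):
    """True if `matching` is an induced matching of the undirected graph `edges`."""
    edge_set = {frozenset(e) for e in edges}
    m = {frozenset(e) for e in matching}
    if not m <= edge_set:
        return False  # M must be a subset of E
    endpoints = [v for e in m for v in e]
    if len(endpoints) != len(set(endpoints)):
        return False  # two edges of M share a vertex, so M is not a matching
    s = set(endpoints)  # S: the vertices incident on an edge of M
    # G[S] must contain exactly the edges of M and nothing else
    return {e for e in edge_set if e <= s} == m


def max_induced_matching(edges):
    """Exhaustive search over edge subsets, largest first (exponential time)."""
    edge_list = list({frozenset(e) for e in edges})
    for size in range(len(edge_list), 0, -1):
        for cand in combinations(edge_list, size):
            if is_induced_matching(edge_list, cand):
                return set(cand)
    return set()
```

On the path 0-1-2-3, the edges {0,1} and {2,3} form a matching but not an induced one, because $G[S]$ also contains the edge {1,2}. On the path 0-1-2-3-4, the edges {0,1} and {3,4} do form an induced matching.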
ISBN (print): 9781538621295
The betweenness centrality (BC) measure has been widely adopted in various graph analytics applications, such as community detection and brain network analysis. Because BC computation is highly compute-intensive and data sizes are growing rapidly, there have been a number of studies on parallel BC computation on either CPUs or GPUs. However, there has been no comprehensive comparative study of BC algorithms across different processors. In this paper, we revisit shared-memory parallel BC computation on four kinds of processors: multi-core CPUs, many-core GPUs, and two generations of Intel MIC processors. We find that, with suitable parallelization strategies and data-oriented optimizations, commodity multi-core CPUs are the fastest, followed by the second-generation MIC. These two processors are faster than the state-of-the-art GPU implementations across all kinds of graphs. In comparison, the GPU outperforms the first-generation MIC only on small-diameter graphs and is the slowest on the other kinds of graphs.
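The following minimal sequential Python sketch shows Brandes' algorithm for unweighted graphs, the usual baseline that parallel BC codes start from. The outer loop over source vertices is the part typically distributed across threads. This is a generic illustration under the assumption of an adjacency-dict input. It is not any of the CPU, GPU or MIC implementations evaluated in the paper.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted graphs.

    adj: dict mapping each vertex to a list of neighbours. For an undirected graph,
    list each edge in both directions; every unordered pair is then counted twice,
    so divide the scores by 2 for per-pair values.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:  # independent per-source work: the usual unit of parallelism
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # forward phase: BFS that counts shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # backward phase: accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the undirected path 0-1-2, the sketch gives vertex 1 a raw score of 2.0, which is 1.0 per unordered pair after halving.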
Suppose that we are given two vertex covers $C_0$ and $C_t$ of a graph $G$, together with an integer threshold $k \ge \max\{|C_0|, |C_t|\}$. The VERTEX COVER RECONFIGURATION problem is to determine whether there exists a sequence of vertex covers of $G$ that transforms $C_0$ into $C_t$. Each vertex cover in the sequence must have cardinality at most $k$ and must be obtained from the previous one by either adding or deleting exactly one vertex. This problem is PSPACE-complete even for planar graphs. In this paper, we first give a linear-time algorithm that solves the problem for even-hole-free graphs, which include several well-known graph classes, such as trees, interval graphs and chordal graphs. We then give an upper bound on $k$ for which any pair of vertex covers in a graph $G$ has a desired sequence. Our upper bound is best possible in some sense.
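The problem statement can be made concrete with a small Python sketch with hypothetical function names. It checks whether a proposed sequence is a valid answer. It also decides the problem by breadth-first search over all vertex covers of size at most $k$. That search is exponential in $|V|$ and serves only as a brute-force reference for tiny graphs. It is not the linear-time algorithm for even-hole-free graphs.

```python
from collections import deque


def is_vertex_cover(edges, cover):
    """Every edge must have at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def is_valid_sequence(edges, sequence, k):
    """Every set is a vertex cover of size <= k, and consecutive sets differ
    by adding or deleting exactly one vertex."""
    if any(len(c) > k or not is_vertex_cover(edges, c) for c in sequence):
        return False
    return all(len(set(a) ^ set(b)) == 1 for a, b in zip(sequence, sequence[1:]))


def reconfigurable(vertices, edges, c0, ct, k):
    """Exhaustive BFS over vertex covers of size <= k (tiny graphs only).

    Assumes c0 and ct are vertex covers of size <= k, as in the problem statement.
    """
    start, goal = frozenset(c0), frozenset(ct)
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return True
        for v in vertices:
            nxt = cur ^ {v}  # add v if absent, delete it if present
            if nxt not in seen and len(nxt) <= k and is_vertex_cover(edges, nxt):
                seen.add(nxt)
                queue.append(nxt)
    return False
```

On the path 0-1-2, transforming $\{1\}$ into $\{0, 2\}$ requires passing through $\{0, 1, 2\}$. The transformation therefore succeeds with $k = 3$ but not with $k = 2$.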
In this thesis we investigate maximum matching-width (MM-width) further. MM-width is a graph width parameter similar to treewidth. It is based on the size of maximum matchings in the bipartite graphs induced by partitions of the vertices of a graph. We improve the link between the value of maximum matching-width and the value of treewidth of a graph to $\mathrm{MM}(G) \le \mathrm{tw}(G)$. We also give a bounded dynamic programming algorithm, BMMDP, that calculates the MM-width of a graph exactly. In addition to the exact algorithm, we look into approximating the MM-width of graphs from above using optimization algorithms based on local search and evolutionary algorithms. In the thesis we also make general observations about maximum matching-width, investigate the MM-width of standard graphs, and propose a set of safe kernelization rules to improve the performance of our algorithms. We also add upper and lower bounds to the exact algorithm. These bounds draw on the link between maximum matching-width and the other width parameters, the MM-widths of standard graphs, and widths found at run time.
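This sketch assumes the standard definition of MM-width, taken from Vatshelle rather than from this thesis. Under that definition, MM-width is the minimum, over branch decompositions of the vertex set, of the largest value $\mathrm{mm}(A)$ among the cuts $(A, V \setminus A)$ the decomposition induces. Here $\mathrm{mm}(A)$ is the size of a maximum matching using only edges that cross the cut. The Python code computes just that cut function with simple augmenting paths (Kuhn's algorithm). This is the kind of quantity an exact or local-search approach would need to evaluate. The function name is hypothetical, and this is not the BMMDP algorithm.

```python
def cut_matching_size(adj, part):
    """mm(A): size of a maximum matching in the bipartite graph G[A, V - A].

    adj: dict vertex -> set of neighbours (undirected). part: the vertex set A.
    """
    a = set(part)
    match = {}  # vertex on the V - A side -> its current partner in A

    def augment(u, visited):
        """Try to find an augmenting path starting at u in A."""
        for w in adj[u]:
            if w in a or w in visited:
                continue  # only edges crossing the cut count
            visited.add(w)
            if w not in match or augment(match[w], visited):
                match[w] = u
                return True
        return False

    return sum(augment(u, set()) for u in a)
```

On the path 0-1-2-3, the cut with $A = \{0, 2\}$ has $\mathrm{mm}(A) = 2$, while $A = \{0, 1\}$ gives 1.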
Suppose that each edge e of an undirected graph G is associated with three nonnegative integers, called the cost, vulnerability and capacity of e, respectively. We then consider the problem of finding a required number of paths in G between two prescribed vertices with the minimum total cost. Each edge e can be shared at no cost by at most as many paths as its vulnerability. It can be shared by more paths than that if we pay its cost. However, it cannot be shared by more paths than its capacity, even if we pay the cost for e. This problem generalizes the disjoint path problem, the minimum shared edges problem and the minimum edge cost flow problem for undirected graphs, and it is known to be NP-hard. In this paper, we study the problem from the viewpoint of specific graph classes and give three results. We first show that the problem is NP-hard even for bipartite outerplanar graphs, 2-trees, graphs with pathwidth two, complete bipartite graphs, and complete graphs. We then give a pseudo-polynomial-time algorithm for bounded treewidth graphs. Finally, we give a fixed-parameter algorithm for chordal graphs when parameterized by the number of required paths.
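The sharing rule can be read as an objective function. The following Python sketch evaluates the total cost of a given collection of paths, or reports infeasibility. The dictionary names `cost`, `vul` and `cap` are hypothetical labels for the three edge parameters. The sketch also assumes each edge's cost is paid once, no matter how many paths exceed its vulnerability. It only scores a solution and does not search for one.

```python
from collections import Counter


def total_cost(paths, cost, vul, cap):
    """Total cost of a collection of paths under the sharing rule, or None if infeasible.

    paths: list of vertex lists. cost, vul, cap: dicts keyed by frozenset({u, v}).
    """
    usage = Counter(frozenset(e) for p in paths for e in zip(p, p[1:]))
    total = 0
    for e, used in usage.items():
        if used > cap[e]:
            return None  # more paths than the capacity allows, even if we pay
        if used > vul[e]:
            total += cost[e]  # paid once when the edge is shared beyond its vulnerability
    return total
```

For example, setting every vulnerability and capacity to 1 recovers edge-disjoint paths: any collection that shares an edge is rejected.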
ISBN (print): 9781450343503
Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm for non-uniform memory access (NUMA)-based systems, based on Beamer's direction-optimizing algorithm, in which the NUMA architecture was carefully considered. This paper presents our new implementation, which reduces remote memory access in the top-down direction of the direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems. These are shared-memory supercomputers based on a cache-coherent (cc)-NUMA architecture that can handle thousands of threads on a single operating system. Our implementation achieved 219 billion edges per second on a Kronecker graph with $2^{34}$ vertices and $2^{38}$ edges, running on a rack of an SGI UV 300 system with 1,152 threads. This result exceeds the fastest entry for a shared-memory system on the current Graph500 list, presented in November 2015, which includes our previous implementation.
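Beamer's direction-optimizing BFS alternates between two kinds of step. In a top-down step, the frontier scans its edges. In a bottom-up step, each unvisited vertex looks for a parent in the frontier and stops at the first hit. The following Python sketch shows that switching. It assumes Beamer-style thresholds `alpha` and `beta`; the defaults 14 and 24 are commonly quoted values and should be treated as illustrative. It ignores NUMA placement and the remote-access reduction that this paper is about.

```python
def direction_optimizing_bfs(adj, source, alpha=14.0, beta=24.0):
    """BFS levels for an undirected graph, mixing top-down and bottom-up steps.

    adj: list of neighbour lists for vertices 0..n-1 (each edge listed both ways).
    Returns level[v], or -1 for vertices unreachable from `source`.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    # m_u: edges still to be checked from unvisited vertices (sum of their degrees)
    m_u = sum(len(nbrs) for nbrs in adj) - len(adj[source])
    bottom_up = False
    while frontier:
        m_f = sum(len(adj[v]) for v in frontier)  # edges to check from the frontier
        if not bottom_up and m_f > m_u / alpha:
            bottom_up = True   # frontier is large: switch to bottom-up
        elif bottom_up and len(frontier) < n / beta:
            bottom_up = False  # frontier is small again: back to top-down
        nxt = set()
        if bottom_up:
            for v in range(n):
                # any() stops at the first frontier neighbour: the source of the savings
                if level[v] < 0 and any(u in frontier for u in adj[v]):
                    nxt.add(v)
        else:
            for u in frontier:
                for v in adj[u]:
                    if level[v] < 0:
                        nxt.add(v)
        depth += 1
        for v in nxt:
            level[v] = depth
            m_u -= len(adj[v])
        frontier = nxt
    return level
```

Both kinds of step compute the same next frontier, namely the unvisited neighbours of the current frontier. The switching therefore changes only the amount of work, not the resulting levels.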
ISBN (print): 9781450343503
Applications that need to process and manage large graph data sets have posed significant challenges for the data science community in recent times. This talk discusses the key challenges that must be handled when implementing a next-generation graph processing and management platform. Several key problems need to be addressed in building such a large graph processing system. First, optimized techniques are needed for managing extremely large graph data. Second, new programming models and software tools must be created for efficiently processing large graphs. This talk will discuss approaches to addressing these two major issues and will highlight our vision for meeting the challenges of next-generation graph processing and management.