Many cancer genome sequencing efforts are underway with the goal of identifying the somatic mutations that drive cancer progression. A major difficulty in these studies is that tumors are typically heterogeneous, with individual cells in a tumor having different complements of somatic mutations. However, nearly all DNA sequencing technologies sequence DNA from multiple cells, thus resulting in measurement of mutations from a mixture of genomes. Genome rearrangements are a major class of somatic mutations in many tumors, and the novel adjacencies (i.e. breakpoints) resulting from these rearrangements are readily detected from DNA sequencing reads. However, the assignment of each rearrangement, or adjacency, to an individual cancer genome in the mixture is not known. Moreover, the quantity of DNA sequence reads may be insufficient to measure all rearrangements in all genomes in the tumor. Motivated by this application, we formulate the k-minimum completion problem (k-MCP). In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes. We show that the 1-MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome. We also show that the k-MCP problem, for k >= 3 in general, and the 2-MCP problem with the double-cut-and-join (DCJ) distance are NP-complete when there are no restrictions on the chromosomal structure of the measured, incomplete genome. These results lay the foundation for future algorithmic studies of the k-MCP and the application of these algorithms to real cancer sequencing data.
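As an illustration of the adjacency language used above, the following minimal Python sketch (hypothetical toy gene orders, not the paper's k-MCP formalism) reads the interior adjacencies off linear chromosomes derived from a reference and shows that a mixed tumor sample only exposes the union of the adjacencies of its constituent genomes.

def adjacencies(order):
    # Interior adjacencies of a linear chromosome given as a signed gene order (telomeres ignored).
    adj = set()
    for x, y in zip(order, order[1:]):
        left = (abs(x), 'h' if x > 0 else 't')    # extremity leaving gene x
        right = (abs(y), 't' if y > 0 else 'h')   # extremity entering gene y
        adj.add(frozenset((left, right)))
    return adj

reference = [1, 2, 3, 4, 5]
genome_a = [1, -3, -2, 4, 5]    # inversion of the segment 2..3
genome_b = [1, 2, 4, 3, 5]      # a different rearrangement of the same reference
mixture = adjacencies(genome_a) | adjacencies(genome_b)    # what sequencing of the mixed sample exposes
novel = mixture - adjacencies(reference)                   # breakpoints, with no genome-of-origin labels
print(sorted(tuple(sorted(a)) for a in novel))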
We present optimal linear time algorithms for computing the Shapley values and 'heightened evolutionary distinctiveness' (HED) scores for the set of taxa in a phylogenetic tree. We demonstrate the efficiency of these new algorithms by applying them to a set of 10,000 reasonable 5139-species mammal trees. This is the first time these indices have been computed on such a large set of taxa, and we contrast our findings with an ad-hoc index for mammals, fair proportion (FP), used by the Zoological Society of London's EDGE programme. Our empirical results follow expectations. In particular, the Shapley values are very strongly correlated with the FP scores, but provide a higher weight to the few monotremes that comprise the sister to all other mammals. We also find that the HED score, which measures a species' unique contribution to future subsets as a function of the probability that close relatives will go extinct, is very sensitive to the estimated probabilities. When they are low, HED scores are less than FP scores, and approach the simple measure of a species' age. Deviations (like the Solenodon genus of the West Indies) occur when sister species are both at high risk of extinction and their clade roots deep in the tree. Conversely, when endangered species have higher probabilities of being lost, HED scores can be greater than FP scores and species like the African elephant Loxodonta africana, the two solenodons and the thumbless bat Furipterus horrens can move up the rankings. We suggest that conservation attention be applied to such species that carry genetic responsibility for imperiled close relatives. We also briefly discuss extensions of Shapley values and HED scores that are possible with the algorithms presented here.
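For context on how such tree indices are computed in a single pass, here is a minimal sketch of the fair proportion (FP) index on a hypothetical toy tree; it is not the authors' Shapley/HED algorithm, but it shows the linear-time pattern of one post-order pass (leaf counts) followed by one top-down pass (accumulating each leaf's share of ancestral branch lengths).

def leaf_counts(tree, node, counts):
    # Post-order: number of leaves below each node.
    children = tree.get(node, [])
    if not children:
        counts[node] = 1
    else:
        counts[node] = sum(leaf_counts(tree, child, counts) for child, _ in children)
    return counts[node]

def fair_proportion(tree, root):
    # FP(leaf) = sum over ancestral edges of branch_length / #leaves below that edge.
    counts, fp = {}, {}
    leaf_counts(tree, root, counts)
    stack = [(root, 0.0)]
    while stack:
        node, acc = stack.pop()
        children = tree.get(node, [])
        if not children:
            fp[node] = acc
        for child, length in children:
            stack.append((child, acc + length / counts[child]))
    return fp

# tree[node] = list of (child, branch length); leaves have no entry
tree = {"root": [("A", 1.0), ("X", 0.5)], "X": [("B", 2.0), ("C", 2.0)]}
print(fair_proportion(tree, "root"))   # FP values: A -> 1.0, B -> 2.25, C -> 2.25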
ISBN: (Print) 9783642255908
Let G be a finite undirected graph with edge set E. An edge set E' ⊆ E is an induced matching in G if the pairwise distance of the edges of E' in G is at least two; E' is dominating in G if every edge e ∈ E \ E' intersects some edge in E'. The Dominating Induced Matching Problem (DIM, for short) asks for the existence of an induced matching E' which is also dominating in G; this problem is also known as the Efficient Edge Domination Problem. The DIM problem is related to parallel resource allocation problems, encoding theory and network routing. It is NP-complete even for very restricted graph classes such as planar bipartite graphs with maximum degree three. However, its complexity was open for P_k-free graphs for any k >= 5; P_k denotes a chordless path with k vertices and k - 1 edges. We show in this paper that the weighted DIM problem is solvable in linear time for P_7-free graphs in a robust way.
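For concreteness, the following brute-force verifier (a hypothetical helper on a toy edge list, unrelated to the P_7-free algorithm of the paper) checks whether an edge set M is a dominating induced matching of a graph given by its edge set.

def is_dominating_induced_matching(edges, M):
    E = {frozenset(e) for e in edges}
    M = {frozenset(e) for e in M}
    if not M <= E:
        return False
    matched = set()
    for e in M:                       # matching: no two edges of M share a vertex
        u, v = tuple(e)
        if u in matched or v in matched:
            return False
        matched.update(e)
    for e in E - M:
        u, v = tuple(e)
        if u in matched and v in matched:
            return False              # induced: no edge joins vertices covered by two M-edges
        if u not in matched and v not in matched:
            return False              # dominating: every other edge touches M
    return True

# On the path 1-2-3-4, the middle edge alone is a dominating induced matching.
print(is_dominating_induced_matching([(1, 2), (2, 3), (3, 4)], [(2, 3)]))   # True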
Background: The distance between two genomes is often computed by comparing only the common markers between them. Some approaches are also able to deal with non-common markers, allowing the insertion or the deletion of such markers. In these models, a deletion and a subsequent insertion that occur at the same position of the genome count as two sorting steps. Results: Here we propose a new model that sorts non-common markers with substitutions, which are more powerful operations that subsume insertions and deletions. A deletion and an insertion that occur at the same position of the genome can be modeled as a substitution, counting as a single sorting step. Conclusions: Comparing genomes with unequal content, but without duplicated markers, we give a linear time algorithm to compute the genomic distance considering substitutions and double-cut-and-join (DCJ) operations. This model provides a parsimonious genomic distance for genomes free of duplicated markers that is, in practice, a lower bound on the real genomic distance. The method could also be used to refine orthology assignments, since in some cases a substitution could actually correspond to an unannotated orthology.
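As a baseline for the model described here, this is a minimal sketch of the classical DCJ distance for two circular genomes over the same gene set (no substitutions, no duplicated markers); the gene orders are hypothetical, and the substitution-aware distance of the paper is not reproduced.

def adjacencies(order):
    # Adjacency set of a circular chromosome given as a signed gene order.
    adj = []
    n = len(order)
    for i in range(n):
        x, y = order[i], order[(i + 1) % n]
        left = (abs(x), 'h' if x > 0 else 't')
        right = (abs(y), 't' if y > 0 else 'h')
        adj.append(frozenset((left, right)))
    return adj

def dcj_distance(order_a, order_b):
    # d_DCJ = n - c, where c is the number of cycles in the adjacency graph.
    parent = {}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for adj in adjacencies(order_a) + adjacencies(order_b):
        u, v = tuple(adj)
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        parent[find(u)] = find(v)
    cycles = len({find(x) for x in parent})
    return len(order_a) - cycles

print(dcj_distance([1, 2, 3, 4], [1, -3, -2, 4]))   # 1: a single inversion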
Block-based connected components labeling is by far the fastest algorithm for labeling the connected components in 2D binary images, especially when the image size is large. This algorithm produces a decision tree that contains 211 leaf nodes, with a tree depth of 14 levels and an average depth of 1.5923. This article provides a faster method for connected components labeling. We propose two new scan masks for connected components labeling, namely, the pixel-based scan mask and the block-based scan mask. In the final stage, the block-based scan mask is transformed into a near-optimal decision tree. We conducted comparative experiments using different sources of images to examine the performance of the proposed method against the existing methods. We also performed an average tree depth analysis and a tree balance analysis to confirm the performance improvement over the existing methods. Most significantly, the proposed method produces a decision tree containing 86 leaf nodes, with a tree depth of 12 levels and an average depth of 1.4593, resulting in faster execution time, especially when the foreground density is equal to or greater than the background density of the images.
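For reference, this is the classical pixel-based two-pass labeling (4-connectivity, union-find over provisional labels) that block-based, decision-tree methods such as the one proposed here are measured against; it is a baseline sketch, not the proposed scan masks.

import numpy as np

def label_components(img):
    h, w = img.shape
    labels = np.zeros((h, w), dtype=int)
    parent = [0]                                  # parent[i] of provisional label i
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    next_label = 1
    for y in range(h):                            # first pass: provisional labels
        for x in range(w):
            if not img[y, x]:
                continue
            up = labels[y - 1, x] if y else 0
            left = labels[y, x - 1] if x else 0
            if up == 0 and left == 0:
                parent.append(next_label)
                labels[y, x] = next_label
                next_label += 1
            else:
                cand = [l for l in (up, left) if l]
                labels[y, x] = min(cand)
                if len(cand) == 2:                # record the label equivalence
                    parent[find(max(cand))] = find(min(cand))
    for y in range(h):                            # second pass: resolve equivalences
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels

img = np.array([[1, 0, 1],
                [1, 0, 1],
                [0, 0, 1]])
print(label_components(img))                      # two components, labeled 1 and 2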
ISBN: (Print) 9780769545714
We consider the minimum k-way cut problem for unweighted undirected graphs with a size bound s on the number of cut edges allowed. Thus we seek to remove as few edges as possible so as to split a graph into k components, or report that this requires cutting more than s edges. We show that this problem is fixed-parameter tractable (FPT) with the standard parameterization in terms of the solution size s. More precisely, for s = O(1), we present a quadratic time algorithm. Moreover, we present a much simpler linear time algorithm for planar graphs and bounded genus graphs. Our tractability result stands in contrast to known W[1] hardness of related problems. Without the size bound, Downey et al. [2003] proved that the minimum k-way cut problem is W[1] hard with parameter k, and this is even for simple unweighted graphs. Downey et al. asked about the status for planar graphs. We get linear time with fixed parameter k for simple planar graphs since the minimum k-way cut of a planar graph has size at most 6k. More generally, we get FPT with parameter k for any graph class with bounded average degree. A simple reduction shows that vertex cuts are at least as hard as edge cuts, so the minimum k-way vertex cut is also W[1] hard with parameter k. Marx [2004] proved that finding a minimum k-way vertex cut of size s is also W[1] hard with parameter s. Marx asked about the FPT status with edge cuts, which we prove tractable here. We are not aware of any other cut problem where the vertex version is W[1] hard but the edge version is FPT; e.g., Marx [2004] proved that the k terminal cut problem is FPT parameterized by the cut size, both for edge and vertex cuts.
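To make the parameterized question concrete, here is a brute-force sketch (exponential in s, so only an illustration of the problem on a hypothetical toy graph, not the FPT algorithm of the paper): try every edge subset of size at most s and test whether removing it leaves at least k connected components.

from itertools import combinations

def num_components(vertices, edges):
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, count = set(), 0
    for v in vertices:
        if v in seen:
            continue
        count += 1
        stack = [v]
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(adj[u])
    return count

def small_k_way_cut(vertices, edges, k, s):
    # Return a cut of at most s edges whose removal yields >= k components, if one exists.
    for r in range(s + 1):
        for cut in combinations(edges, r):
            cut_set = set(cut)
            rest = [e for e in edges if e not in cut_set]
            if num_components(vertices, rest) >= k:
                return cut
    return None

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(small_k_way_cut([1, 2, 3, 4], edges, k=2, s=2))   # ((1, 2), (2, 3)) isolates vertex 2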
ISBN: (Print) 9783642121999
This paper deals with the Efficient Edge Domination Problem (EED, for short), also known as the Dominating Induced Matching Problem. For an undirected graph G = (V, E), EED asks for an induced matching M ⊆ E that simultaneously dominates all edges of G. Thus, the distance between edges of M is at least two and every edge in E is adjacent to an edge of M. EED is related to parallel resource allocation problems, encoding theory and network routing. The problem is NP-complete even for restricted classes like planar bipartite graphs and bipartite graphs with maximum degree three. However, the complexity has been open for chordal bipartite graphs. This paper shows that EED can be solved in polynomial time on hole-free graphs. Moreover, it gives a linear time algorithm for chordal bipartite graphs. Finally, we strengthen the NP-completeness result to planar bipartite graphs of maximum degree three.
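Restating the definition operationally: M is an efficient edge dominating set exactly when every edge of G is dominated exactly once, where an edge dominates itself and all edges sharing an endpoint with it. The brute-force finder below (exponential, for tiny hypothetical graphs only) illustrates that definition; it is unrelated to the hole-free and chordal bipartite algorithms of the paper.

from itertools import combinations

def dominates(e, f):
    # An edge dominates itself and every edge sharing an endpoint with it.
    return bool(set(e) & set(f))

def find_eed(edges):
    for r in range(len(edges) + 1):
        for M in combinations(edges, r):
            if all(sum(1 for f in M if dominates(e, f)) == 1 for e in edges):
                return M
    return None

# On the path 1-2-3-4, the middle edge dominates every edge exactly once.
print(find_eed([(1, 2), (2, 3), (3, 4)]))   # ((2, 3),)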
Background: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era. Results: We consider an important, fundamental problem: reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult, but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D when the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm^(k+1)). Conclusions: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.
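As background for the RMSD bound used here, this is a minimal least-squares superposition sketch (the Kabsch rotation via an SVD) for two equal-length 3-D coordinate chains; the arrays are hypothetical, and the database search and indel handling of the paper are not reproduced.

import numpy as np

def rmsd_kabsch(P, Q):
    # Least-squares RMSD between (n, 3) coordinate arrays P and Q.
    P = P - P.mean(axis=0)                      # center both point sets
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))          # guard against an improper rotation
    R = U @ np.diag([1.0, 1.0, d]) @ Vt         # optimal rotation mapping Q onto P
    diff = P - Q @ R.T
    return np.sqrt((diff ** 2).sum() / len(P))

P = np.array([[0.0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]])
theta = 0.3
R0 = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1.0]])
Q = P @ R0.T + 5.0                              # a rotated and translated copy of P
print(round(rmsd_kabsch(P, Q), 6))              # ~0.0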
An algorithm of linear time complexity is presented to label connected components of a binary image represented by a quadtree. For a given node, the search for all adjacent nodes is carried out in O(1) (i.e., constant time complexity in the worst case) using our previously presented algorithm (Aizawa et al., 3rd International Symposium on Communications, Control, and Signal Processing, 2008, 505-510), whereas the usual approach explores all possible adjacencies for each node. Then, during the process of tree formulation in the search, all equivalence relations between labels are stored as lists. The time complexity of the algorithm is O(B+W) in the worst case and its auxiliary space is no more than O(B), where B and W are the numbers of leaf nodes in a quadtree representing black and white quadrants, respectively. Empirical tests compare the algorithm with another linear time connected component labeling algorithm based on top-down quadtree traversal (Samet, IEEE Trans Pattern Anal Mach Intell PAMI-7 (1985), 94-98), as well as the traditional row-by-row scanning algorithm using linear time Union-Find (Fiorio and Gustedt, Theor Comput Sci 154 (1996), 165-181). Our algorithm shows the best performance on large images. © 2009 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 19, 158-166, 2009. DOI 10.1002/ima.20179
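To show the data structure the labeling operates on, here is a minimal region-quadtree builder for a square binary image whose side is a power of two (a hypothetical toy image); it only illustrates how black and white quadrants become leaf nodes, not the constant-time neighbor finding or the labeling itself.

import numpy as np

def build_quadtree(img, y=0, x=0, size=None):
    # Return ('leaf', value) or ('node', [NW, NE, SW, SE]).
    if size is None:
        size = img.shape[0]
    block = img[y:y + size, x:x + size]
    if block.min() == block.max():              # homogeneous quadrant -> leaf
        return ('leaf', int(block[0, 0]))
    h = size // 2
    return ('node', [build_quadtree(img, y,     x,     h),
                     build_quadtree(img, y,     x + h, h),
                     build_quadtree(img, y + h, x,     h),
                     build_quadtree(img, y + h, x + h, h)])

img = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])
print(build_quadtree(img))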
Background: In 2004, Bejerano et al. announced the startling discovery of hundreds of "ultraconserved elements", long genomic sequences perfectly conserved across human, mouse, and rat. Their announcement stimulated a flurry of subsequent research. Results: We generalize the notion of ultraconserved element in a natural way from extraordinary human-rodent conservation to extraordinary conservation over an arbitrary set of species. We call these "Extremely Conserved Elements". There is a linear time algorithm to find all such Extremely Conserved Elements in any multiple sequence alignment, provided that the conservation is required to be across all the aligned species. For the general case of conservation across an arbitrary subset of the aligned species, we show that the question of whether there exists an Extremely Conserved Element is NP-complete. We illustrate the linear time algorithm by cataloguing all 177 Extremely Conserved Elements in the currently available 44-vertebrate whole-genome alignment, and point out some of the characteristics of these elements. Conclusions: The NP-completeness in the case of conservation across an arbitrary subset of the aligned species implies that it is unlikely an efficient algorithm exists for this general case. Despite this fact, for the interesting case of conservation across all or most of the aligned species, our algorithm is efficient enough to be practical. The 177 Extremely Conserved Elements that we catalog demonstrate many of the characteristics of the original ultraconserved elements of Bejerano et al.
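For the case of conservation across all aligned species, the linear-time idea amounts to a single scan over alignment columns. The sketch below (hypothetical toy alignment; the column criterion and length threshold are simplified assumptions, not the paper's exact definition) reports maximal runs of columns in which every sequence carries the same non-gap character and the run spans at least min_len columns.

def extremely_conserved(alignment, min_len=20):
    # alignment: list of equal-length aligned strings; returns half-open (start, end) column runs.
    ncols = len(alignment[0])
    runs, start = [], None
    for j in range(ncols + 1):
        ok = (j < ncols
              and len({seq[j] for seq in alignment}) == 1
              and alignment[0][j] != '-')
        if ok and start is None:
            start = j
        elif not ok and start is not None:
            if j - start >= min_len:
                runs.append((start, j))
            start = None
    return runs

aln = ["ACGTACGTACGTACGTACGTAAA-CC",
       "ACGTACGTACGTACGTACGTAGA-CC",
       "ACGTACGTACGTACGTACGTAAATCC"]
print(extremely_conserved(aln, min_len=10))   # [(0, 21)]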