In this paper, we study the problem of sorting unichromosomal linear genomes by prefix double-cut-and-joins (or DCJs) in both the signed and the unsigned settings. Prefix DCJs cut the leftmost segment of a genome and ...
详细信息
ISBN:
(纸本)9783031206429;9783031206436
In this paper, we study the problem of sorting unichromosomal linear genomes by prefix double-cut-and-joins (or DCJs) in both the signed and the unsigned settings. Prefix DCJs cut the leftmost segment of a genome and any other segment, and recombine the severed endpoints in one of two possible ways: one of these options corresponds to a prefix reversal, which reverses the order of elements between the two cuts (as well as their signs in the signed case). Depending on whether we consider both options or reversals only, our main results are: (1) new structural lower bounds based on the breakpoint graph for sorting by unsigned prefix reversals, unsigned prefix DCJs, or signed prefix DCJs;(2) a polynomial-time algorithm for sorting by signed prefix DCJs, thus answering an open question in [7];(3) a 3/2-approximation for sorting by unsigned prefix DCJs, which is, to the best of our knowledge, the first sorting by prefix rearrangements problem that admits an approximation ratio strictly smaller than 2 (with the obvious exception of the polynomial-time solvable problems);and finally, (4) an FPT algorithm for sorting by unsigned prefix DCJs parameterised by the number of breakpoints in the genome.
Observations show that some HPC applications periodically alternate between (i) operations (computations, local data accesses) executed on the compute nodes, and (ii) I/O transfers of data and this behavior can be pre...
详细信息
Observations show that some HPC applications periodically alternate between (i) operations (computations, local data accesses) executed on the compute nodes, and (ii) I/O transfers of data and this behavior can be predicted before-hand. While the compute nodes are allocated separately to each application, the storage is shared, and thus, I/O access can be a bottleneck leading to contention. To tackle this issue, we design new static I/O scheduling algorithms that prescribe when each application can access the storage. To design a static algorithm, we emphasize on the periodic behavior of most applications. Scheduling the I/O volume of the different applications is repeated over time. This is critical since often the number of application runs is very high. In the following article, we develop a formal background for I/O scheduling. First, we define a model, bi-colored chain scheduling, and then, we go through related results existing in the literature and explore the complexity of this problem variants. Finally, to match the HPC context, we perform experiments based on use cases matching highly parallel applications or distributed learning framework
algorithmics is a crucial step in programming learning processes. It helps learners to propose a better writing and understanding of solutions without worrying about programming language syntactic complexity. Once alg...
详细信息
ISBN:
(纸本)9783030499327;9783030499310
algorithmics is a crucial step in programming learning processes. It helps learners to propose a better writing and understanding of solutions without worrying about programming language syntactic complexity. Once algorithm is clearly established, the next step is to translate it into a given programming language. When writing programs, syntax errors are common mainly due to the multiplicity of programming languages and strict syntactic rules. We propose in this work a web and mobile interface to automatically generate, from algorithms based on Pseudocode ontology, the corresponding codes in several programming languages such as C, Pascal and Python.
In this book, Dan Gusfield examines combinatorial algorithms to construct genealogical and exact phylogenetic networks, particularly ancestral recombination graphs (ARGs). The algorithms produce networks (or informati...
详细信息
ISBN:
(纸本)0262027526;9780262027526
In this book, Dan Gusfield examines combinatorial algorithms to construct genealogical and exact phylogenetic networks, particularly ancestral recombination graphs (ARGs). The algorithms produce networks (or information about networks) that serve as hypotheses about the true genealogical history of observed biological sequences and can be applied to practical biological problems. Phylogenetic trees have been the traditional means to represent evolutionary history, but there is a growing realization that networks rather than trees are often needed, most notably for recent human history. This has led to the development of ARGs in population genetics and, more broadly, to phylogenetic networks. ReCombinatorics offers an in-depth, rigorous examination of current research on the combinatorial, graph-theoretic structure of ARGs and explicit phylogenetic networks, and algorithms to reconstruct or deduce information about those networks. ReCombinatorics, a groundbreaking contribution to the emerging field of phylogenetic networks, connects and unifies topics in population genetics and phylogenetics that have traditionally been discussed separately and considered to be unrelated. It covers the necessary combinatorial and algorithmic background material; the various biological phenomena; the mathematical, population genetic, and phylogenetic models that capture the essential elements of these phenomena; the combinatorial and algorithmic problems that derive from these models; the theoretical results that have been obtained; related software that has been developed; and some empirical testing of the software on simulated and real biological data.
Component-trees are classical tree structures for grey-level image modelling. Component-graphs are defined as a generalization of component-trees to images taking their values in any (totally or partially) ordered set...
详细信息
Component-trees are classical tree structures for grey-level image modelling. Component-graphs are defined as a generalization of component-trees to images taking their values in any (totally or partially) ordered sets. Similar to component-trees, component-graphs are a lossless image model;then, they can allow for the development of various image processing approaches. However, component-graphs are not trees, but directed acyclic graphs. This makes their construction non-trivial, leading to nonlinear time cost and resulting in nonlinear space data structures. In this theoretical article, we discuss the notion(s) of component-graph, and we propose a strategy for their efficient building and representation, which are necessary conditions for further involving them in image processing approaches.
Analysing next generation sequencing data often involves the use of a sequence aligner to map the sequenced reads against a reference. The output of this process is the basis of many downstream analyses and its qualit...
详细信息
ISBN:
(纸本)9783319787220;9783319787237
Analysing next generation sequencing data often involves the use of a sequence aligner to map the sequenced reads against a reference. The output of this process is the basis of many downstream analyses and its quality thus critical. Many different alignment tools exist, each with a multitude of options, creating a vast amount of possibilities to align sequences. Choosing the correct aligner and options for a specific dataset is complex, and yet it can have a major impact on the quality of the data analysis. We propose a new approach in which we combine the output of multiple sequence aligners to create an improved sequence alignment files. Our novel approach can be used to either increase the sensitivity or the specificity of the alignment process. The software is freely available for non-commercial usage at http://***/.
The age reduction of acquaintance of children with programming is a worldwide trend. During long-term experiments, a freely distributed multiplatform educational and gaming system PictoMir was developed in the SRISA R...
详细信息
ISBN:
(纸本)9781450363969
The age reduction of acquaintance of children with programming is a worldwide trend. During long-term experiments, a freely distributed multiplatform educational and gaming system PictoMir was developed in the SRISA RAS, allowing preschoolers of the age 6+ to master a basic set of programming concepts: program, subroutine, repeater, feedback, command orders and command-questions, branching, repeaters, counters. In the 2016-2017 academic year, 902 children in 15 municipal kindergartens of Surgut successfully passed the annual cycle of "Algorithmic for Preschoolers", creating on the tablets programs for managing virtual robots and real robots-toys.
Component-graphs are defined as the generalization of component-trees to images taking their values in partially ordered sets. Similarly to component-trees, component-graphs are a lossless image model, and can allow f...
详细信息
ISBN:
(纸本)9783319572406;9783319572390
Component-graphs are defined as the generalization of component-trees to images taking their values in partially ordered sets. Similarly to component-trees, component-graphs are a lossless image model, and can allow for the development of various image processing approaches (antiextensive filtering, segmentation by node selection). However, component-graphs are not trees, but directed acyclic graphs. This induces a structural complexity associated to a higher combinatorial cost. These properties make the handling of component-graphs a non-trivial task. We propose a preliminary discussion on a new way of building and manipulating component-graphs, with the purpose of reaching reasonable space and time costs. Tackling these complexity issues is indeed required for actually involving component-graphs in efficient image processing approaches.
We start by studying the class of $k$- degenerate graphs which are often used to model sparse real-world graphs. We focus on enumeration questions for these graphs. That is, we try and provide algorithms which must ou...
详细信息
We start by studying the class of $k$- degenerate graphs which are often used to model sparse real-world graphs. We focus on enumeration questions for these graphs. That is, we try and provide algorithms which must output, without duplication, all the occurrences of some input subgraph. We investigate the questions of finding all cycles of some given size and all maximal cliques in the graph. Our two contributions are a worst-case output size optimal algorithm for fixed-size cycle enumeration and an output sensitive algorithm for maximal clique enumeration for this restricted class of graphs. In a second part we consider graphs in a distributed manner. We investigate questions related to finding matchings of the network, when no assumption is made on the initial state of the system. These algorithms are often referred to as self- stabilizing. Our two main contributions are an algorithm returning an approximation of the maximum matching and a new algorithm for maximal matching when communication simulates message passing. Finally, we introduce and investigate some special families of polytopes, namely primitive zonotopes, which can be described as the Minkowski sum of short primitive vectors. We highlight connections with the largest possible diameter of the convex hull of a set of points in dimension d whose coordinates are integers between 0 and k. Our main contributions are new lower bounds for this diameter question as well as descriptions of small instances of these objects.
Next generation sequencing produces an ever increasing amount of data, requiring increasingly fast computing infrastructures to keep up. We present GNATY, a collection of tools for NGS data analysis, aimed at optimizi...
详细信息
ISBN:
(纸本)9783319317441;9783319317434
Next generation sequencing produces an ever increasing amount of data, requiring increasingly fast computing infrastructures to keep up. We present GNATY, a collection of tools for NGS data analysis, aimed at optimizing parts of the sequence analysis process to reduce the hardware requirements. The tools are developed with efficiency in mind, using multithreading and other techniques to speed up the analysis. The architecture has been verified by implementing a variant caller based on the Varscan 2 variant calling model, achieving a speedup of nearly 18 times. Additionally, the flexibility of the algorithm is also demonstrated by applying it to coverage analysis. Compared to BEDtools 2 the same analysis results were found but in only half the time by GNATY. The speed increase allows for a faster data analysis and more flexibility to analyse the same sample using multiple settings.
暂无评论