Structural alignments of Ribonucleic acid (RNA) sequences solved by the sankoff algorithm are computationally expensive and often require constraints to be used in practice. Modern Graphics Processing Units (GPUs) con...
详细信息
Structural alignments of Ribonucleic acid (RNA) sequences solved by the sankoff algorithm are computationally expensive and often require constraints to be used in practice. Modern Graphics Processing Units (GPUs) contain more than 1000 cores, which compute in parallel to speed up applications. Here, we present a GPU-based solution to the RNA structural alignment problem that makes use of precalculated base pair probabilities on the individual sequences. We designed and developed an unconstrained version of the sankoff algorithm, obtaining the optimal result and calculating the entire four-dimension dynamic programming matrix (4D DP). Our approach uses a two-level wavefront strategy to exploit parallelism. The 4D DP matrix is divided in one external matrix (EM) and several internal matrices (IM). We applied wavefront strategies on the EM and IMs in a two-level hierarchical way. At the first level, the wavefront is applied to the EM, calculating the cells that belong to the same diagonal in parallel. In the second level, since each cell in the EM is itself an IM matrix, the cells that belong to the same IM diagonal are calculated in parallel. The results obtained with real RNA sequences show that our GPU version is capable of outperforming a multicore CPU version of the unconstrained version of the sankoff algorithm. Compared with the CPU-based version running on 32 cores, our approach is able to achieve a speedup of 7.81x on the NVidia Tesla P100. In this case, the execution time was reduced from 6 hours and 18 minutes (32 cores) to 48 minutes and 20 seconds (GPU).
Character state live phylogeny generalizes character state phylogeny in the sense that they relate taxonomic units based on their similarities over a set of characters, but allowing live ancestors. An approach for cha...
详细信息
ISBN:
(数字)9783030017224
ISBN:
(纸本)9783030017224;9783030017217
Character state live phylogeny generalizes character state phylogeny in the sense that they relate taxonomic units based on their similarities over a set of characters, but allowing live ancestors. An approach for character state live phylogeny reconstruction is called parsimony, where one tries to minimize the total number of character state changes along the edges of the tree. The problem of finding a tree that minimizes this number is known as large live parsimony problem. When the tree topology is also given as input, the problem is known as small live parsimony problem. We propose a genetic algorithm to solve the large live problem, which uses extended versions of the algorithms of Fitch and sankoff to solve the small live problem, both devised in this work. Besides, we performed two experiments. In the first one, a multiple alignment of H1N1 and H3N2 viruses from different countries, taken as input, allowed to obtain interesting live phylogenies, representing alternative evolutionary hypothesis. The second experiment took as input a multiple alignment of the HIV virus env gene, from one patient, read in different dates through 12 years. The generated live phylogenies were similar to the ones generated by PAUP, where dates close to each other were grouped into clusters, but suggesting new evolutionary stories.
Pareto optimization combines independent objectives by computing the Pareto front of its search space, defined as the set of all solutions for which no other candidate solution scores better under all objectives. This...
详细信息
Pareto optimization combines independent objectives by computing the Pareto front of its search space, defined as the set of all solutions for which no other candidate solution scores better under all objectives. This gives, in a precise sense, better information than an artificial amalgamation of different scores into a single objective, but is more costly to compute. Pareto optimization naturally occurs with genetic algorithms, albeit in a heuristic fashion. Non-heuristic Pareto optimization so far has been used only with a few applications in bioinformatics. We study exact Pareto optimization for two objectives in a dynamic programming framework. We define a binary Pareto product operator *(Par) on arbitrary scoring schemes. Independent of a particular algorithm, we prove that for two scoring schemes A and B used in dynamic programming, the scoring scheme A*B-Par correctly performs Pareto optimization over the same search space. We study different implementations of the Pareto operator with respect to their asymptotic and empirical efficiency. Without artificial amalgamation of objectives, and with no heuristics involved, Pareto optimization is faster than computing the same number of answers separately for each objective. For RNA structure prediction under the minimum free energy versus the maximum expected accuracy model, we show that the empirical size of the Pareto front remains within reasonable bounds. Pareto optimization lends itself to the comparative investigation of the behavior of two alternative scoring schemes for the same purpose. For the above scoring schemes, we observe that the Pareto front can be seen as a composition of a few macrostates, each consisting of several microstates that differ in the same limited way. We also study the relationship between abstract shape analysis and the Pareto front, and find that they extract information of a different nature from the folding space and can be meaningfully combined.
Simultaneous alignment and secondary structure prediction of RNA sequences is often referred to as “RNA structural alignment.” A class of the methods for structural alignment is based on the principles proposed by S...
详细信息
Simultaneous alignment and secondary structure prediction of RNA sequences is often referred to as “RNA structural alignment.” A class of the methods for structural alignment is based on the principles proposed by sankoff more than 25 years ago. The sankoff algorithm simultaneously folds and aligns two or more sequences. The advantage of this algorithm over those that separate the folding and alignment steps is that it makes better predictions. The disadvantage is that it is slower and requires more computer memory to run. The amount of computational resources needed to run the sankoff algorithm is so high that it took more than a decade before the first implementation of a sankoff style algorithm was published. However, with the faster computers available today and the improved heuristics used in the implementations the sankoff-based methods have become practical. This chapter describes the methods based on the sankoff algorithm. All the practical implementations of the algorithm use heuristics to make them run in reasonable time and memory. These heuristics are also described in this chapter. less
An accelerated method is suggested which enables an effective comparison to be made of amino acid (nucleotide) sequences of great length with due regard to a large number of possible gaps. The method consists in limit...
详细信息
An accelerated method is suggested which enables an effective comparison to be made of amino acid (nucleotide) sequences of great length with due regard to a large number of possible gaps. The method consists in limiting the area of complete similarity charts, calculated in accordance with the algorithm suggested by sankoff, by a certain specially selected diagonal band. The application of the Monte-Carlo method permits a statistical evaluation to be made of the certainty of the similarity of the compared sequences and to choose on such a comparison band, an optimum correspondence path which can readily be transformed into sequence alignment. Using this approach, prolactin and somatotropin families of sequences were found to be homologous at a high level of significance and their optimum alignment with 2 gaps was suggested. In contrast, 2 regions of assumed partial gene duplication in .beta.-galactosidase sequence, suggested by Hood et al, were not statistically significantly similar.
暂无评论