One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, Maximum Specificity Set Cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between interactions of proteins and their corresponding domain architectures to a generalized weighted set cover problem. The application of a greedy algorithm provides sets of domain interactions which explain the presence of protein interactions to the largest degree of specificity. Utilizing domain and protein interaction data of S. cerevisiae, MSSC enables prediction of previously unknown protein interactions, links that are well supported by a high tendency of coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to that of Maximum Likelihood Estimation, while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at http://***.
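The abstract does not spell out how MSSC defines its weights or its generalized cover, so the following is only a minimal sketch of the classic greedy heuristic for weighted set cover, not the authors' method. It assumes each candidate domain pair "covers" the protein interactions it could explain and carries a hypothetical cost; the function name, the domain-pair labels, and the weights are all illustrative.

```python
def greedy_weighted_set_cover(universe, subsets, weights):
    """Classic greedy heuristic for weighted set cover.

    universe: iterable of elements to cover (here: protein interactions).
    subsets:  dict mapping a candidate name to the set of elements it covers
              (here: hypothetical domain pairs).
    weights:  dict mapping a candidate name to a positive cost (assumed).
    Returns the candidate names in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, members in subsets.items():
            gain = len(uncovered & members)
            if gain == 0:
                continue
            # Cost per newly covered element; lower is better.
            ratio = weights[name] / gain
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("some elements cannot be covered by any subset")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Toy input: four protein interactions and three hypothetical domain pairs.
interactions = {"A-B", "A-C", "B-C", "C-D"}
domain_pairs = {
    "d1-d2": {"A-B", "A-C"},
    "d2-d3": {"B-C", "C-D"},
    "d1-d3": {"A-B", "A-C", "B-C", "C-D"},
}
costs = {"d1-d2": 1.0, "d2-d3": 1.0, "d1-d3": 5.0}
cover = greedy_weighted_set_cover(interactions, domain_pairs, costs)
```

In this toy input the two cheap, narrow domain pairs are preferred over the single expensive pair that covers everything, which is the general behaviour a cost-per-element greedy rule produces.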
We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed-parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved, algorithm for the problem is fixed-parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and have shown them to be extremely efficient in practice on biologically significant data sets. This work proves that the BNPP problem is fixed-parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
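The "four gametes" condition mentioned above can be illustrated with a short check. This is a sketch under the assumption that the input is a 0/1 matrix with one row per taxon and one column per character; it only identifies the characters that share all four gametes with some other character and is not either of the paper's algorithms. The function name is hypothetical.

```python
from itertools import combinations


def conflicting_characters(matrix):
    """Return the indices of characters (columns) that share all four
    gametes (0,0), (0,1), (1,0), (1,1) with at least one other character.

    matrix: list of equal-length rows containing only 0 and 1 (assumed).
    """
    n_chars = len(matrix[0]) if matrix else 0
    conflicts = set()
    for i, j in combinations(range(n_chars), 2):
        # Collect the distinct value pairs observed in columns i and j.
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            conflicts.update((i, j))
    return conflicts


# Columns 0 and 1 exhibit all four gametes; column 2 conflicts with neither.
example = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
conflicts = conflicting_characters(example)
```

For binary characters, an empty result corresponds to the classical four-gamete condition for a perfect phylogeny; the size of the result is the second parameter the first algorithm in the abstract depends on.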
ISBN: 9780769551357 (print)
The Traveling Salesman Problem (TSP) is among the most famous NP-hard optimization problems. The special case of TSP in bounded-dimensional Euclidean spaces has been a particular focus of research: The celebrated results of Arora [Aro98] and Mitchell [Mit99] - along with subsequent improvements of Rao and Smith [RS98] - demonstrated a polynomial time approximation scheme for this problem, ultimately achieving a runtime of $O_{d,\varepsilon}(n \log n)$. In this paper, we present a linear time approximation scheme for Euclidean TSP, with runtime $O_{d,\varepsilon}(n)$. This improvement resolves a 15-year-old conjecture of Rao and Smith, and matches for Euclidean spaces the bound known for a broad class of planar graphs [Kle08].
ISBN: 9781665455190 (print)
The determinant maximization problem gives a general framework that models problems arising in fields as diverse as statistics [1], convex geometry [2], fair allocations [3], combinatorics [4], spectral graph theory [5], network design, and random processes [6]. In an instance of a determinant maximization problem, we are given a collection of vectors $U = \{v_1, \dots, v_n\} \subseteq \mathbb{R}^d$, and the goal is to pick a subset $S \subseteq U$ of the given vectors to maximize the determinant of the matrix $\sum_{i \in S} v_i v_i^\top$. Often, the set $S$ of picked vectors must satisfy additional combinatorial constraints such as a cardinality constraint ($|S| \le k$) or a matroid constraint ($S$ is a basis of a matroid defined on the vectors). In this paper, we give a polynomial-time deterministic algorithm that returns an $r^{O(r)}$-approximation for any matroid of rank $r \le d$. This improves previous results that give $e^{O(r^2)}$-approximation algorithms relying on $e^{O(r)}$-approximate estimation algorithms [4], [7]-[9] for any $r \le d$. All previous results use convex relaxations and their relationship to stable polynomials and strongly log-concave polynomials or non-convex relaxations for the problem [10]. In contrast, our algorithm builds on combinatorial algorithms for matroid intersection, which iteratively improve any solution by finding an alternating negative cycle in the exchange graph defined by the matroids. While the $\det(\cdot)$ function is not linear, we show that taking appropriate linear approximations at each iteration suffices to give the improved approximation algorithm.
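To make the objective concrete, the sketch below evaluates the determinant of the sum of outer products for a chosen subset and finds the best subset by exhaustive search. It is an exponential-time baseline for tiny inputs, not the paper's matroid-intersection algorithm, and it assumes the simplest constraint, a subset of exactly k vectors. All function names are hypothetical, and the determinant is computed with plain Gaussian elimination to keep the example dependency-free.

```python
from itertools import combinations


def outer_product_sum(vectors):
    """Return the d x d matrix that sums v v^T over the given vectors."""
    d = len(vectors[0])
    m = [[0.0] * d for _ in range(d)]
    for v in vectors:
        for a in range(d):
            for b in range(d):
                m[a][b] += v[a] * v[b]
    return m


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            return 0.0  # Singular (up to tolerance).
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result  # A row swap flips the sign.
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return result


def best_subset_bruteforce(vectors, k):
    """Try every size-k subset and return (indices, objective value)."""
    def objective(indices):
        return determinant(outer_product_sum([vectors[i] for i in indices]))

    best = max(combinations(range(len(vectors)), k), key=objective)
    return best, objective(best)


# Four vectors in the plane; pick the two that maximize the objective.
vectors = [(1, 0), (0, 1), (1, 1), (2, 1)]
subset, value = best_subset_bruteforce(vectors, 2)
```

The search visits every size-k subset, so it is only useful for checking small instances by hand; the approximation algorithms discussed in the abstract exist precisely because this enumeration does not scale.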
A fundamental problem arising in evolutionary molecular biology is to discover the locations of gene duplications and multiple gene duplication episodes based on phylogenetic information. Solutions to the MULTIPLE GENE DUPLICATION problems can provide useful clues for placing gene duplication events at locations on a species tree and for exposing multiple gene duplication episodes. In this paper, we study two variations of the MULTIPLE GENE DUPLICATION problems: the EPISODE-CLUSTERING (EC) problem and the MINIMUM EPISODES (ME) problem. For the EC problem, we improve the results of Burleigh et al. with an optimal linear-time algorithm. For the ME problem, on the basis of the algorithm presented by Bansal and Eulenstein, we propose an optimal linear-time algorithm.