Motivation: We present an extensive evaluation of different methods and criteria to detect remote homologs of a given protein sequence. We investigate two associated problems: first, developing a sensitive search method to identify possible candidates and, second, assigning a confidence to the putative candidates in order to select the best one. For search methods whose score distributions are known, p-values serve as a confidence measure with great success. For cases where such theoretical backing is absent, we propose empirical approximations to p-values for search procedures. Results: As a baseline, we review the performance of different methods for detecting remote protein folds (sequence alignment and threading, with and without sequence profiles, global and local). The analysis is performed on a large representative set of protein structures. For fold recognition, we find that methods using sequence profiles generally perform better than methods using plain sequences, and that threading methods perform better than sequence alignment methods. To assess the quality of the predictions made, we establish and compare several confidence measures, including raw scores, Z-scores, raw-score gaps, Z-score gaps, and different methods of p-value estimation. We work our way from the theoretically well-backed local scores towards more exploratory global and threading scores. The methods for assessing the statistical significance of predictions are compared using specificity-sensitivity plots. For local alignment techniques we find that p-value methods work best, although computationally cheaper methods such as those based on score gaps achieve similar performance. For global methods, where no theory is available, methods based on score gaps work best. By using the score-gap functions as the measure of confidence, we improve the more powerful fold recognition methods for which p-values are unavailable.
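The two empirical confidence measures discussed above can be sketched in a few lines; the function names and the shuffled-score null model are illustrative assumptions, not taken from the paper itself:

```python
import numpy as np

def score_gap(scores):
    """Gap between the best and second-best hit scores.
    A large gap suggests the top hit stands out from the background."""
    s = np.sort(np.asarray(scores, dtype=float))[::-1]
    return s[0] - s[1]

def empirical_p_value(observed, null_scores):
    """Empirical p-value: fraction of null scores (e.g. from shuffled
    sequences) that reach or exceed the observed score."""
    null_scores = np.asarray(null_scores, dtype=float)
    # +1 in numerator and denominator avoids p = 0 for finite samples
    return (np.sum(null_scores >= observed) + 1) / (len(null_scores) + 1)
```

The +1 correction is a standard convention for empirical p-values: without it, an observed score exceeding every null score would report an impossible p = 0.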
Background: In text mining, document clustering describes the effort to assign unstructured documents to clusters, which in turn usually correspond to topics. Clustering is widely used in science for data retrieval and organisation. Results: In this paper we present and discuss a novel graph-theoretical approach to document clustering and its application to a real-world data set. We show that the well-known graph partition into stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This allows us to perform soft as well as hard clustering. The software is freely available on GitHub. Conclusions: The presented integer linear programming formulation, as well as the greedy approach, for this NP-complete problem leads to valuable results on random instances and on real-world data for different similarity measures. We show that PS-Document Clustering is a remarkable approach to document clustering that opens the complete toolbox of graph theory to this field.
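A minimal sketch of the underlying idea of hard clustering as clique partitioning of a document-similarity graph; the pseudoclique generalization of the paper relaxes the pairwise-similarity requirement, which this baseline does not implement, and the similarity predicate is an assumed placeholder:

```python
def greedy_clique_partition(nodes, similar):
    """Greedy hard clustering: repeatedly grow a clique of pairwise-similar
    documents, remove it from the graph, and continue.
    `similar(u, v)` is a boolean predicate, e.g. cosine similarity of the
    documents' term vectors above a fixed threshold."""
    remaining = list(nodes)
    clusters = []
    while remaining:
        clique = [remaining.pop(0)]          # seed a new cluster
        for v in remaining[:]:
            # v joins only if it is similar to every document already in it
            if all(similar(v, u) for u in clique):
                clique.append(v)
                remaining.remove(v)
        clusters.append(clique)
    return clusters
```

Each document ends up in exactly one clique, giving a hard clustering; a soft clustering, as enabled by pseudocliques, would instead allow overlapping membership.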
In this contribution we introduce the technical concept and implementation details concerning the front end of our force-field optimization workflow package for intramolecular degrees of freedom, called Wolf(2)Pack. The package's design follows our belief that parameter optimization should be a user-driven, but program-guided, workflow with specific modular tasks that reduce human errors and save time. Through this design, parameter optimization becomes more reliable and reproducible. Wolf(2)Pack can integrate common force fields from different research areas, allowing the user to optimize balanced parameters; alternatively, users can develop highly specialized force fields that suit their chemical systems. Included in the package's front end is a force-field and molecular database whose contents facilitate parameter optimization. Wolf(2)Pack can be accessed at ***.
This article is a short summary and explanation of the scientific work of Carl Adam Petri. The very basics of net theory are sufficient to understand it.
Decompositions of linear ordinary differential equations (ODEs) into components of lower order have successfully been employed for determining their solutions. Here this approach is generalized to nonlinear ODEs. It is not based on the existence of Lie symmetries; rather, it is a genuine extension of the usual solution algorithms. If an equation admits a Lie symmetry, the proposed decompositions are usually more efficient and often lead to simpler expressions for the solution. For the vast majority of equations without a Lie symmetry, decomposition is the only available systematic solution procedure. Criteria for the existence of diverse decomposition types and algorithms for applying them are discussed in detail, and many examples are given. Kamke's collection of solved equations and a tremendous compilation of random equations are used as a benchmark for comparing various solution procedures. Extensions of this approach to more general types of ODEs and to partial differential equations are suggested.
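The decomposition idea is easiest to see in the linear constant-coefficient case, which the paper generalizes to nonlinear equations; the following worked example is only an illustration of this classical special case:

```latex
% Factor the second-order operator into two first-order components:
%   y'' - 3y' + 2y = 0   <=>   (D - 1)(D - 2) y = 0,   D = d/dx.
\[
  y'' - 3y' + 2y \;=\; (D - 1)(D - 2)\,y \;=\; 0 .
\]
% Each first-order component is solved on its own:
\[
  (D - 1)\,y = 0 \;\Rightarrow\; y_1 = e^{x}, \qquad
  (D - 2)\,y = 0 \;\Rightarrow\; y_2 = e^{2x},
\]
% and since the factors commute, the general solution is the superposition
\[
  y \;=\; c_1\, e^{x} + c_2\, e^{2x}.
\]
```

For nonlinear ODEs the factors no longer commute and need not be linear, which is where the existence criteria and algorithms discussed in the paper come in.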
FlexX-Pharm, an extended version of the flexible docking tool FlexX, allows the incorporation of information about important characteristics of protein-ligand binding modes into a docking calculation. This information is introduced as a simple set of constraints derived from receptor-based pharmacophore features. The constraints are determined by selected FlexX interactions and inclusion volumes in the receptor active site. They guide the docking process to produce a set of docking solutions with particular properties. By applying a series of look-ahead checks during the flexible construction of ligand fragments within the active site, FlexX-Pharm determines which partially built docking solutions can potentially obey the constraints. Solutions that cannot obey the constraints are deleted as early as possible, often decreasing the calculation time and enabling new docking solutions to emerge. FlexX-Pharm was evaluated on various individual protein-ligand complexes where the top docking solutions generated by FlexX had high root mean square deviations (RMSD) from the experimentally observed binding modes. FlexX-Pharm showed an improvement in the RMSD of the top solutions in most cases, along with a reduction in run time. We also tested FlexX-Pharm as a database screening tool on a small dataset of molecules for three target proteins. In two cases, FlexX-Pharm missed one or two of the active molecules due to the constraints selected. However, in general FlexX-Pharm maintained or improved the enrichment shown with FlexX, while completing the screen in considerably less run time.
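The look-ahead pruning scheme described above follows a general branch-and-prune pattern. A minimal sketch of that pattern is given below; the callables are hypothetical stand-ins for the docking engine's own placement and constraint logic, not FlexX-Pharm's actual API:

```python
def build_with_constraints(fragments, place, can_still_satisfy, satisfies_all):
    """Incrementally place ligand fragments; discard partial placements
    that can no longer satisfy the pharmacophore constraints.
    place(fragment, partial)        -> candidate poses for the next fragment
    can_still_satisfy(partial, rem) -> look-ahead feasibility check
    satisfies_all(solution)         -> final constraint check"""
    solutions = []

    def extend(partial, remaining):
        if not remaining:
            if satisfies_all(partial):
                solutions.append(partial)
            return
        for pose in place(remaining[0], partial):
            candidate = partial + [pose]
            # look-ahead: prune as early as the constraints become unreachable
            if can_still_satisfy(candidate, remaining[1:]):
                extend(candidate, remaining[1:])

    extend([], list(fragments))
    return solutions
```

The speedup reported in the abstract comes from the `can_still_satisfy` check firing high in the search tree, so whole subtrees of doomed partial placements are never constructed.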
ISBN (print): 9783319114569
Four generic methods for quantile estimation have been compared: Monte Carlo (MC), Monte Carlo with Harrell-Davis weighting (WMC), quasi-Monte Carlo with a Sobol sequence (QMC), and quasi-random splines (QRS). The methods are combined with an RBF metamodel and applied to the analysis of morphodynamic-hydrodynamic simulations of river bed evolution. The following results have been obtained. Harrell-Davis weighting gives a moderate 10-20% improvement in precision at small sample sizes N ~ 100. Quasi-Monte Carlo methods provide a significant improvement in quantile precision; e.g., the number of function evaluations necessary to achieve an RMS precision of ~10^-4 is reduced from 1,000,000 for MC to 100,000 for QMC and to 6,000 for QRS. On the other hand, RBF metamodeling of bulky data allows speeding up the computation of one complete result in the considered problem from 45 min (on 32 CPUs) to 20 s (on 1 CPU), providing rapid quantile estimation for the whole set of bulky data.
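The MC-versus-QMC comparison above can be reproduced in miniature with numpy only: in one dimension the Sobol sequence reduces to the base-2 van der Corput sequence, so no QMC library is needed. The identity "simulation" below is a toy stand-in for the river-bed model:

```python
import numpy as np

def van_der_corput(n):
    """Base-2 radical-inverse (van der Corput) low-discrepancy sequence,
    i.e. the one-dimensional Sobol sequence."""
    points = np.zeros(n)
    for i in range(n):
        x, base, k = 0.0, 0.5, i + 1
        while k:
            x += base * (k & 1)   # append the next binary digit, mirrored
            k >>= 1
            base /= 2
        points[i] = x
    return points

def quantile_estimate(model, u, q):
    """Push uniform samples u through `model` and take the empirical
    q-quantile of the responses."""
    return np.quantile(model(u), q)

# Toy "simulation": the identity map, whose true 0.9-quantile is 0.9.
mc = quantile_estimate(lambda u: u, np.random.default_rng(0).random(1000), 0.9)
qmc = quantile_estimate(lambda u: u, van_der_corput(1000), 0.9)
```

With the same budget of 1,000 samples, the low-discrepancy points cover [0, 1) far more evenly than pseudo-random draws, which is the source of the reduced function-evaluation counts quoted in the abstract.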
One of the important tasks in mechanical engineering is to increase the safety of a vehicle while decreasing its production costs. This task is typically solved by means of multiobjective optimization, which formulates the problem as a mapping from the space of design variables to the space of target criteria and tries to find an optimal region in these multidimensional spaces. Due to the high computational cost of numerical simulations, the sampling of this mapping is usually very sparse and scattered. Combining design-of-experiments methods, metamodeling, new interpolation schemes and innovative graphics methods, we enable the user to interact with simulation parameters and optimization criteria, and to arrive at a new interpolated crash result within seconds. We denote this approach Simulated Reality, a new concept for the interplay between simulation, optimization and interactive visualization. In this paper we show the application of Simulated Reality to the solution of real-life car design optimization problems.
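The metamodeling step that makes "interpolated crash results within seconds" possible can be sketched with a plain Gaussian RBF interpolant over sparse simulation samples. This is a generic one-dimensional illustration under assumed kernel and width choices, not the package's actual metamodel:

```python
import numpy as np

def rbf_fit(centers, values, eps=1.0):
    """Fit a Gaussian RBF interpolant through the sampled design points.
    centers: simulated design-variable values; values: simulation outputs.
    Returns the weight vector of the kernel expansion."""
    d = np.abs(centers[:, None] - centers[None, :])
    K = np.exp(-(eps * d) ** 2)
    return np.linalg.solve(K, values)   # exact interpolation at the samples

def rbf_eval(x, centers, weights, eps=1.0):
    """Evaluate the cheap surrogate at new design points x."""
    d = np.abs(np.atleast_1d(x)[:, None] - centers[None, :])
    return np.exp(-(eps * d) ** 2) @ weights
```

Once the weights are fitted from a handful of expensive simulations, `rbf_eval` answers any query in microseconds, which is what turns the sparse, scattered sampling into an interactive exploration tool.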
ISBN (print): 9788394625375
In text mining, document clustering describes the effort to assign unstructured documents to clusters, which in turn usually correspond to topics. Clustering is widely used in science for data retrieval and organisation. In this paper we present a new graph-theoretical approach to document clustering and its application to a real-world data set. We show that the well-known graph partition into stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This allows soft as well as hard clustering. We present an integer linear programming formulation and a greedy approach for this NP-complete problem and discuss results on random instances and on some real-world data for different similarity measures.
ISBN (print): 9788394941956
In this paper we suggest a novel systematization of Information Retrieval and Natural Language Processing problems. Using this rather general description of problems, we are able to discuss and prove the equivalence of some problems. We provide reformulations of well-known problems such as Named Entity Recognition using our novel description, and discuss further research and the expected outcome. We discuss the relation between two problems, cluster labeling and search query finding. With these results we are able to provide a novel optimization approach to both problems. This novel systematization provides a previously unknown view, generating new classes of problems in NLP. It brings applications and algorithmic approaches together and offers a better description using concepts of theoretical computer science.