ISBN: 9781450338530 (print)
The availability of large-scale genomic and transcriptomic data on populations makes it necessary to perform computationally intensive expression quantitative trait locus (eQTL) analysis. Modeled in a sparse learning framework, LASSO-based tools are powerful for eQTL analysis. However, classical LASSO becomes limited for big genomic data. We thus propose two novel methods, namely sequential LASSO and parallel LASSO, to conduct eQTL analysis for datasets of ultra-high dimension. We theoretically prove the consistency of our methods under mild conditions and perform extensive simulations on synthetic data to validate them. We also apply our methods to a real human genomics database to demonstrate their application. Copyright is held by the author/owner(s).
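For orientation, the classical LASSO that this abstract takes as its starting point is usually written as the penalized least-squares problem below. The notation is an assumption for illustration and is not taken from the paper: $y \in \mathbb{R}^n$ would be the expression level of one gene across $n$ individuals, $X \in \mathbb{R}^{n \times p}$ the genotype matrix over $p$ markers, and $\lambda \ge 0$ the sparsity penalty. The sequential and parallel variants are not specified in the abstract, so they are not shown.

```latex
\[
  \hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda\,\lVert \beta \rVert_1
\]
```

Markers with a nonzero entry in $\hat{\beta}$ are the candidate eQTLs for that gene; in the ultra-high-dimensional setting, $p$ is far larger than $n$.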
The problem of computing X-minimal models, that is, models minimal with respect to a subset X of all the atoms in a theory, is very relevant for computing circumscriptions and diagnosis. Unfortunately, the problem is ...
ISBN: 9783662452332 (print)
The proceedings contain 44 papers. The special focus in this conference is on Critical Systems, Rigorous Engineering of Autonomic Ensembles, Automata Learning, Formal Methods and Analysis, Model-Based Code Generators and Automata Learning in Practice. The topics include: statistical abstraction boosts design and test efficiency of evolving critical systems; incremental syntactic-semantic reliability analysis of evolving structured workflows; domain-specific languages for enterprise systems; formalizing self-adaptive clouds with knowlang; towards performance-aware engineering of autonomic component ensembles; rigorous system design flow for autonomous systems; algorithms for inferring register automata; active learning of nondeterministic systems from an ioco perspective; formal methods and analyses in software product line engineering; domain specific languages for managing feature models; deployment variability in delta-oriented models; coverage criteria for behavioural testing of software product lines; DSL implementation for model-based development of pumps; domain-specific code generator modeling; LNCS transactions on foundations for mastering change; formal methods for collective adaptive ensembles; current issues on model-based software quality assurance for mastering change; and compositional model-based system design as a foundation for mastering change.
3D medical image segmentation is needed for diagnosis and treatment. As manual segmentation is very costly, automatic segmentation algorithms are needed. To find the best algorithms, several algorithms need to be eval...
ISBN: 9781450338530 (print)
Researchers have proposed several criteria to assess the quality of predicted protein structures, because this assessment is one of the essential tasks in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competitions. Popular criteria include root mean squared deviation (RMSD), MaxSub score, TM-score, GDT-TS and GDT-HA scores. All these criteria require calculating rigid transformations to superimpose the predicted protein structure onto the native protein structure. Yet how to obtain these rigid transformations is either unknown or of high time complexity, and hence heuristic algorithms have been proposed. In this work, we carefully design various small structure patterns, including ones specifically tuned for local pockets. Such structure patterns are biologically meaningful, and they address the reliance of existing methods on a sufficient number of backbone residue fragments. We sample the rigid transformations from these small structure patterns, and the optimal superpositions yielded by these small structures are refined and reported. As a result, among 11,669 pairs of predicted and native local protein pocket models from the CASP10 dataset, the GDT-TS scores calculated by our method are significantly higher than those calculated by LGA. Moreover, our program is computationally much more efficient. Source code and executables are publicly available at http://***/prosta/. Copyright is held by the author/owner(s).
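To make the scoring criteria above concrete, here is a minimal sketch of RMSD and GDT-TS for one fixed superposition. It assumes the predicted coordinates have already been transformed onto the native structure and that residues are paired by index; the function names and the toy coordinates are hypothetical. It does not implement the paper's sampling of rigid transformations, nor the search that LGA performs: the real GDT-TS takes, for each cutoff, the best fraction over many superpositions, so a single superposition only gives a lower bound.

```python
import math

def rmsd(pred, native):
    """Root mean squared deviation between paired 3D points (no fitting)."""
    if len(pred) != len(native) or not pred:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(pred, native)]
    return math.sqrt(sum(squared) / len(squared))

def gdt_ts(pred, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT-TS (0..100) for ONE given superposition.

    Averages, over the standard cutoffs of 1, 2, 4 and 8 angstroms, the
    fraction of residues whose predicted position lies within the cutoff
    of the native position.
    """
    if len(pred) != len(native) or not pred:
        raise ValueError("need two non-empty coordinate lists of equal length")
    dists = [math.dist(p, q) for p, q in zip(pred, native)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy example: four residues with deviations of 0, 1.5, 3 and 9 angstroms.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.5), (2.0, 0.0, 3.0), (3.0, 0.0, 9.0)]
print(gdt_ts(pred, native))  # 56.25
```

The single outlier at 9 angstroms dominates the RMSD but only costs one residue per cutoff in GDT-TS, which is one reason the threshold-based scores are popular for partially correct models.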
ISBN: 9783540766308 (print)
The proceedings contain 116 papers. The topics discussed include: rough set approach under dynamic granulation in incomplete information systems; generalized fuzzy operations for digital hardware implementation; a novel model of artificial immune system for solving constrained optimization problems with dynamic tolerance factor; a genetic representation for dynamic system qualitative models on genetic programming: a gene expression programming approach; handling constraints in particle swarm optimization using a small population size; collective methods on flock traffic navigation based on negotiation; a new global optimization algorithm inspired by parliamentary political competitions; discovering promising regions to help global numerical optimization algorithms; and clustering search approach for the traveling tournament problem.
ISBN: 9783319227405 (print)
The proceedings contain 10 papers. The special focus in this conference is on Medical Terminology, Clinical Processes and Machine Learning in Biomedicine. The topics include: exploiting PubMed to answer biomedical questions in natural language; using twitter data and sentiment analysis to study diseases dynamics; an open data approach for clinical appropriateness; a logistic regression approach for identifying hot spots in protein interfaces; the discovery of prognosis factors using association rule mining in acute myocardial infarction with ST-segment elevation; data mining techniques in health informatics; artificial neural networks in diagnosis of liver diseases; how to increase the effectiveness of the hepatitis diagnostics by means of appropriate machine learning methods; ant-inspired algorithms for decision tree induction; microsleep classifier using EOG channel recording.
ISBN: 9789897583537 (print)
We propose a novel pattern matching algorithm for consensus nucleotide sequences over the IUPAC alphabet, called BADPM (Byte-Aligned Degenerate Pattern Matching). Consensus nucleotide sequences represent a consensus obtained by sequencing a population of the same species, and they are treated as so-called degenerate strings. BADPM works at the level of single bytes and achieves sublinear search time on average. The algorithm is based on tabulating all possible factors of the searched pattern. It needs a data structure of $O(m + m\alpha^2 \log m)$ space and $O(m\alpha^2)$ time for preprocessing, where $m$ is the length of the pattern and $\alpha$ is the maximum number of variants implied by a 4-gram over the IUPAC alphabet. The worst-case locate time of BADPM is bounded by $O(nm^2\alpha^4)$, where $n$ is the length of the input text. However, experiments performed on real genomic data showed sublinear search time. BADPM can easily cooperate with the block q-gram inverted index and so achieve still better locate time. We implemented two other pattern matching algorithms for IUPAC nucleotide sequences as baselines: Boyer-Moore-Horspool (BMH) and Parallel Naive Search (PNS). PNS in particular proves efficient regardless of the length $m$ of the searched pattern. BADPM proved strongly superior for searching medium-length and long patterns.
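As a point of reference for what "matching over the IUPAC alphabet" means, the sketch below is a plain, sequential naive search over degenerate strings. It is not BADPM and not the paper's PNS implementation. It assumes the common convention that two IUPAC symbols match when the sets of bases they denote intersect; the function names are hypothetical, and the code table is the standard IUPAC nucleotide code.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def compatible(a, b):
    """True if IUPAC symbols a and b share at least one concrete base."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))

def naive_degenerate_search(text, pattern):
    """Return every start index where pattern matches text symbol by symbol.

    Runs in O(n * m) time for a text of length n and a pattern of length m.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(compatible(text[i + j], pattern[j]) for j in range(m))
    ]

# R stands for A or G, so "GA" matches the text at index 2 ("GR").
print(naive_degenerate_search("ACGRT", "GA"))  # [2]
```

This quadratic scan inspects every text position, which is the cost that a sublinear-on-average method is designed to avoid.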
Bio-PEPA is a novel stochastic process algebra which has been recently developed for modelling biochemical pathways [5,6]. In Bio-PEPA a reagent-centric style of modelling is adopted, and a variety of analysis techniq...
ISBN: 9789897585524 (print)
Advances in sequencing technologies and computational methods have enabled rapid and accurate identification of genetic variants. Accurate genotype calls and allele frequency estimations are crucial for population genomics analyses. One of the most demanding steps in the genotyping pipeline is mapping reads to the human reference genome. Recently, mapping-free methods such as Lava and VarGeno have been proposed for the genotyping problem. They are reported to perform 30 times faster than a standard alignment-based genotyping pipeline while achieving comparable accuracy. Moreover, these methods are able to include known genomic variants in the reference, making read mapping and genotyping variant-aware. However, in order to run they require a large k-mer database, of about 60 GB, to be loaded in memory. In this paper we study the problem of genotyping using new efficient data structures based on k-mer set compression, and we present a fast mapping-free genotyping tool named GenoLight. GenoLight reports accuracy results similar to the standard pipeline, but it is up to 8 times faster. GenoLight also uses 5 to 10 times less memory than the other mapping-free tools, and it can be run on a laptop. Availability: https://***/CominLab/GenoLight.