Author affiliations: Univ Strasbourg, CNRS UMR7104, INSERM U1258, Inst Genet Biol Moleculaire & Cellulaire (IGBMC), 1 Rue Laurent Fries, F-67404 Illkirch Graffenstaden, France; Univ Strasbourg, CNRS, INSERM, CELPHEDIA, PHENOMIN, Inst Clin Souris (ICS), 1 Rue Laurent Fries, F-67404 Illkirch Graffenstaden, France; Univ Miami, Miller Sch Med, John P Hussman Inst Human Genom, Miami, FL 33136, USA
Publication: BMC BIOINFORMATICS
Year, volume, issue: 2023, Vol. 24, No. 1
Pages: 1-18
Core indexing:
Subject classification: 0710 [Science - Biology]; 0836 [Engineering - Bioengineering]; 10 [Medicine]
Funding: National Centre for Scientific Research (CNRS); French National Institute of Health and Medical Research (INSERM); University of Strasbourg (Unistra); French government [ANR-10-IDEX-0002]; SFRI-STRAT'US project [ANR 20-SFRI-0012]; EUR IMCBio [ANR-17-EURE-0023]; INBS PHENOMIN [ANR-10-INBS-07]; Joint Programming Initiative Neurodegenerative Diseases (JPND) [ANR-17-JPCD-0003]; European Union; Agence Nationale de la Recherche (ANR) [ANR-17-JPCD-0003]. Funding Source: Agence Nationale de la Recherche (ANR)
Topics: R package; Phenotypic data; Clinical data; Discrimination; Generalized linear models; Random forest; Imputation; Model; Prediction; Machine learning; Bootstrapping
Abstract: Background: In individuals or animals suffering from genetic or acquired diseases, it is important to identify which clinical or phenotypic variables can be used to discriminate between disease and non-disease states, responses to treatment, or sexual dimorphism. However, the data often suffer from a low number of samples, a high number of variables, or unbalanced experimental designs. Moreover, several parameters can be recorded in the same test. Thus, correlations should be assessed, and a more complex statistical framework is necessary for the analysis. Packages that provide analysis tools already exist, but they are not found together, rendering the choice of method and its implementation difficult for ***: We present Gdaphen, a fast joint pipeline that identifies the most important qualitative and quantitative predictor variables for discriminating between genotypes, treatments, or sexes. Gdaphen takes behavioral/clinical data as input and uses a Multiple Factor Analysis (MFA) to deal with groups of variables recorded from the same individuals or to anonymize genotype-based recordings. As optimized input, Gdaphen uses the non-correlated variables with 30% correlation or higher on the MFA-Principal Component Analysis (PCA), which increases the discriminative power and the efficiency of the classifier's predictive model. Gdaphen can determine the strongest variables that predict gene dosage effects using Generalized Linear Model (GLM)-based classifiers, or determine the most discriminative non-linearly distributed variables using its Random Forest (RF) implementation. Moreover, Gdaphen reports the efficacy of each classifier and offers several visualization options that help users understand and support the results, as easily readable plots ready to be included in publications. We demonstrate Gdaphen's capabilities on several datasets and provide easily followable ***: Gdaphen makes the analysis of phenotypic data much easier for medical or preclinical be