Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
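To make the four-state HMM idea concrete, the following is a minimal sketch, not the ReAD implementation. It assumes the four hidden states are the null/non-null combinations across the two studies, that null p values are Uniform(0, 1), and that non-null p values follow a Beta(a, 1) density with a < 1. The abstract does not specify the emission model, so that choice, the function names, and the parameters `a1`, `a2`, `init`, and `trans` are all illustrative assumptions. Parameters are treated as known here; in practice they would have to be estimated. The function returns, for each SNP, the forward-backward posterior probability of being non-null in both studies.

```python
# Hypothetical sketch: posterior probability of the "replicable" state in a
# four-state HMM over two p-value sequences. Not the authors' code.

# Hidden states: (study 1 non-null?, study 2 non-null?)
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def beta_density(p, a):
    """Density of Beta(a, 1) at p; requires 0 < p <= 1. Assumed non-null model."""
    return a * p ** (a - 1.0)


def emission(p1, p2, state, a1, a2):
    """Joint density of (p1, p2) given the state; null p values are Uniform(0, 1)."""
    f1 = beta_density(p1, a1) if state[0] else 1.0
    f2 = beta_density(p2, a2) if state[1] else 1.0
    return f1 * f2


def replicable_posterior(p1, p2, init, trans, a1=0.3, a2=0.3):
    """Scaled forward-backward pass; returns P(state == (1, 1) | all p values)."""
    n, k = len(p1), len(STATES)
    emis = [[emission(p1[t], p2[t], s, a1, a2) for s in STATES] for t in range(n)]

    # Forward pass, normalising at each step to avoid underflow.
    alpha = []
    cur = [init[j] * emis[0][j] for j in range(k)]
    total = sum(cur)
    alpha.append([x / total for x in cur])
    for t in range(1, n):
        cur = [
            emis[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
            for j in range(k)
        ]
        total = sum(cur)
        alpha.append([x / total for x in cur])

    # Backward pass, also normalised at each step.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [
            sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(k))
            for i in range(k)
        ]
        total = sum(cur)
        beta[t] = [x / total for x in cur]

    # Combine and renormalise; index 3 is the state (1, 1).
    post = []
    for t in range(n):
        g = [alpha[t][j] * beta[t][j] for j in range(k)]
        post.append(g[3] / sum(g))
    return post


if __name__ == "__main__":
    # Illustrative "sticky" transition matrix: neighbouring SNPs tend to share a state.
    trans = [[0.91 if i == j else 0.03 for j in range(4)] for i in range(4)]
    init = [0.7, 0.1, 0.1, 0.1]
    p1 = [0.8, 0.001, 0.002, 0.7]
    p2 = [0.6, 0.003, 0.001, 0.9]
    print(replicable_posterior(p1, p2, init, trans))
```

SNPs could then be ranked by one minus this posterior, so that a SNP with small p values in both studies and similarly significant neighbours ranks ahead of an isolated one. How ReAD turns such a ranking into a false discovery rate controlled decision rule is not described in the abstract and is not sketched here.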
Logistic regression models are widely used in case-control data analysis, and testing the goodness-of-fit of their parametric model assumption is a fundamental research problem. In this article, we propose to enhance the power of the goodness-of-fit test by exploiting a monotonic density ratio model, in which the ratio of case and control densities is assumed to be a monotone function. We show that such a monotonic density ratio model is naturally induced by the retrospective case-control sampling design under the alternative hypothesis. The pool-adjacent-violator algorithm is adapted to solve for the constrained nonparametric maximum likelihood estimator under the alternative hypothesis. By measuring the discrepancy between this estimator and the semiparametric maximum likelihood estimator under the null hypothesis, we develop a new Kolmogorov-Smirnov-type statistic to test the goodness-of-fit for logistic regression models with case-control data. A bootstrap resampling procedure is suggested to approximate the p-value of the proposed test. Simulation results show that the type I error of the proposed test is well controlled and the power improvement is substantial in many cases. Three real data applications are also included for illustration.
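For readers unfamiliar with the pool-adjacent-violator algorithm mentioned above, here is a minimal sketch of its textbook form: a weighted least-squares fit constrained to be non-decreasing. This is not the adapted version the abstract refers to, which solves a constrained nonparametric maximum likelihood problem. The function name `pava` and its arguments are illustrative.

```python
# Generic pool-adjacent-violators sketch (isotonic least squares), shown only
# to illustrate the pooling step; not the paper's adapted estimator.

def pava(y, w=None):
    """Return the non-decreasing sequence closest to y in weighted least squares."""
    if w is None:
        w = [1.0] * len(y)

    # Each block holds [weighted mean, total weight, number of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    # Expand each block back to one fitted value per original point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


if __name__ == "__main__":
    print(pava([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

Each violation of the ordering is resolved by replacing the offending neighbours with their weighted average, which is why the fit is piecewise constant wherever the raw values are out of order.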