Mendelian randomization (MR) is a popular method in epidemiology and genetics that uses genetic variation as instrumental variables for causal inference. Existing MR methods usually assume most genetic variants are valid instrumental variables that identify a common causal effect. There is a general lack of awareness that this effect homogeneity assumption can be violated when there are multiple causal pathways involved, even if all the instrumental variables are valid. In this article we introduce a latent mixture model, MR-Path, that groups together instruments that yield similar causal effect estimates. We develop a Monte Carlo EM algorithm to fit this mixture model, derive approximate confidence intervals for uncertainty quantification, and adopt a modified Bayesian Information Criterion (BIC) for model selection. We verify the efficacy of the Monte Carlo EM algorithm, confidence intervals, and model selection criterion using numerical simulations. We identify potential mechanistic heterogeneity when applying our method to estimate the effect of high-density lipoprotein cholesterol on coronary heart disease and the effect of adiposity on type II diabetes.
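To make the idea of instruments yielding different causal effect estimates concrete, the sketch below computes the standard per-instrument Wald ratio (SNP-outcome association divided by SNP-exposure association). It is not the MR-Path model itself; the function name `wald_ratios` and all numbers are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the exposure association.

```python
import numpy as np

def wald_ratios(beta_exposure, beta_outcome, se_outcome):
    """Per-instrument Wald ratio estimates and approximate standard errors.

    Assumption: first-order approximation that treats the SNP-exposure
    association as known, so se(ratio) = se_outcome / |beta_exposure|.
    """
    beta_exposure = np.asarray(beta_exposure, dtype=float)
    beta_outcome = np.asarray(beta_outcome, dtype=float)
    se_outcome = np.asarray(se_outcome, dtype=float)
    ratio = beta_outcome / beta_exposure
    se = se_outcome / np.abs(beta_exposure)
    return ratio, se

# Hypothetical summary statistics for six instruments (illustration only).
beta_x = [0.10, 0.20, 0.15, 0.12, 0.25, 0.08]
beta_y = [0.05, 0.10, 0.075, -0.024, -0.05, -0.016]
se_y = [0.01] * 6

ratio, se = wald_ratios(beta_x, beta_y, se_y)
for j, (r, s) in enumerate(zip(ratio, se)):
    print(f"instrument {j}: ratio = {r:+.3f}, approx. se = {s:.3f}")
```

In this made-up example the first three instruments agree on one effect and the last three on another, which is the kind of grouping a mixture model over instruments is meant to pick up even when every instrument is valid.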
The assumption of normality in random effects and regression errors is the primary cause of the lack of robustness in the maximum likelihood estimation procedure for linear mixed models. In this paper, we introduce a robust method for estimating regression parameters in these models, by positing that the random effects and regression errors follow a multivariate Laplace distribution. This new methodology, implemented via an EM algorithm, is computationally more efficient than the existing robust t procedure in the literature. Simulation studies suggest that the performance of the proposed estimation method in finite samples either surpasses or is at least on par with the robust t procedure.
Estimating hidden Markov models (HMMs) with an unknown number of states is a challenging task. In this paper, we propose a new penalized composite likelihood approach for simultaneously estimating both the number of states and the parameters in an overfitted HMM. We prove the order selection consistency and asymptotic normality of the resultant estimator. Simulation studies and an application demonstrate the finite sample performance of the proposed method.
Survey data using categorical item variables are widely used in applied research such as psychology, education, and behavioral studies. Unfortunately, survey data are highly susceptible to nonignorable missing values that may threaten the validity of statistical inference if naively ignored or inappropriately treated. This paper proposes a novel latent pattern mixture model for nonignorable missing values in multivariate categorical outcomes. The proposed model posits the existence of two categorical latent variables: one latent variable represents a nonresponse pattern, and the other represents a response pattern conditional on the nonresponse pattern. We propose two parameter estimation strategies: maximum-likelihood (ML) estimation using the expectation-maximization algorithm and Bayesian estimation using a Markov chain Monte Carlo algorithm. Simulation studies revealed that ML estimation is preferred to Bayesian estimation with noninformative priors in terms of standardized biases when the sample size is large, whereas Bayesian estimation can be preferred when the sample size is small. Finally, our real data example analyzed a data set on parental substance use disorder and revealed six latent classes of participants that are distinguished by their response and missingness patterns.
Decision trees constitute a simple yet powerful and interpretable machine learning tool. While tree-based methods are designed only for cross-sectional data, we propose an approach that combines decision trees with time series modeling and thereby bridges the gap between machine learning and statistics. In particular, we combine decision trees with hidden Markov models where, for any time point, an underlying (hidden) Markov chain selects the tree that generates the corresponding observation. We propose an estimation approach that is based on the expectation-maximisation algorithm and assess its feasibility in simulation experiments. In our real-data application, we use eight seasons of National Football League (NFL) data to predict play calls conditional on covariates, such as the current quarter and the score, where the model's states can be linked to the teams' strategies. R code that implements the proposed method is available on GitHub.
We consider the Bayesian estimation of the parameters of a finite mixture model from independent order statistics arising from imperfect ranked set sampling designs. As a cost-effective method, ranked set sampling (RSS) enables us to incorporate easily attainable characteristics, as ranking information, into data collection and Bayesian estimation. To handle the special structure of the ranked set samples, we develop a Bayesian estimation approach that uses the Expectation-Maximization (EM) algorithm to estimate the ranking parameters and Metropolis-within-Gibbs sampling to estimate the parameters of the underlying mixture model. Our findings show that the proposed RSS-based Bayesian estimation method outperforms the commonly used Bayesian counterpart using simple random sampling. The developed method is finally applied to estimate the bone disorder status of women aged 50 and older.
Mixture models are widely used to analyze data with cluster structures, and the mixture of Gaussians is the most common in practical applications. The use of mixtures involving other multivariate distributions, like the multivariate skew normal and multivariate generalised hyperbolic, is also found in the literature. However, in all such cases, only mixtures of identical distributions are used to form a mixture model. We present an innovative and versatile approach for constructing mixture models involving identical and non-identical distributions combined in all conceivable permutations (e.g. a mixture of multivariate skew normal and multivariate generalised hyperbolic). We also establish any conventional mixture model as a particular case of our proposed framework. The practical efficacy of our model is shown through its application to both simulated and real-world data sets. Our comprehensive and flexible model excels at recognising inherent patterns and accurately estimating parameters.
Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
For modeling count data, the Conway-Maxwell-Poisson (CMP) distribution is a popular generalization of the Poisson distribution due to its ability to characterize data over- or under-dispersion. While the classic parameterization of the CMP has been well-studied, its main drawback is that it does not directly model the mean of the counts. This is mitigated by using a mean-parameterized version of the CMP distribution. In this work, we are concerned with the setting where count data may be comprised of subpopulations, each possibly having varying degrees of data dispersion. Thus, we propose a finite mixture of mean-parameterized CMP distributions. An EM algorithm is constructed to perform maximum likelihood estimation of the model, while bootstrapping is employed to obtain estimated standard errors. A simulation study is used to demonstrate the flexibility of the proposed mixture model relative to mixtures of Poissons and mixtures of negative binomials. An analysis of dog mortality data is presented.
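As a rough illustration of how an EM algorithm fits a finite mixture to count data, the sketch below handles the simpler mixture of Poissons that the abstract names as a comparison model, not the mean-parameterized CMP mixture itself. The function name `em_poisson_mixture`, the quantile-based initialization, and the simulated data are assumptions made for the example.

```python
import numpy as np

def em_poisson_mixture(x, n_components=2, n_iter=200):
    """EM for a finite mixture of Poisson distributions (illustrative sketch).

    Returns the mixing weights and component means after n_iter iterations.
    """
    x = np.asarray(x, dtype=float)
    # Assumed initialization: spread the starting means over data quantiles.
    lam = np.quantile(x, np.linspace(0.25, 0.75, n_components)) + 1e-3
    pi = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities. The log(x!) term is common to every
        # component, so it cancels after normalization and is omitted.
        log_w = np.log(pi) - lam + x[:, None] * np.log(lam)
        log_w -= log_w.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_w)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted proportions and weighted means.
        nk = resp.sum(axis=0)
        pi = nk / x.size
        lam = np.maximum((resp * x[:, None]).sum(axis=0) / nk, 1e-8)
    return pi, lam

# Simulated counts from two subpopulations (hypothetical parameters).
rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(2.0, 1200), rng.poisson(15.0, 800)])
weights, means = em_poisson_mixture(counts)
print("mixing weights:", np.round(weights, 3))
print("component means:", np.round(means, 3))
```

A Poisson component forces the variance to equal the mean, which is the restriction a CMP component relaxes. Standard errors could be obtained, as in the abstract, by refitting on bootstrap resamples of the data.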
We discuss the modelling of traffic count data that show the variation of traffic volume within a day. For the modelling, we apply mixtures of Kato-Jones distributions in which each component is unimodal and affords a wide range of skewness and kurtosis. We consider two methods for parameter estimation, namely, a modified method of moments and the maximum-likelihood method. These methods were seen to be useful for fitting the proposed mixtures to our data. As a result, the variation in traffic volume was classified into the morning and evening traffic whose distributions have different shapes, particularly different degrees of skewness and kurtosis.