Graphical network inference is used in many fields such as genomics or ecology to infer the conditional independence structure between variables, from measurements of gene expression or species abundances for instance. In many practical cases, not all variables involved in the network have been observed, and the samples are actually drawn from a distribution where some variables have been marginalized out. This challenges the sparsity assumption commonly made in graphical model inference, since marginalization yields locally dense structures, even when the original network is sparse. We present a procedure for inferring Gaussian graphical models when some variables are unobserved, that accounts both for the influence of missing variables and the low density of the original network. Our model is based on the aggregation of spanning trees, and the estimation procedure on the expectation-maximization algorithm. We treat the graph structure and the unobserved nodes as missing variables and compute posterior probabilities of edge appearance. To provide a complete methodology, we also propose several model selection criteria to estimate the number of missing nodes. A simulation study and an illustration on flow cytometry data reveal that our method has favourable edge detection properties compared to existing graph inference techniques. The methods are implemented in an R package.
In recent years, longitudinal neuroimaging studies have become increasingly popular in neuroscience research to investigate disease-related changes in brain functions, to study neurodevelopment or to evaluate treatment effects on neural processing. One of the important goals in longitudinal imaging analysis is to study changes in brain functional networks across time and how the changes are modulated by subjects' clinical or demographic variables. In the current neuroscience literature, one of the most commonly used tools to extract and characterize brain functional networks is independent component analysis (ICA), which separates multivariate signals into a linear mixture of independent components. However, existing ICA methods are only applicable to cross-sectional studies and are not suited for modeling repeatedly measured imaging data. In this paper, we propose a novel longitudinal independent component model (L-ICA) which provides a formal modeling framework for extending ICA to longitudinal studies. By incorporating subject-specific random effects and visit-specific covariate effects, L-ICA is able to provide more accurate estimates of changes in brain functional networks on both the population and individual level, borrow information across repeated scans within the same subject to increase statistical power in detecting covariate effects on the networks, and allow for model-based prediction of brain network changes caused by disease progression, treatment or neurodevelopment. We develop a fully traceable exact EM algorithm to obtain maximum likelihood estimates of L-ICA. We further develop a subspace-based approximate EM algorithm which greatly reduces the computation time while still retaining high accuracy. Moreover, we present a statistical testing procedure for examining covariate effects on brain network changes. Simulation results demonstrate the advantages of our proposed methods. We apply L-ICA to the ADNI2 study to investigate changes in brain functional networks
Mixtures of factor analyzers is a useful model-based clustering method which can avoid the curse of dimensionality in high-dimensional clustering. However, this approach is sensitive to both diverse non-normalities of marginal variables and outliers, which are commonly observed in multivariate experiments. We propose mixtures of Gaussian copula factor analyzers (MGCFA) for high-dimensional clustering. This model has two advantages: (1) it allows different marginal distributions, which makes the mixture model more flexible to fit, and (2) it can avoid the curse of dimensionality by embedding the factor-analytic structure in the component-correlation matrices of the mixture distribution. An EM algorithm is developed for the fitting of MGCFA. The proposed method is free of the curse of dimensionality and allows any parametric marginal distribution which fits best to the data. It is applied to both synthetic data and a microarray gene expression data set for clustering and shows better performance than several existing methods. (C) 2018 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
The generalized half-normal (GHN) distribution and progressive type-II censoring are considered in this article for studying statistical inference in constant-stress accelerated life testing. The EM algorithm is used to calculate the maximum likelihood estimates. The Fisher information matrix is obtained from the missing information principle and is used to construct asymptotic confidence intervals. Further, interval estimation is discussed through bootstrap intervals. The Tierney and Kadane method, an importance sampling procedure and the Metropolis-Hastings algorithm are used to compute Bayesian estimates. Furthermore, predictive estimates for censored data and the related prediction intervals are obtained. We consider three optimality criteria to find the optimal stress level. A real data set is used to illustrate the importance of the GHN distribution as an alternative lifetime model to well-known distributions. Finally, a simulation study is provided with discussion.
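As a rough illustration of how an EM iteration handles censored lifetimes, the sketch below swaps in a much simpler setting than the one in the abstract above: exponential lifetimes with ordinary right censoring, not the GHN distribution under progressive type-II censoring. The function name `em_exponential_rate` and the toy data are hypothetical and chosen only for this example.

```python
def em_exponential_rate(observed, censored, n_iter=200, lam=1.0):
    """EM estimate of an exponential rate from right-censored lifetimes.

    observed: fully observed failure times
    censored: censoring times of units that had not failed yet
    (Illustrative assumption: exponential lifetimes, simple right censoring.)
    """
    n = len(observed) + len(censored)
    for _ in range(n_iter):
        # E-step: by memorylessness, E[T | T > c] = c + 1/lam,
        # so each censored lifetime is replaced by its conditional mean.
        imputed = [c + 1.0 / lam for c in censored]
        # M-step: complete-data MLE of the rate, n / (sum of all lifetimes).
        lam = n / (sum(observed) + sum(imputed))
    return lam


# Toy data: three failures and two censored units.
rate = em_exponential_rate([2.0, 3.0, 5.0], [4.0, 6.0])
```

In this toy case the iteration converges to the closed-form censored-data estimate, the number of failures divided by the total time on test (3 / 20 = 0.15), which gives a quick sanity check that is usually unavailable for richer models such as the GHN.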
Case-cohort and nested case-control designs are widely used strategies to reduce the cost of covariate measurements in epidemiological cohort studies. A unified likelihood framework for the two cohort designs is constructed and two statistical procedures are presented for making inference about the effects of incomplete covariates on the cumulative incidence of clinical event time. A pseudo-maximum likelihood estimation based on the sieve method is developed for the semiparametric non-mixture cure model, which can handle missing covariates and a cure fraction occurring in censored survival data. The resulting estimators are shown to be consistent and asymptotically normal in both case-cohort and nested case-control studies. In addition, for the two cohort designs, an expectation-maximization (EM) algorithm is developed to simplify the maximization of the likelihood function with the Bernstein-based smoothing technique. Such a procedure allows one to estimate the nonparametric component of the semiparametric model in closed form and relieves the computational burden. Simulation studies demonstrate that the proposed estimators have good properties in practical situations, and a motivating application to real data is provided to illustrate the methodology. (C) 2019 Elsevier B.V. All rights reserved.
We propose a Bayesian procedure for simultaneous variable and covariance selection using continuous spike-and-slab priors in multivariate linear regression models where q possibly correlated responses are regressed onto p predictors. Rather than relying on a stochastic search through the high-dimensional model space, we develop an ECM algorithm similar to the EMVS procedure of Rockova and George targeting modal estimates of the matrix of regression coefficients and residual precision matrix. Varying the scale of the continuous spike densities facilitates dynamic posterior exploration and allows us to filter out negligible regression coefficients and partial covariances gradually. Our method is seen to substantially outperform regularization competitors on simulated data. We demonstrate our method with a re-examination of data from a recent observational study of the effect of playing high school football on several later-life cognitive, psychological, and socio-economic outcomes. An R package, scripts for replicating examples in this article, and results from further simulation studies are available online.
Joint modelling of skewness and heterogeneity is challenging in data analysis, particularly in regression analysis which allows a random probability distribution to change flexibly with covariates. This paper, based on a skew Laplace normal (SLN) mixture of location, scale, and skewness, introduces a new regression model which provides flexible modelling of location, scale and skewness parameters simultaneously. The maximum likelihood (ML) estimators of all parameters of the proposed model are derived via the expectation-maximization (EM) algorithm, together with their asymptotic properties. Numerical analyses via a simulation study and a real data example are used to illustrate the performance of the proposed model.
The Markovian Arrival Process (MAP) is applied as a candidate model to describe the time-varying earthquake activity in Corinth Gulf, Greece. To the best of our knowledge, this is the first attempt to study the earthquake temporal evolution with the specific class of MAPs. A complete catalogue is used for the earthquake temporal distribution investigation, along with data sets of different magnitude cutoffs. The study area is divided into its western and eastern subareas, and possible variations in the earthquake occurrence times are sought. Hidden states of MAPs correspond to different levels of seismicity, and hence various numbers of states are examined. Akaike and Bayesian information criteria are used to identify the best model, and a comparison with the best-known and most broadly accepted theoretical interevent time distributions is provided. In all cases, the fitted MAPs with phase-type distributed interarrival times outperform the models with other distributions. Important indicators of the underlying Markov process are computed, and the earthquake frequency is approximated by the counting process. The analysis demonstrates a high index of burstiness for the earthquake generation in the eastern part, i.e. long quiescent periods alternate with short ones of intense seismic activity.
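The abstract above does not define its index of burstiness. A minimal sketch, assuming the widely used coefficient B = (sigma - mu) / (sigma + mu) computed from the mean mu and standard deviation sigma of the interevent times, is given below; the function names and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def interevent_times(occurrence_times):
    """Differences between consecutive event times (input need not be sorted)."""
    ordered = sorted(occurrence_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def burstiness(gaps):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu).

    Assumed definition, not taken from the abstract:
    B = -1 for perfectly regular events, B near 0 for a Poisson process,
    and B approaching 1 for strongly clustered (bursty) events.
    """
    mu = mean(gaps)
    sigma = pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


# Regular sequence versus a sequence with one long quiescent period.
regular = burstiness([1.0, 1.0, 1.0, 1.0])
bursty = burstiness([0.1] * 9 + [100.0])
```

Under this definition the regular sequence gives exactly -1, while the sequence with short gaps and one long quiet period gives a positive value, matching the verbal description of long quiescent periods alternating with short intense ones.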
The additive hazards model is one of the most commonly used models in regression analysis of failure time data and many estimation procedures have been developed for its inference under various situations (Kalbfleisch and Prentice (2002); Lin and Ying (1994); Sun (2006)). In this paper, we consider a situation, case K interval-censored data with informative interval censoring, that often occurs in practice, for example in medical follow-up studies, but has not been discussed much in the literature due to the difficulties involved. For this problem, a joint model is proposed to describe the correlation between the failure time of interest and the underlying censoring or observation process, and a sieve maximum likelihood approach is developed. In particular, an EM algorithm is presented for the implementation of the proposed estimation procedure and the asymptotic properties of the resulting estimators are established. A simulation study is conducted to assess the finite sample performance of the proposed method and suggests that it works well in practical situations. The method is also applied to the AIDS study that motivated this work. (C) 2019 Elsevier B.V. All rights reserved.
In this paper, we present a regression model where the response variable is a count that follows a Waring distribution. The Waring regression model allows for the analysis of phenomena where the Geometric regression model is inadequate, because the probability of success on each trial, p, is different for each individual and p has an associated distribution. Estimation is performed by maximum likelihood, through the maximization of the Q-function using the EM algorithm. Diagnostic measures are calculated for this model. To illustrate the results, an application to real data is presented. Some specific details are given in the Appendix of the paper.
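The mixing idea in the abstract above can be sketched numerically. The abstract does not name the distribution of p; the example assumes p follows a Beta(a, b) distribution and that the count y is the number of failures before the first success, which gives the beta-geometric form P(Y = y) = B(a + 1, b + y) / B(a, b) commonly identified with the Waring distribution. The function names and parameter values are hypothetical, and no regression structure or EM step is included.

```python
from math import exp, lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_geometric_pmf(y, a, b):
    """P(Y = y) when Y | p ~ Geometric(p) on {0, 1, ...} and p ~ Beta(a, b).

    Integrating p * (1 - p)**y against the Beta(a, b) density gives
    B(a + 1, b + y) / B(a, b). (The Beta mixing law is an assumption here.)
    """
    return exp(log_beta(a + 1.0, b + y) - log_beta(a, b))


# Illustrative parameters: a = 3, b = 2.
a, b = 3.0, 2.0
probs = [beta_geometric_pmf(y, a, b) for y in range(20000)]
total = sum(probs)
mean_count = sum(y * q for y, q in enumerate(probs))
```

For a > 1 the mixture has mean b / (a - 1), so with these values the truncated sums should be close to 1 for both the total probability and the mean, while the tail decays polynomially instead of geometrically.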