An important approach to dynamic prediction of time-to-event outcomes using longitudinal data is based on modeling the joint distribution of longitudinal and time-to-event data. The widely used joint model for this purpose is the shared random effect model. Presumably, adding more longitudinal predictors improves the predictive accuracy. However, the shared random effect model can be computationally difficult or prohibitive when a large number of longitudinal variables are used. In this paper, we study an alternative way of modeling the joint distribution of longitudinal and time-to-event data. Under this formulation, the log-likelihood involves no more than one-dimensional integration, regardless of the number of longitudinal variables in the model. Therefore, this model is particularly suitable for dynamic prediction problems with a large number of longitudinal predictors. The model can be fitted with tractable and stable computation by combining pseudo maximum likelihood estimation, the Expectation-Maximization algorithm, and convex optimization. We evaluate the proposed methodology and its predictive accuracy with varying numbers of longitudinal variables using simulations and data from a primary biliary cirrhosis study.
We introduce a new approach to a linear-circular regression problem that relates multiple linear predictors to a circular response. We follow the wrapped normal modelling approach to angular variables and angular distributions and extend it to linear-circular regression analysis. Some previous works model a circular variable as the projection of a bivariate Gaussian random vector onto the unit circle, and the statistical inference of the resulting model involves complicated sampling steps. The proposed model treats circular responses as the result of the modulo operation on unobserved linear responses. The resulting model is a mixture of multiple linear-linear regression models. We present two EM algorithms for maximum likelihood estimation of the mixture model, one for a parametric model and another for a nonparametric model. The estimation algorithms provide a favorable trade-off between computation and estimation accuracy, as shown in five numerical examples. The proposed approach was applied to a problem of estimating wind directions that typically exhibit complex patterns with large variation and circularity.
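As a minimal sketch of the modulo construction described above: if the unobserved linear response is y ~ Normal(mu, sigma^2) and the observed angle is y mod 2π, the density of the angle is a sum of shifted normal densities, one per number of wraps, which is why the model reads as a mixture. The function names, the truncation level `k_max`, and the fixed `mu` (which in a regression would depend on the predictors) are illustrative assumptions, not the authors' implementation.

```python
import math


def wrapped_normal_pdf(theta, mu, sigma, k_max=10):
    """Density at angle theta (radians) of (y mod 2*pi) for y ~ Normal(mu, sigma^2).

    The infinite sum over wrap counts k is truncated to |k| <= k_max
    (an assumption; adequate when sigma is small relative to 2*pi*k_max).
    """
    norm_const = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(-k_max, k_max + 1):
        # Each term is the normal density of the unwrapped value theta + 2*pi*k.
        z = (theta + 2.0 * math.pi * k - mu) / sigma
        total += math.exp(-0.5 * z * z) / norm_const
    return total


def integrate_over_circle(mu, sigma, n=2000):
    """Midpoint-rule integral of the wrapped density over [0, 2*pi)."""
    h = 2.0 * math.pi / n
    return h * sum(wrapped_normal_pdf((i + 0.5) * h, mu, sigma) for i in range(n))
```

Each term in the sum corresponds to one mixture component (one wrap count), so an EM algorithm can treat the unobserved wrap count as the latent label.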
Nonlinear mixed-effects (NLME) models are commonly used in longitudinal studies such as pharmacokinetics and HIV viral dynamics studies. NLME models are often derived from underlying data-generating mechanisms, so the parameters in these models often have natural physical interpretations that may suggest reasonable constraints on certain parameters. For example, the HIV viral decay rates for populations receiving anti-HIV treatments may be reasonably expected to be nonnegative. Hypothesis testing for these parameters should incorporate practically reasonable constraints to increase statistical power. Motivated by HIV viral dynamic models, in this article we propose multiparameter one-sided or constrained tests for NLME models with censored responses, for example, viral dynamic models with viral loads subject to lower detection limits. We propose approximate likelihood-based tests that are computationally efficient. We evaluate the tests via simulations and show that the proposed tests are more powerful than the corresponding two-sided or unrestricted tests. We apply the proposed tests to two AIDS datasets with new findings.
Arbitrarily censored data are survival data that contain a mixture of exactly observed, left-censored, interval-censored, and right-censored observations. Existing research on regression analysis of arbitrarily censored data is relatively sparse and mainly focused on the proportional hazards model and the accelerated failure time model. This article studies the proportional odds (PO) model and proposes a novel estimation approach through an expectation-maximization (EM) algorithm for analyzing such data. The proposed EM algorithm has many appealing properties, such as being robust to initial values, easy to implement, fast to converge, and providing the variance estimate of the regression parameter estimate in closed form. An informal diagnostic plot is developed for checking the PO model assumption. Our method has shown excellent performance in estimating the regression parameters as well as the baseline survival function in a simulation study. A real-life dataset about metastatic colorectal cancer is analyzed for illustration. An R package regPO has been created for practitioners to implement our method.
Cure rate models have been thoroughly investigated across various domains, encompassing medicine, reliability, and finance. The merging of machine learning (ML) with cure models is emerging as a promising strategy to improve predictive accuracy and gain insight into the underlying mechanisms influencing the probability of cure. The current body of literature has explored the benefits of incorporating a single ML algorithm with cure models. However, there is a notable absence of a comprehensive study that compares the performances of various ML algorithms in this context. This paper seeks to bridge this gap. Specifically, we focus on the well-known mixture cure model and examine the incorporation of five distinct ML algorithms: extreme gradient boosting, neural networks, support vector machines, random forests, and decision trees. To bolster the robustness of our comparison, we also include cure models with logistic and spline-based regression. For parameter estimation, we formulate an expectation-maximization algorithm. A comprehensive simulation study is conducted across diverse scenarios to compare the models based on the accuracy and precision of estimates for different quantities of interest, along with the predictive accuracy of cure. The results from both the simulation study and the analysis of real cutaneous melanoma data indicate that incorporating ML models into the cure model helps improve the accuracy of cure rate estimation.
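To make the mixture cure structure concrete, here is a minimal sketch under stated assumptions: the population survival is a mixture of a cured group (survival fixed at 1) and an uncured group, and the E-step of an EM algorithm weights each censored subject by its posterior probability of being uncured. The exponential latency distribution, the parameter names, and the function names are hypothetical choices for illustration; in the paper's setting the uncured probability would come from a fitted ML classifier.

```python
import math


def population_survival(t, p_uncured, rate):
    """S_pop(t) = (1 - p) + p * S_u(t), with S_u(t) = exp(-rate * t) assumed."""
    s_uncured = math.exp(-rate * t)
    return (1.0 - p_uncured) + p_uncured * s_uncured


def e_step_weight(t, event, p_uncured, rate):
    """Posterior probability that a subject is uncured given the observed data.

    A subject with an observed event is uncured with certainty; a censored
    subject is uncured with probability p * S_u(t) / S_pop(t).
    """
    if event:
        return 1.0
    s_uncured = math.exp(-rate * t)
    return p_uncured * s_uncured / population_survival(t, p_uncured, rate)
```

The survival curve plateaus at the cure probability `1 - p_uncured`, and the weight of a censored subject decreases the longer that subject has been followed without an event.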
Healthcare outcomes such as blood pressure and heart rate are commonly tracked across time owing to technological advances in wearable devices. This advance makes it possible to predict health risks and to practice personalized medicine. For this type of healthcare data, it is important to reflect the large variation among subjects, where each subject is an experimental unit. A person-specific model is critical for accurate prediction, but it is not optimal due to the noisy nature of the data. It has been demonstrated that sharing information across subjects via a mixed effect model can improve the prediction of individual responses compared to a completely personalized model. However, sharing information across all patients can dilute signals when there are several different patterns present in the data. That is, subjects may form groups and each group behaves differently. To reflect this feature, we extend a deep mixed effect model to a mixture of deep mixed effect models. Our mixed effect model is based on Gaussian processes whose mean is modeled by deep neural networks to capture flexible time trends. Our model finds a highly nonlinear trend shared among segments of patients while clustering patients with similar trends into groups. Our approach performs well in simulation studies as well as real data analysis, emphasizing the importance of modeling group-specific trends when making accurate predictions from healthcare time-series data.
The dynamics and variability of protein conformations are directly linked to their functions. Many comparative studies of X-ray protein structures have been conducted to elucidate the relevant conformational changes, dynamics and heterogeneity. The rapid increase in the number of experimentally determined structures has made comparison an effective tool for investigating protein structures. For example, it is now possible to compare structural ensembles formed by enzyme species, variants or the type of ligands bound to them. In this study, the author developed a multilevel model for estimating two covariance matrices that represent inter- and intra-ensemble variability in the Cartesian coordinate space. Principal component analysis using the two estimated covariance matrices identified the inter-/intra-enzyme variabilities, which seemed to be important for the enzyme functions, with the illustrative examples of cytochrome P450 family 2 enzymes and class A $\beta$-lactamases. In P450, in which each enzyme has its own active site of a distinct size, an active-site motion shared universally between the enzymes was captured as the first principal mode of the intra-enzyme covariance matrix. In this case, the method was useful for understanding the conformational variability after adjusting for the differences between enzyme sizes. The developed method is advantageous in small ensemble-size problems and hence promising for use in comparative studies on experimentally determined structures where ensemble sizes are smaller than those generated, for example, by molecular dynamics simulations.
Panel count data, in which the observation for each study subject consists of the number of recurrent events between successive examinations, are commonly encountered in industrial reliability testing, medical research and other scientific investigations. We formulate the effects of potentially time-dependent covariates on one or more types of recurrent events through nonhomogeneous Poisson processes with random effects. We employ nonparametric maximum likelihood estimation under arbitrary examination schemes, and develop a simple and stable EM algorithm. We show that the resulting estimators of the regression parameters are consistent and asymptotically normal, with a covariance matrix that achieves the semiparametric efficiency bound and can be estimated using profile likelihood. We evaluate the performance of the proposed methods through simulation studies and analysis of data from a skin cancer clinical trial.
Promotion time cure rate models (PCM) are often used to model survival data with a cure fraction. Medical images or biomarkers derived from medical images can be the key predictors in survival models. However, incorporating images in the PCM is challenging using traditional nonparametric methods such as splines. We propose to use a neural network to model the effect of nonparametric or unstructured predictors in the PCM context. An expectation-maximization algorithm with a neural network in the M-step is used for parameter estimation. Asymptotic properties of the proposed estimates are derived. Simulation studies show good performance in terms of both prediction and estimation. Finally, we apply our methods to analyze brain images from the Open Access Series of Imaging Studies (OASIS) data.
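For readers unfamiliar with the promotion time formulation, the following minimal sketch shows its standard form, S_pop(t | x) = exp(-theta(x) * F(t)), where F is a proper baseline distribution function and the cure probability is exp(-theta(x)). Writing theta(x) = exp(g(x)) with g as a scalar score is an assumption made here for illustration (g could be the output of a neural network); the function names and the exponential baseline in the usage note are likewise hypothetical.

```python
import math


def pcm_survival(t, g_x, baseline_cdf):
    """Population survival exp(-theta * F(t)) with theta = exp(g_x).

    g_x is a scalar predictor score (for example, a network output);
    baseline_cdf is any proper CDF on [0, inf) supplied by the caller.
    """
    theta = math.exp(g_x)
    return math.exp(-theta * baseline_cdf(t))


def cure_probability(g_x):
    """Limit of the population survival as t grows: exp(-theta)."""
    return math.exp(-math.exp(g_x))


def exponential_cdf(t):
    """Unit-rate exponential CDF, used only as an example baseline."""
    return 1.0 - math.exp(-t)
```

For instance, `pcm_survival(t, 0.3, exponential_cdf)` starts at 1 when `t = 0` and levels off at `cure_probability(0.3)` for large `t`, which is the plateau that defines the cure fraction.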
Variance regression allows for heterogeneous variance, or heteroscedasticity, by incorporating a regression model into the variance. This paper uses a variant of the expectation-maximisation algorithm to develop a new method for fitting additive variance regression models that allow for regression in both the mean and the variance. The algorithm is easily extended to allow for B-spline bases, thus allowing for the incorporation of a semi-parametric model in both the mean and variance. Although there are existing methods to fit these types of models, this new algorithm provides a reliable alternative approach that is not susceptible to numerical instability that can arise in this constrained estimation context. We utilise the developed algorithm with a series of simulation studies and analyse illustrative data. Various simulation studies show that the algorithm can recover the true model for a variety of scenarios. We also study automatic selection of model complexity based on information-based criteria, and show that the Akaike information criterion is useful for choosing the optimal number of knots in a B-spline model. An R package is available for implementing these methods.
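As a minimal sketch of what regression in both the mean and the variance means, assume a Gaussian response with mean x_i'beta and an additive (linear) variance z_i'alpha. The variance must stay positive for every observation, which is the constraint that makes estimation delicate. The function name, argument layout, and the choice to return negative infinity outside the feasible region are illustrative assumptions, not the interface of the R package mentioned above.

```python
import math


def variance_regression_loglik(y, X, Z, beta, alpha):
    """Gaussian log-likelihood with mean x_i'beta and variance z_i'alpha.

    y: list of responses; X, Z: lists of covariate rows for mean and variance.
    Returns -inf if any fitted variance is non-positive (infeasible alpha).
    """
    total = 0.0
    for y_i, x_i, z_i in zip(y, X, Z):
        mean_i = sum(b * x for b, x in zip(beta, x_i))
        var_i = sum(a * z for a, z in zip(alpha, z_i))
        if var_i <= 0.0:
            return -math.inf  # outside the constrained parameter space
        total += -0.5 * (math.log(2.0 * math.pi * var_i) + (y_i - mean_i) ** 2 / var_i)
    return total
```

A fitting routine would maximize this function over `beta` and `alpha` while keeping every fitted variance positive; replacing the columns of `X` and `Z` with B-spline basis evaluations gives the semi-parametric version.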