A cure model is a useful approach for analysing failure time data in which some subjects may eventually experience the event of interest while others never will. A cure model has two components: incidence, which indicates whether the event could eventually occur, and latency, which denotes when the event will occur given that the subject is susceptible to it. In this paper, we propose a semi-parametric cure model in which covariates can affect both the incidence and the latency. A logistic regression model is proposed for the incidence, and the latency is determined by an accelerated failure time regression model with an unspecified error distribution. An EM algorithm is developed to fit the model. The procedure is applied to a data set of tonsil cancer patients treated with radiation therapy. Copyright (C) 2002 John Wiley & Sons, Ltd.
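The two-part structure can be sketched in Python. This is a minimal illustration, not the authors' estimator: it assumes a standard logistic error for the AFT latency (the paper leaves the error distribution unspecified), and all function names are hypothetical. It computes the population survival S(t) = 1 - π + π·S_u(t) and the E-step weight, the posterior probability that a censored subject is susceptible, which an EM fit of a mixture cure model alternates with its M-step updates.

```python
import math

def incidence_prob(gamma, z):
    """P(susceptible | z) from a logistic model; z should include a leading 1 for the intercept."""
    eta = sum(g * zi for g, zi in zip(gamma, z))
    return 1.0 / (1.0 + math.exp(-eta))

def latency_survival(t, beta, x):
    """S_u(t | x) under the AFT model log T = beta'x + eps.
    ASSUMPTION for illustration: eps ~ standard logistic, so P(eps > e) = 1 / (1 + exp(e))."""
    e = math.log(t) - sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(e))

def population_survival(t, gamma, z, beta, x):
    """Cured subjects never fail, so S(t) = 1 - pi + pi * S_u(t)."""
    pi = incidence_prob(gamma, z)
    return 1.0 - pi + pi * latency_survival(t, beta, x)

def e_step_weight(t, event, gamma, z, beta, x):
    """Posterior probability of being susceptible.
    An observed event implies susceptibility; a censored subject may be cured or not yet failed."""
    if event:
        return 1.0
    pi = incidence_prob(gamma, z)
    su = latency_survival(t, beta, x)
    return pi * su / (1.0 - pi + pi * su)

# Made-up coefficients, not estimates from the tonsil cancer data.
print(population_survival(1.0, [0.0], [1.0], [0.0], [1.0]))   # 0.75
print(e_step_weight(1.0, False, [0.0], [1.0], [0.0], [1.0]))  # about 0.333
```

As t grows, S(t) levels off at 1 - π rather than falling to zero; that plateau is the cured fraction.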
Longitudinal studies with repeated measures are often subject to non-response. Methods currently employed to alleviate the difficulties caused by missing data are typically unsatisfactory, especially when the cause of the missingness is related to the outcomes. We present an approach for incomplete categorical data in the repeated measures setting that allows missing data to depend on other observed outcomes for a study subject. The proposed methodology also allows a broader examination of study findings by interpreting results in the framework of the set of all possible test statistics that might have been observed had no data been missing. The proposed approach consists of the following general steps. First, we generate all possible sets of missing values and form a set of possible complete data sets. We then weight each data set according to clearly defined assumptions, apply an appropriate statistical test procedure to each data set, and combine the results to give an overall indication of significance. We make use of the EM algorithm and a Bayesian prior in this approach. Although not restricted to the one-sample case, the proposed methodology is illustrated for one-sample data and compared with the common complete-case and available-case analysis methods. Copyright (C) 2002 John Wiley & Sons, Ltd.
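The enumerate-weight-test-combine steps can be sketched for a one-sample binary outcome. This is a simplified illustration under loudly stated assumptions: each missing value is independently 1 with a fixed probability `q` (the paper instead derives weights using the EM algorithm and a Bayesian prior, and lets missingness depend on a subject's other outcomes), the test is an exact one-sided binomial test, and the results are combined as the total weight of completions that reject. All names are hypothetical.

```python
import itertools
import math

def binom_upper_pvalue(x, n, p0):
    """Exact one-sided p-value P(X >= x) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

def weighted_rejection_probability(observed, n_missing, q, p0=0.5, alpha=0.05):
    """Enumerate every completion of the missing binary outcomes, weight each completed
    data set, test it, and return the total weight of completions that reject H0: p = p0.
    ASSUMPTION: each missing value is 1 with probability q, independently of the others."""
    total = 0.0
    for fill in itertools.product((0, 1), repeat=n_missing):
        weight = math.prod(q if v else 1 - q for v in fill)
        completed = list(observed) + list(fill)
        if binom_upper_pvalue(sum(completed), len(completed), p0) < alpha:
            total += weight
    return total

# 10 observed successes and 2 missing outcomes: every completion rejects at alpha = 0.05.
print(weighted_rejection_probability([1] * 10, 2, q=0.5))  # 1.0
```

The number of completions grows as 2 to the power of the number of missing values, so full enumeration is practical only for modest amounts of missingness.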
This paper reviews hierarchical observation models, used in functional neuroimaging, in a Bayesian light. It emphasizes the common ground shared by classical and Bayesian methods to show that conventional analyses of neuroimaging data can be usefully extended within an empirical Bayesian framework. In particular, we formulate the procedures used in conventional data analysis in terms of hierarchical linear models and establish a connection between classical inference and parametric empirical Bayes (PEB) through covariance component estimation. This estimation is based on an expectation maximization (EM) algorithm. The key point is that hierarchical models not only provide for appropriate inference at the highest level but also allow one to revisit lower levels suitably equipped to make Bayesian inferences. Bayesian inferences eschew many of the difficulties encountered with classical inference and characterize brain responses in a way that is more directly predicated on what one is interested in. The motivation for Bayesian approaches is reviewed, and the theoretical background is presented in a way that relates to conventional methods, in particular restricted maximum likelihood (ReML). This paper is a technical and theoretical prelude to subsequent papers that deal with applications of the theory to a range of important issues in neuroimaging. These issues include: (i) estimating nonsphericity or variance components in fMRI time series that can arise from serial correlations within subject or are induced by multisubject (i.e., hierarchical) studies; (ii) spatiotemporal Bayesian models for imaging data, in which voxel-specific effects are constrained by responses in other voxels; (iii) Bayesian estimation of nonlinear models of hemodynamic responses; and (iv) principled ways of mixing structural and functional priors in EEG source reconstruction. Although diverse, all these estimation problems are accommodated by the PEB framework described in this paper. (C) 2002 Elsevi
A retrospective substudy of the nutritional prevention of cancer (NPC) trials investigated the utility of longitudinally measured prostate-specific antigen (PSA) as a biomarker for subsequent onset of prostate cancer (PCa). Serial PSA levels were determined retrospectively from frozen blood samples that had been collected from all patients at successive clinic visits, with the timing and the number of these visits highly variable. Diagnosis dates of all incident cases of PCa were recorded. Heterogeneity in PSA trajectories was observed that could not be fully explained by the usual linear mixed-effects model and measured covariates. Latent class models that incorporate both a longitudinal biomarker process and an event process offer a way to handle additional heterogeneity, to uncover distinct subpopulations, to incorporate correlated nonnormally distributed outcomes, and to classify individuals into risk classes. Our latent class joint model can aid the prediction of PCa probability given the longitudinal biomarker information available on an individual up to any date. The proposed model easily accommodates highly unbalanced longitudinal data and recurrent events. There are two levels of structure in the latent class joint model. First, the uncertainty of latent class membership is specified through a multinomial logistic model. Second, the class-specific marker trajectory and event process are specified parametrically and semiparametrically, under the assumption of conditional independence given the latent class membership. We use a likelihood approach to obtain parameter estimates via the EM algorithm. We fit the latent class joint model to the data from the NPC trials; four distinct subpopulations are identified that differ with regard to their PSA trajectories and risk for prostate cancer. Higher PSA level is significantly associated with increased risk of PCa, but the two appear to be conditionally independent once the latent classes are taken into account. Among the c
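The two levels of structure combine through Bayes' rule: prior class probabilities come from the multinomial logistic model, and, by conditional independence given the class, the marker likelihood and the event likelihood simply multiply. The sketch below computes one individual's posterior class probabilities, which is also the E-step quantity in an EM fit. It is a simplified illustration, not the paper's model: the linear class trajectories, the constant (exponential) class hazards, and all names are assumptions, and the paper's event process is semiparametric and allows recurrent events.

```python
import math

def log_softmax(scores):
    """Numerically stable log of the softmax (multinomial logistic) probabilities."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def normal_logpdf(y, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (y - mean) ** 2 / (2 * sd ** 2)

def class_posterior(prior_scores, times, marker, trajectories, sd, hazards, follow_up, event):
    """P(class g | marker history, event data) for one individual.
    prior_scores: multinomial-logistic linear predictors, one per class.
    trajectories: (intercept, slope) of an ASSUMED linear marker trajectory per class.
    hazards: ASSUMED constant (exponential) event hazard per class.
    Marker and event log-likelihoods are added: conditional independence given class."""
    log_post = []
    for g, log_p in enumerate(log_softmax(prior_scores)):
        a, b = trajectories[g]
        ll = sum(normal_logpdf(y, a + b * t, sd) for t, y in zip(times, marker))
        lam = hazards[g]
        ll += (math.log(lam) if event else 0.0) - lam * follow_up
        log_post.append(log_p + ll)
    return [math.exp(v) for v in log_softmax(log_post)]

# Hypothetical two-class example: flat trajectories at 1 and at 5, equal prior scores.
post = class_posterior([0.0, 0.0], [0, 1, 2], [1.1, 0.9, 1.0],
                       [(1.0, 0.0), (5.0, 0.0)], 1.0, [0.05, 0.3], 4.0, False)
print(post)  # nearly all mass on the first class
```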
Consider the problem of predicting the occurrence of an event, say the onset of diabetes mellitus, from a vector of continuous and discrete predictors. We propose a new algorithm for the construction of a tree-structured predictor for the event of interest, which uses a new approach for dealing with continuous predictors. The novelty is that the tree uses soft splits for continuous variables. This means that at each node an individual goes to the right branch with a probability that is a function of a predictor. The predictor as well as the particular shape of the function is chosen from the data by the proposed algorithm. We evaluate its performance on several real data sets, in particular comparing it with a standard tree-growing algorithm. We also present an analysis of a well-known data set, the Pima Indian diabetes data set, to illustrate the application of the method in biostatistics. Copyright (C) 2002 John Wiley & Sons, Ltd.
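Prediction with soft splits can be sketched as follows. An individual does not follow a single path; each leaf is reached with a probability equal to the product of the branch probabilities along its path, and the prediction is the probability-weighted average of the leaf event rates. The logistic shape, the dictionary tree format, and the thresholds and leaf values are all assumptions for illustration; in the paper both the predictor and the shape of the function are chosen from the data.

```python
import math

def p_right(x, threshold, scale):
    """Probability of taking the right branch: a logistic function of one predictor.
    ASSUMPTION: logistic shape with a location (threshold) and a scale."""
    u = (x - threshold) / scale
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)  # numerically stable branch for large negative u
    return e / (1.0 + e)

def soft_tree_predict(x, node):
    """Event probability: leaf probabilities weighted by the probability of reaching each leaf."""
    if "p" in node:  # leaf
        return node["p"]
    r = p_right(x[node["var"]], node["threshold"], node["scale"])
    return (1.0 - r) * soft_tree_predict(x, node["left"]) + r * soft_tree_predict(x, node["right"])

# Hypothetical tree on (glucose, BMI); thresholds and leaf probabilities are made up, not fitted.
tree = {
    "var": 0, "threshold": 125.0, "scale": 8.0,
    "left": {"p": 0.10},
    "right": {"var": 1, "threshold": 30.0, "scale": 2.0,
              "left": {"p": 0.35}, "right": {"p": 0.70}},
}
print(soft_tree_predict([125.0, 30.0], tree))  # 0.5 * 0.10 + 0.5 * (0.5 * 0.35 + 0.5 * 0.70)
```

As `scale` shrinks toward zero, `p_right` approaches a 0/1 step and the soft split reduces to the ordinary hard split of a standard tree.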
Inpatient length of stay (LOS) is often considered a proxy for hospital resource consumption. For statewide obstetrical delivery data, a two-component Poisson mixture model provides a reasonable fit to the heterogeneous LOS distribution. Within the generalized linear mixed model (GLMM) framework, random effects are introduced into the two-component Poisson mixture regression model to account for the inherent correlation of patients clustered within hospitals. An EM algorithm is developed for the joint estimation of regression coefficients and variance component parameters. Related diagnostic measures for assessing model adequacy are derived. When the method is applied to analyse maternity LOS, appropriate risk factors for the short-stay and long-stay subgroups can be identified from the respective Poisson components. In addition, predicted random hospital effects enable the comparison of relative efficiencies among hospitals after adjustment for patient case-mix and health provision characteristics. Copyright (C) 2002 John Wiley & Sons, Ltd.
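The core of such an EM algorithm is easiest to see without covariates or random effects. The sketch below fits a plain two-component Poisson mixture: the E-step computes each stay's posterior probability of belonging to the short-stay component, and the M-step updates the mixing weight and the two means in closed form. It omits the regression terms and the hospital random effects of the paper's GLMM, and the counts in the example are made up.

```python
import math

def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def sigmoid(u):
    """Numerically stable 1 / (1 + exp(-u))."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def em_poisson_mixture(counts, pi=0.5, lam1=1.0, lam2=5.0, iters=200):
    """EM for a two-component Poisson mixture (no covariates, no random effects).
    pi is the weight of component 1, started with the smaller mean (short stays)."""
    n = len(counts)
    for _ in range(iters):
        # E-step: posterior probability that each count belongs to component 1
        tau = [sigmoid((math.log(pi) + poisson_logpmf(k, lam1))
                       - (math.log(1.0 - pi) + poisson_logpmf(k, lam2))) for k in counts]
        # M-step: closed-form updates of the mixing weight and the two means
        s = sum(tau)
        pi = s / n
        lam1 = sum(t * k for t, k in zip(tau, counts)) / s
        lam2 = sum((1.0 - t) * k for t, k in zip(tau, counts)) / (n - s)
    return pi, lam1, lam2

# Made-up lengths of stay in days, not the statewide obstetrical data.
counts = [1, 2, 2, 3, 2, 1, 3, 2, 8, 9, 10, 9, 8, 11, 10, 9]
print(em_poisson_mixture(counts))
```

In the full model, each M-step mean update becomes a weighted Poisson regression within each component, and random hospital effects are added to the linear predictors.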
It is common in the analysis of aggregate data in epidemiology that the variances of the aggregate observations are available. The analysis of such data leads to a measurement error situation in which the known variances of the measurement errors vary between observations. Assuming a multivariate normal distribution for the 'true' observations and normal distributions for the measurement errors, we derive a simple EM algorithm for obtaining maximum likelihood estimates of the parameters of the multivariate normal distribution. The results also facilitate the estimation of regression parameters between the variables as well as of the 'true' values of the observations. The approach is applied to re-estimate recent results of the WHO MONICA Project on cardiovascular disease and its risk factors, where the original estimation of the regression coefficients did not adjust for the regression attenuation caused by the measurement errors. Copyright (C) 2002 John Wiley & Sons, Ltd.
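A univariate version of this EM algorithm is short enough to write out. Assume x_i = t_i + e_i with t_i ~ N(μ, τ²) and e_i ~ N(0, s_i²), where the s_i² are known. The E-step uses the normal posterior of each true value t_i; the M-step updates μ and τ² from the expected complete-data statistics, with the posterior variance v_i included so that τ² is not underestimated. This sketch is restricted to one variable; the paper treats the multivariate case, where the estimated covariance matrix of the true values yields attenuation-corrected regression slopes. Function names are hypothetical.

```python
def e_step(x, s2, mu, tau2):
    """Posterior of each true value t_i given x_i: variance v_i and mean m_i."""
    v = [1.0 / (1.0 / tau2 + 1.0 / si) for si in s2]
    m = [vi * (mu / tau2 + xi / si) for vi, xi, si in zip(v, x, s2)]
    return m, v

def em_known_error_variances(x, s2, iters=500):
    """x: observed aggregate values; s2: their known measurement-error variances.
    Returns (mu, tau2, posterior means of the true values)."""
    n = len(x)
    mu = sum(x) / n
    tau2 = max(sum((xi - mu) ** 2 for xi in x) / n, 1e-8)  # start from the raw variance
    for _ in range(iters):
        m, v = e_step(x, s2, mu, tau2)
        # M-step: expected sufficient statistics of the true values
        mu = sum(m) / n
        tau2 = sum((mi - mu) ** 2 + vi for mi, vi in zip(m, v)) / n
    m, _ = e_step(x, s2, mu, tau2)
    return mu, tau2, m

# With equal error variances the MLE is tau2 = var(x) - s2 (here 2.0 - 0.5 = 1.5).
mu_hat, tau2_hat, true_hat = em_known_error_variances([1, 2, 3, 4, 5], [0.5] * 5)
print(mu_hat, tau2_hat, true_hat)
```

The posterior means shrink each observation toward μ, and observations with larger known error variance are shrunk more strongly.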
The main motivation of this paper is to design a statistically well-justified and biologically compatible neural network model and, in particular, to suggest a theoretical interpretation of the well-known high parallelism of biological neural networks. We consider a novel probabilistic approach to neural networks developed in the framework of statistical pattern recognition and build on a series of theoretical results published earlier. We show that the proposed parallel fusion of probabilistic neural networks produces biologically plausible structures and improves the resulting recognition performance. The complete design methodology, based on the EM algorithm, has been applied to recognise unconstrained handwritten numerals from the database of Concordia University, Montreal. We achieved a recognition accuracy of about 95%, which is comparable with other published results.
In recent years, AIDS researchers have shown great interest in the study of HIV viral dynamics. Nonlinear mixed-effects models (NLMEs) have been proposed for modeling intrapatient and interpatient variations in viral load measurements. The interpatient variation often receives great attention and may be partially explained by time-varying covariates, such as CD4 cell counts. Statistical analyses in these studies are complicated by the following problems: (a) the viral load measurements may be subject to left censoring due to a detection limit, (b) covariates are often measured with substantial errors, and (c) covariates frequently contain missing data. In this article we address these three problems simultaneously by jointly modeling the covariate and the response processes. We adapt a Monte Carlo EM algorithm and a linearization procedure to estimate the model parameters. Our approach is preferable to naive methods and to the two-step method in the sense that it produces less biased estimates with more reliable standard errors. We analyze a real AIDS dataset and show that the fitted model may provide good prediction for unobserved viral loads.
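One building block of such an EM algorithm is the E-step for a left-censored response. If a value is known only to lie below the detection limit d and is modelled as N(μ, σ²), its conditional mean has the closed form μ - σφ(a)/Φ(a) with a = (d - μ)/σ. In a nonlinear mixed-effects model this expectation is generally not available in closed form, which is why Monte Carlo EM replaces it with an average of simulated draws. The sketch below shows both versions for a single normal response with known μ and σ. It is an illustration only, not the paper's joint model, and the numbers in the example are hypothetical.

```python
import random
from statistics import NormalDist

def censored_mean_exact(mu, sigma, limit):
    """E[Y | Y < limit] for Y ~ N(mu, sigma^2): mu - sigma * phi(a) / Phi(a)."""
    std = NormalDist()
    a = (limit - mu) / sigma
    return mu - sigma * std.pdf(a) / std.cdf(a)

def censored_mean_mc(mu, sigma, limit, n_draws=20000, seed=0):
    """Monte Carlo version: draws from N(mu, sigma^2) truncated to (-inf, limit) by inverse CDF."""
    rng = random.Random(seed)
    std = NormalDist()
    upper = std.cdf((limit - mu) / sigma)  # probability mass below the detection limit
    # 1 - rng.random() lies in (0, 1], so inv_cdf never receives 0
    draws = [mu + sigma * std.inv_cdf(upper * (1.0 - rng.random())) for _ in range(n_draws)]
    return sum(draws) / n_draws

# Hypothetical log10 viral load model with a detection limit of 1.7 (about 50 copies/mL).
print(censored_mean_exact(2.0, 0.8, 1.7), censored_mean_mc(2.0, 0.8, 1.7))
```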
With the advent of new molecular marker technologies, it is now feasible to initiate genome projects for outcrossing plant species, which have not received much attention in genetic research despite their great agricultural and environmental value. Because outcrossing species typically have heterogeneous genomes, the data structure for molecular markers representing an entire genome is complex: some markers may have more alleles than others, some markers are codominant whereas others are dominant, and some markers are heterozygous in one parent but fixed in the other parent whereas the opposite can be true for other markers. A major difficulty in analyzing these different types of marker at the same time arises from uncertainty about the parental linkage phases across markers. In this paper, we present a general maximum-likelihood-based algorithm for simultaneously estimating linkage and linkage phases for a mixed set of different marker types containing fully informative markers (segregating 1:1:1:1) and partially informative markers (or missing markers, segregating 1:2:1, 3:1, and 1:1) in a full-sib family derived from two outbred parent plants. The characterization of linkage phases is based on the posterior probability distribution of the assignment of alternative alleles at given markers to the two homologous chromosomes of each parent, conditional on the observed phenotypes of the markers. Two- and multi-point analyses are performed to estimate the recombination fraction and determine the most likely linkage phase between different types of markers. A numerical example is presented to demonstrate the statistical properties of the model for characterizing the linkage phase between markers. (C) 2002 Elsevier Science (USA).
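The idea of estimating the recombination fraction while treating the linkage phase as unknown can be sketched in the simplest two-point setting. Assume a backcross-like configuration: one parent is heterozygous at both markers, the other is uninformative, and coupling and repulsion have equal prior probability. The E-step computes the posterior probability of coupling given the current r, and the M-step sets r to the expected proportion of recombinants. This covers none of the paper's full-sib, mixed-marker-type machinery, and the function name and inputs are hypothetical.

```python
import math

def sigmoid(u):
    """Numerically stable 1 / (1 + exp(-u))."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def em_phase_and_r(n, k, r=0.25, iters=200):
    """Two-point EM with the linkage phase treated as missing data (simplified setting).
    n: informative progeny from one parent's meioses.
    k: progeny in the two classes that are non-recombinant if the phase is coupling.
    Returns (r_hat, posterior probability of coupling), assuming equal phase priors."""
    w = 0.5
    for _ in range(iters):
        r = min(max(r, 1e-12), 0.5)  # keep r in (0, 0.5] so the logs stay finite
        # E-step: log-likelihoods under each phase, then posterior P(coupling)
        log_c = (n - k) * math.log(r) + k * math.log(1.0 - r)
        log_rep = k * math.log(r) + (n - k) * math.log(1.0 - r)
        w = sigmoid(log_c - log_rep)
        # M-step: r = expected number of recombinants / n
        r = (w * (n - k) + (1.0 - w) * k) / n
    return r, w

# 18 of 20 progeny in the coupling-parental classes: coupling phase with r near 0.1.
print(em_phase_and_r(20, 18))
```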