The authors consider the estimation of a residual distribution for different measurement problems with a common measurement error process. The problem is motivated by issues arising in the analysis of gene expression data but should have application in other similar settings. It is implicitly assumed throughout that there are large numbers of measurements but small numbers of repeated measurements. As a consequence, the distribution of the estimated residuals is a biased estimate of the residual distribution. The authors present two methods for the estimation of the residual distribution with some restriction on the form of the distribution. They give an upper bound for the rate of convergence for an estimator based on the characteristic function and compare its performance with that of another estimator with simulations.
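To see why the estimated residuals are biased, consider the smallest case of m = 2 replicates per measurement problem. The sketch below assumes, purely for illustration and not necessarily as the authors' restriction, that the residual distribution is symmetric with a positive characteristic function φ. Each estimated residual is then ±(e₁ − e₂)/2, so its characteristic function is φ(t/2)² rather than φ(t), and a square root of the empirical characteristic function at 2s recovers φ(s). The Laplace errors, sample sizes, and function names are hypothetical choices.

```python
import numpy as np


def estimated_residuals(y):
    """Residuals y_ij - ybar_i for an (n_problems, m) array of replicates."""
    return y - y.mean(axis=1, keepdims=True)


def ecf_real(x, t):
    """Real part of the empirical characteristic function of x at t."""
    return float(np.mean(np.cos(t * x)))


def phi_hat_two_replicates(residuals, s):
    """Estimate phi(s) from m = 2 replicates. Assumes (hypothetically) a symmetric
    residual distribution with positive characteristic function, so that the
    residual ECF at 2s estimates phi(s)^2."""
    return float(np.sqrt(max(ecf_real(residuals, 2.0 * s), 0.0)))


rng = np.random.default_rng(0)
n, m = 100_000, 2
mu = rng.normal(0.0, 5.0, size=(n, 1))      # problem-specific means (nuisance)
e = rng.laplace(0.0, 1.0, size=(n, m))      # true residuals: phi(s) = 1 / (1 + s^2)
res = estimated_residuals(mu + e)

s = 0.5
true_phi = 1.0 / (1.0 + s**2)               # 0.8
naive = ecf_real(res, s)                    # estimates phi(s/2)^2, about 0.886
corrected = phi_hat_two_replicates(res, s)  # estimates phi(s) = 0.8
print(true_phi, naive, corrected)
```

The naive empirical characteristic function of the residuals is visibly pulled towards 1 (the residuals are too concentrated), which is the bias the abstract refers to.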
The problem of estimating parameters within hidden Markov models is not straightforward. In particular, calculation of maximum likelihood estimates (MLE) is nontrivial. Some variations on MLE are described that are computationally less burdensome, and detailed comparisons are drawn for the case of hidden binary isotropic Markov chains. (C) 2002 Elsevier Science B.V. All rights reserved.
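Exact MLE requires evaluating the likelihood, which sums over all 2^T hidden paths; the forward algorithm does this in O(T) time. A minimal sketch, assuming that "isotropic" means a symmetric switching probability and that each observation is the hidden bit flipped with probability eps (both assumptions, and all names hypothetical). It checks the forward recursion against brute-force enumeration and does a crude grid-search MLE; the authors' computationally cheaper variations are not reproduced.

```python
import itertools
import math

import numpy as np


def emission(obs_bit, eps):
    """(P(obs | hidden 0), P(obs | hidden 1)); observation = hidden bit flipped w.p. eps."""
    return np.array([1.0 - eps if obs_bit == s else eps for s in (0, 1)])


def forward_loglik(obs, p_switch, eps):
    """Exact log-likelihood of a binary symmetric HMM via the scaled forward algorithm.
    The hidden chain switches state w.p. p_switch and starts from (1/2, 1/2)."""
    A = np.array([[1.0 - p_switch, p_switch], [p_switch, 1.0 - p_switch]])
    alpha = 0.5 * emission(obs[0], eps)
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * emission(obs[t], eps)
        scale = alpha.sum()          # rescale to avoid underflow on long sequences
        loglik += math.log(scale)
        alpha = alpha / scale
    return loglik


def brute_force_loglik(obs, p_switch, eps):
    """Sum over all 2^T hidden paths; only feasible for short sequences."""
    total = 0.0
    for path in itertools.product((0, 1), repeat=len(obs)):
        prob = 0.5
        for t, (h, o) in enumerate(zip(path, obs)):
            if t > 0:
                prob *= p_switch if h != path[t - 1] else 1.0 - p_switch
            prob *= 1.0 - eps if o == h else eps
        total += prob
    return math.log(total)


obs = [0, 0, 1, 1, 1, 0, 1, 1]
print(forward_loglik(obs, 0.2, 0.1), brute_force_loglik(obs, 0.2, 0.1))

# Crude MLE of p_switch by grid search, holding eps fixed.
grid = np.linspace(0.01, 0.99, 99)
p_mle = grid[np.argmax([forward_loglik(obs, p, 0.1) for p in grid])]
```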
Author:
Chen, HY (Univ Illinois, Sch Publ Hlth, Div Epidemiol & Biostat, Chicago, IL 60612, USA)
The problem of nuisance covariate model specification is considered in Cox regression where the maximum semiparametric likelihood method is used to handle missing covariates. A component of the covariates is modeled nonparametrically to achieve robustness against covariate model misspecification and to reduce the number of possibly intractable integrations involved in parametric modeling of the covariates. The statistical properties of the proposed method are examined. It is found that in some important situations, the maximum semiparametric likelihood can be applied without making any additional parametric model assumptions on the covariates. The proposed method can yield a more efficient estimator than nonparametric imputation methods and, unlike the inverse probability weighting method, does not require specification of the missingness mechanism. A real data example is analyzed to demonstrate use of the proposed method.
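For reference, the complete-data object that such methods build on is the Cox partial likelihood. The sketch below computes the log partial likelihood when all covariates are observed, using Breslow's convention for tied failure times; it does not implement the semiparametric treatment of missing covariates described above, and the function name is hypothetical.

```python
import numpy as np


def cox_log_partial_likelihood(beta, time, event, X):
    """Log partial likelihood of a Cox model with fully observed covariates.

    beta: (p,) coefficients; time: (n,) follow-up times; event: (n,) 1 = failure,
    0 = censored; X: (n, p) covariates. Ties use Breslow's convention.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    eta = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)  # linear predictors
    loglik = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]              # risk set at the i-th failure time
        loglik += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return float(loglik)


time = [1.0, 2.0, 3.0, 4.0]
event = [1, 1, 0, 1]
X = [[1.0], [0.0], [1.0], [0.0]]
print(cox_log_partial_likelihood([0.5], time, event, X))
```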
This article analyzes changes in treatment practices in outpatient methadone treatment units from a national panel study. The analysis of this dataset is challenging because of three difficulties: multiple longitudinal outcomes, nonignorable nonresponse, and missing covariates. Specifically, the data included several variables that measure the effectiveness of methadone treatment practices for each unit. A substantial percentage of units (33%) did not respond during the follow-up. These dropout units tended to have less effective treatment practices, so the dropout mechanism may be nonignorable. Finally, the time-varying covariates for the units that dropped out were missing at the time of dropout. A valid analysis therefore needs to address these three issues simultaneously. Our approach assumes that the observed outcomes measure a latent variable (e.g., treatment practice effectiveness) with error. We model the relationship between this latent variable and covariates using a linear mixed model. To account for nonignorable dropouts, we apply a selection model in which the dropout probability depends on the latent variable. Finally, we accommodate missing time-varying covariates by modeling them with a transition model. Because full-likelihood estimation involves multidimensional integration, we develop an EM algorithm to estimate the model parameters. We apply the proposed approach to the methadone treatment practices data. Our results show that methadone treatment practices have improved in the last decade. Our results are also useful for identifying the types of methadone treatment units that need improvement.
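The following simulation, with hypothetical parameter values, illustrates why dropout that depends on the latent variable is nonignorable: when less effective units drop out more often, a complete-case comparison overstates the improvement. It is only an illustration of the selection mechanism, not the authors' joint model or their EM fit.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate_units(n, alpha0, alpha1, improvement, seed=0):
    """Hypothetical data: latent effectiveness b drives both outcomes and dropout.

    Baseline y0 = b + noise; follow-up y1 = b + improvement + noise.
    P(drop before follow-up | b) = expit(alpha0 + alpha1 * b), so alpha1 < 0
    makes less effective units more likely to drop out (selection model).
    """
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, 1.0, n)
    y0 = b + rng.normal(0.0, 0.5, n)
    y1 = b + improvement + rng.normal(0.0, 0.5, n)
    dropped = rng.random(n) < expit(alpha0 + alpha1 * b)
    return y0, y1, dropped


y0, y1, dropped = simulate_units(20_000, alpha0=-0.7, alpha1=-1.5, improvement=0.5)
cc_change = y1[~dropped].mean() - y0.mean()   # uses only units observed at follow-up
print("dropout rate:", dropped.mean())
print("true change: 0.5, complete-case change:", cc_change)
```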
Advances in human genome mapping have led to the identification of large numbers of genetic markers that allow systematic searches for multiple disease susceptibility genes for complex traits. A common design involves the recruitment of families with at least two children affected with the disease of interest. The objective is to find chromosomal regions that harbour susceptibility genes for the disease. The affected children, their parents if available, and sometimes other, unaffected, siblings are genotyped using sets of microsatellite DNA markers representing chromosomal sites distributed across the genome. Each marker can occur in several different variants known as alleles, and a pair of alleles constitutes the marker genotype. Each child randomly inherits one of their mother's two alleles and one of their father's two alleles. If a marker is close to a disease susceptibility gene, then affected siblings are expected to share the same maternal and/or paternal marker alleles more often. Statistical methods are used to estimate the distribution of allele sharing in each affected sib pair (ASP) using the set of markers typed across each chromosome, and to test for the presence of excess sharing in the families as a group at each point across the genome. Regression models that allow the allele sharing proportions to depend on characteristics of the family such as diagnostic subtype or ethnic background have been developed to address the heterogeneity that is characteristic of complex disease, but these have not yet been widely applied. In this paper, we apply regression modelling to investigate variation associated with family-level covariates and with the order in which families are recruited and genotyped. We also discuss how some of the concepts of group sequential analysis apply to accumulating data from genome scans of complex disease. Copyright (C) 2002 John Wiley & Sons, Ltd.
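The inheritance rule described above implies that, at a marker unlinked to disease, a sib pair shares 0, 1 or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2, 1/4, so the sharing proportion π = IBD/2 has mean 1/2 and variance 1/8. A minimal sketch of the standard ASP "mean test" built on this, assuming fully informative markers so that IBD counts are observed directly; it is not the regression model of the paper, and the linked sharing distribution is hypothetical.

```python
import numpy as np


def simulate_null_ibd(n_pairs, rng):
    """IBD count (0, 1 or 2) for sib pairs at a marker unlinked to disease.
    Each sib picks one of the mother's two alleles and one of the father's two,
    each with probability 1/2, independently of the other sib."""
    mat1, mat2, pat1, pat2 = rng.integers(0, 2, size=(4, n_pairs))
    return (mat1 == mat2).astype(int) + (pat1 == pat2).astype(int)


def mean_test_z(ibd):
    """Mean test: average sharing proportion pi = IBD / 2 against its null value 1/2,
    standardized with the null variance Var(pi) = 1/8 per pair."""
    pi = np.asarray(ibd) / 2.0
    return (pi.mean() - 0.5) / np.sqrt(0.125 / pi.size)


rng = np.random.default_rng(1)
null_ibd = simulate_null_ibd(100_000, rng)
null_freq = np.bincount(null_ibd, minlength=3) / null_ibd.size   # close to [0.25, 0.5, 0.25]

# Hypothetical excess sharing near a susceptibility locus.
linked_ibd = rng.choice([0, 1, 2], size=500, p=[0.15, 0.45, 0.40])
print(null_freq, mean_test_z(null_ibd), mean_test_z(linked_ibd))
```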
We consider asymptotic approximations to joint posterior distributions in situations where the full conditional distributions referred to in Gibbs sampling are asymptotically normal. Our development focuses on problems where data augmentation facilitates simpler calculations, but results hold more generally. Asymptotic mean vectors are obtained as simultaneous solutions to fixed point equations that arise naturally in the development. Asymptotic covariance matrices flow naturally from the work of Arnold & Press (1989) and involve the conditional asymptotic covariance matrices and first derivative matrices for conditional mean functions. When the fixed point equations admit an analytical solution, explicit formulae are subsequently obtained for the covariance structure of the joint limiting distribution, which may shed light on the use of the given statistical model. Two illustrations are given.
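A toy case makes the fixed point equations concrete: when the two full conditionals are exactly normal with linear means, x | y ~ N(a + b·y, σ₁²) and y | x ~ N(c + d·x, σ₂²), the joint means solve μx = a + b·μy, μy = c + d·μx, and the joint covariance follows from the conditional variances and slopes (b·d plays the role of ρ²). This bivariate normal sketch is our own illustration of the idea, not the general asymptotic result; the function name is hypothetical.

```python
import numpy as np


def joint_from_conditionals(a, b, s1sq, c, d, s2sq, tol=1e-12, max_iter=10_000):
    """Joint mean and covariance implied by normal linear full conditionals
    x | y ~ N(a + b*y, s1sq) and y | x ~ N(c + d*x, s2sq)."""
    if abs(b * d) >= 1.0:
        raise ValueError("need |b*d| < 1 for a proper joint distribution")
    if not np.isclose(b * s2sq, d * s1sq):
        raise ValueError("conditionals are not compatible with a bivariate normal")
    # Solve the fixed point equations mx = a + b*my, my = c + d*mx by iteration.
    mx, my = 0.0, 0.0
    for _ in range(max_iter):
        mx_new = a + b * my
        my_new = c + d * mx_new
        done = max(abs(mx_new - mx), abs(my_new - my)) < tol
        mx, my = mx_new, my_new
        if done:
            break
    # Covariance from conditional variances and slopes: Var(x) = s1sq / (1 - b*d), etc.
    vx = s1sq / (1.0 - b * d)
    vy = s2sq / (1.0 - b * d)
    cxy = b * vy
    return np.array([mx, my]), np.array([[vx, cxy], [cxy, vy]])


mean, cov = joint_from_conditionals(a=2.0, b=1.0, s1sq=1.0, c=-1.5, d=0.5, s2sq=0.5)
print(mean)   # approximately [1, -1]
print(cov)    # approximately [[2, 1], [1, 1]]
```

These conditionals come from the bivariate normal with mean (1, −1) and covariance [[2, 1], [1, 1]], so the output can be checked by hand.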
This paper presents a method for estimating the conditional or posterior distribution of the parameters of deterministic dynamical systems. The procedure conforms to an EM implementation of a Gauss-Newton search for the maximum of the conditional or posterior density. The inclusion of priors in the estimation procedure ensures robust and rapid convergence, and the resulting conditional densities enable Bayesian inference about the model parameters. The method is demonstrated using an input-state-output model of the hemodynamic coupling between experimentally designed causes or factors in fMRI studies and the ensuing BOLD response. First, this example represents a generalization of current fMRI analysis models that accommodates nonlinearities and in which the parameters have an explicit physical interpretation. Second, the approach extends classical inference, based on the likelihood of the data given a null hypothesis about the parameters, to more plausible inferences about the parameters of the model given the data. This inference provides confidence intervals based on the conditional density. (C) 2002 Elsevier Science (USA).
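The core update can be sketched as a Gauss-Newton step on the log posterior with a Gaussian prior, whose curvature also gives a Gaussian approximation to the conditional covariance. In this sketch the exponential-decay model is a hypothetical stand-in for the hemodynamic model, the noise variance is assumed known (so the EM steps that would estimate it are omitted), and all names are our own.

```python
import numpy as np


def model(theta, t):
    """Hypothetical stand-in for a dynamical system's output: y(t) = A * exp(-k t)."""
    A, k = theta
    return A * np.exp(-k * t)


def jacobian(theta, t):
    A, k = theta
    decay = np.exp(-k * t)
    return np.column_stack([decay, -A * t * decay])   # dy/dA, dy/dk


def gauss_newton_map(y, t, prior_mean, prior_cov, noise_var, n_iter=50, tol=1e-10):
    """Gauss-Newton search for the posterior mode under a Gaussian prior and i.i.d.
    Gaussian noise of known variance; also returns the Gaussian approximation
    to the posterior covariance at the mode."""
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_prec = np.linalg.inv(prior_cov)
    theta = prior_mean.copy()
    for _ in range(n_iter):
        J = jacobian(theta, t)
        resid = y - model(theta, t)
        precision = J.T @ J / noise_var + prior_prec            # approximate curvature
        gradient = J.T @ resid / noise_var - prior_prec @ (theta - prior_mean)
        step = np.linalg.solve(precision, gradient)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    J = jacobian(theta, t)
    post_cov = np.linalg.inv(J.T @ J / noise_var + prior_prec)
    return theta, post_cov


t = np.linspace(0.0, 5.0, 40)
y = model([2.0, 0.5], t)                      # noise-free data for illustration
theta, post_cov = gauss_newton_map(y, t, prior_mean=[1.5, 0.7],
                                   prior_cov=np.diag([100.0, 100.0]), noise_var=0.01)
print(theta, np.sqrt(np.diag(post_cov)))      # mode and approximate posterior SDs
```

Starting at the prior mean and adding the prior precision to the Gauss-Newton curvature is what keeps each step well conditioned, which is one way to read the abstract's remark that priors aid convergence.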
A computational approach is presented for likelihood analysis of regression models with measurement errors in explanatory variables. If y, x, and w represent the response, an unobservable true value of an explanatory variable, and an observable measurement of x, respectively, then the likelihood function is based on the density of the observable variables, $f(y, w) = \int f(y, w \mid x)\, f(x)\, dx$. For realistic model specifications the integral must be approximated numerically. While one could conceivably use a general-purpose optimization routine to find estimates that maximize the approximate likelihood, that tends not to work very well. The approximate density, however, has the form of a finite mixture model, so the standard EM algorithm for that problem can be applied. The resulting approach is practically important since it easily permits realistic distributional modeling and can be accomplished through iterative application of readily available routines.
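A minimal sketch of the mixture idea, under assumptions of our own: a linear model y = b₀ + b₁x + e with normal e, a classical error w = x + u with normal u of known standard deviation, and f(x) replaced by unknown point masses on a fixed grid. The approximated f(y, w) is then a finite mixture, and each EM iteration is a posterior-weight computation followed by an ordinary weighted least-squares fit. Names, grid, and simulation settings are hypothetical.

```python
import numpy as np


def normal_pdf(z, mean, sd):
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def em_measurement_error(y, w, grid, sigma_u, n_iter=300):
    """EM for y = b0 + b1*x + e, w = x + u, with f(x) approximated by point masses
    pi_k on a fixed grid (assumed: normal e, normal u with known sd sigma_u)."""
    n, K = len(y), len(grid)
    pi = np.full(K, 1.0 / K)
    b1, b0 = np.polyfit(w, y, 1)                          # naive start: regress y on w
    sigma_e = np.std(y)
    f_w = normal_pdf(w[:, None], grid[None, :], sigma_u)  # f(w | x_k), fixed
    X = np.column_stack([np.ones(n * K), np.tile(grid, n)])
    Y = np.repeat(y, K)
    for _ in range(n_iter):
        # E-step: posterior probability that observation i has x = grid[k].
        joint = normal_pdf(y[:, None], b0 + b1 * grid[None, :], sigma_e) * f_w * pi
        post = joint / joint.sum(axis=1, keepdims=True)
        # M-step: mixing weights, then weighted least squares for (b0, b1, sigma_e).
        pi = post.mean(axis=0)
        wts = post.ravel()
        XtW = X.T * wts
        b0, b1 = np.linalg.solve(XtW @ X, XtW @ Y)
        sigma_e = np.sqrt(np.sum(wts * (Y - X @ np.array([b0, b1])) ** 2) / n)
    return b0, b1, sigma_e, pi


rng = np.random.default_rng(2)
n = 2000
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
w = x + rng.normal(0.0, 0.5, n)
naive_slope = np.polyfit(w, y, 1)[0]                      # attenuated towards 2 / 1.25
b0, b1, sigma_e, pi = em_measurement_error(y, w, np.linspace(-4.0, 4.0, 41), sigma_u=0.5)
print(naive_slope, b0, b1, sigma_e)
```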
The maximum likelihood estimator of the variance components in a linear model can be biased downwards. Restricted maximum likelihood (REML) corrects this problem by using the likelihood of a set of residual contrasts and is generally considered superior. However, the original REML definition does not extend directly beyond linear models. We propose a REML-type estimator for generalised linear mixed models by correcting the bias in the profile score function of the variance components. The proposed estimator has the same consistency properties as the maximum likelihood estimator if the number of parameters in the mean and variance components models remains fixed, but its estimator of the variance components has a smaller finite-sample bias. A simulation study with a logistic mixed model shows that the proposed estimator is effective in correcting the downward bias in the maximum likelihood estimator.
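The downward bias is easiest to see in the linear-model case that the abstract starts from: with p mean parameters, the ML error variance RSS/n has expectation σ²(n − p)/n, while the REML estimate RSS/(n − p) is unbiased. The simulation below (hypothetical design and settings) shows only this linear case; it does not implement the proposed bias-corrected score for generalised linear mixed models.

```python
import numpy as np


def ml_and_reml_variance(X, Y):
    """ML (RSS / n) and REML (RSS / (n - p)) estimates of the error variance in a
    Gaussian linear model; each column of Y is an independent replicate response."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = np.sum((Y - X @ coef) ** 2, axis=0)
    return rss / n, rss / (n - p)


rng = np.random.default_rng(3)
n, p, sigma2, reps = 10, 3, 4.0, 20_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, -1.0, 0.5])[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(n, reps))
ml, reml = ml_and_reml_variance(X, Y)
print(ml.mean(), reml.mean())   # near sigma2 * (n - p) / n = 2.8 and sigma2 = 4.0
```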
We consider the problem of multivariate outlier testing for purposes of distinguishing seismic signals of underground nuclear events from training samples based on non-nuclear seismic events when certain data are missing. We consider the case in which the training data follow a multivariate normal distribution. Assume a potential outlier is observed on which k features of interest are measured. Assume further that a training set of n observations on these k features is available, but that some of the training observations have missing features. The approach currently used in practice is to perform the outlier testing using a generalized likelihood ratio test procedure based only on the training vectors with complete data. When there is a substantial amount of missing data within the training set, this strategy may discard valuable information. An alternative procedure is to incorporate all n training vectors, using the EM algorithm to handle the missing data in the training set. Resampling methods are used to find appropriate critical regions. We use simulation results and analysis of models fit to Pg/Lg ratios for the WMQ station in China to compare these two strategies for dealing with missing data.
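The EM step for the training set can be sketched as follows, assuming the features are missing at random: each E-step replaces a row's missing block by its conditional mean given the observed block and adds the conditional covariance to the cross-product sum. A squared Mahalanobis distance of the new event from the fitted training distribution is shown as the basic ingredient of an outlier statistic; the generalized likelihood ratio test itself and the resampled critical regions are not reproduced, and all names and simulated values are hypothetical.

```python
import numpy as np


def em_mvn_missing(X, n_iter=200, tol=1e-8):
    """ML estimates of a multivariate normal mean and covariance from data with
    missing entries (NaN) via EM; assumes values are missing at random."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    missing = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(k)
        sum_xx = np.zeros((k, k))
        for i in range(n):
            m, o = missing[i], ~missing[i]
            x_hat = X[i].copy()
            cond_cov = np.zeros((k, k))
            if m.all():
                x_hat, cond_cov = mu.copy(), sigma.copy()
            elif m.any():
                # E-step: conditional mean and covariance of the missing block.
                s_mo = sigma[np.ix_(m, o)]
                coef = np.linalg.solve(sigma[np.ix_(o, o)], s_mo.T).T   # S_mo S_oo^-1
                x_hat[m] = mu[m] + coef @ (X[i, o] - mu[o])
                cond_cov[np.ix_(m, m)] = sigma[np.ix_(m, m)] - coef @ s_mo.T
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cond_cov
        # M-step: update mean and covariance from the expected sufficient statistics.
        mu_new = sum_x / n
        sigma_new = sum_xx / n - np.outer(mu_new, mu_new)
        shift = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if shift < tol:
            break
    return mu, sigma


def mahalanobis_sq(x, mu, sigma):
    diff = np.asarray(x, dtype=float) - mu
    return float(diff @ np.linalg.solve(sigma, diff))


rng = np.random.default_rng(4)
true_mu = np.array([0.0, 1.0, -1.0])
true_cov = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
train = rng.multivariate_normal(true_mu, true_cov, size=500)
train[rng.random(train.shape) < 0.2] = np.nan      # about 20% of entries missing
mu_hat, sigma_hat = em_mvn_missing(train)
d2_typical = mahalanobis_sq([0.0, 1.0, -1.0], mu_hat, sigma_hat)
d2_suspect = mahalanobis_sq([4.0, -3.0, 3.0], mu_hat, sigma_hat)
print(d2_typical, d2_suspect)
```

With no missing entries the procedure reduces to the sample mean and the ML (divide-by-n) covariance after one iteration, which is a convenient sanity check.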