Serial dilution assays are widely employed for estimating substance concentrations and minimum inhibitory concentrations. The Poisson-Bernoulli model for such assays is appropriate for count data but not for continuous measurements that are encountered in applications involving substance concentrations. This paper presents practical inference methods based on a log-normal model and illustrates these methods using a case application involving bacterial toxins.
The purpose of this paper is to present and evaluate a heuristic algorithm for learning Bayesian networks for clustering. Our approach is based upon improving the Naive-Bayes model by means of constructive induction. A key idea in this approach is to treat expected data as real data. This allows us to complete the database and to take advantage of factorable closed forms for the marginal likelihood. To gain this advantage, we search for parameter values using the EM algorithm or an alternative approach that we have developed: a hybridization of the Bound and Collapse method and the EM algorithm, which results in a method that exhibits a faster convergence rate and more effective behaviour than the EM algorithm. We also consider the possibility of interleaving runs of these two methods after each structural change. We evaluate our approach on synthetic and real-world databases. (C) 1999 Elsevier Science B.V. All rights reserved.
Chain-of-events data are longitudinal observations on a succession of events that can only occur in a prescribed order. One goal in an analysis of this type of data is to determine the distribution of times between the successive events. This is difficult when individuals are observed periodically rather than continuously because the event times are then interval censored. Chain-of-events data may also be subject to truncation when individuals can only be observed if a certain event in the chain (e.g., the final event) has occurred. We provide a nonparametric approach to estimate the distributions of times between successive events in discrete time for data such as these under the semi-Markov assumption that the times between events are independent. This method uses a self-consistency algorithm that extends Turnbull's algorithm (1976, Journal of the Royal Statistical Society, Series B 38, 290-295). The quantities required to carry out the algorithm can be calculated recursively for improved computational efficiency. Two examples using data from studies involving HIV disease are used to illustrate our methods.
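The self-consistency idea mentioned in this abstract can be illustrated with a minimal sketch. The code below is not the authors' chain-of-events method: it assumes a single event in discrete time, interval-censored observations given as inclusive index pairs, and no truncation. The function name `self_consistent_pmf` and its parameters are hypothetical, chosen only for this illustration.

```python
def self_consistent_pmf(intervals, n_times, n_iter=500):
    """Turnbull-style self-consistency estimate of a discrete event-time pmf.

    intervals: list of (lo, hi) inclusive index pairs; the event for that
               individual is known only to lie somewhere in lo..hi.
    n_times:   number of discrete time points (indices 0 .. n_times - 1).
    Simplifying assumptions (not from the abstract): one event per
    individual, no truncation, fixed iteration count.
    """
    n = len(intervals)
    p = [1.0 / n_times] * n_times  # start from a uniform pmf
    for _ in range(n_iter):
        new_p = [0.0] * n_times
        for lo, hi in intervals:
            # Probability the current estimate assigns to this interval.
            mass = sum(p[lo:hi + 1])
            # Spread one observation over its interval in proportion to p.
            for j in range(lo, hi + 1):
                new_p[j] += p[j] / mass / n
        p = new_p
    return p


if __name__ == "__main__":
    # Three individuals, each observed only to within an interval.
    print(self_consistent_pmf([(0, 1), (1, 2), (1, 1)], n_times=3))
```

When every interval is a single point, the update reduces to the empirical frequencies after one pass, which is a convenient sanity check for the sketch.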
In this paper a method to automatically generate a Gaussian mixture classifier is presented. The growing process is based on the iterative addition of Gaussian nodes. Each iteration takes place in two sequential steps: first, using the EM algorithm, we maximize the likelihood of the data under the current configuration of the classifier; then, a new Gaussian node is added to the class which most improves the discriminant capabilities of the network. Growth control is imposed by means of a complexity penalizing term and a discriminant MMI condition. The classical EM algorithm for Gaussian mixtures is also extended to jointly include labeled and unlabeled data. We report some artificial experiments that show the utility of this extension and the reliability of the proposed growing technique. We also report results of the Growing Gaussian Mixtures Network on terrain classification over a Landsat-TM image using different restrictions on the covariance matrix of the Gaussian mixtures. Comparisons in classification performance with a set of MLP neural networks are provided. (C) 1999 Elsevier Science B.V. All rights reserved.
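One common way to let EM use labeled and unlabeled data together is to fix the responsibilities of labeled points to their known class and compute posterior responsibilities only for unlabeled points. The sketch below illustrates that idea under strong simplifications that are not taken from the abstract: one-dimensional data, a single Gaussian per class, no node growth, and at least one labeled example per class. The function names and parameters are hypothetical.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_classes, n_iter=100, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture, one component per class.

    labeled:   list of (x, class_index) pairs; responsibilities are fixed.
    unlabeled: list of x values; responsibilities come from the E-step.
    Assumes every class has at least one labeled example (illustrative only).
    """
    groups = [[x for x, c in labeled if c == k] for k in range(n_classes)]
    # Initialise each component from its labeled examples.
    means = [sum(g) / len(g) for g in groups]
    variances = [max(sum((x - m) ** 2 for x in g) / len(g), var_floor)
                 for g, m in zip(groups, means)]
    weights = [len(g) / len(labeled) for g in groups]
    n_total = len(labeled) + len(unlabeled)

    for _ in range(n_iter):
        # E-step: posterior class probabilities for unlabeled points only.
        resp = []
        for x in unlabeled:
            joint = [w * gauss_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            if total > 0.0:
                resp.append([j / total for j in joint])
            else:  # numerical underflow: fall back to a uniform posterior
                resp.append([1.0 / n_classes] * n_classes)

        # M-step: labeled points count with weight 1 in their own class.
        for k in range(n_classes):
            nk = len(groups[k]) + sum(r[k] for r in resp)
            mean_k = (sum(groups[k])
                      + sum(r[k] * x for r, x in zip(resp, unlabeled))) / nk
            ss = (sum((x - mean_k) ** 2 for x in groups[k])
                  + sum(r[k] * (x - mean_k) ** 2 for r, x in zip(resp, unlabeled)))
            means[k] = mean_k
            variances[k] = max(ss / nk, var_floor)
            weights[k] = nk / n_total
    return weights, means, variances
```

With only a handful of labeled points per class, the unlabeled data mainly refine the means and variances, while the labels anchor which component corresponds to which class.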
In this paper we examine the problem of estimating a stochastic signal from noise corrupted linearly distorted samples of the original. Due to the ill-posedness caused by the blurring function, we are motivated to examine an inversion method in which the statistics of the underlying process are modeled as a 1/f type fractal process. In particular, we explore two issues with the use of such a model: the effects of model mismatch and parameter estimation. Our analysis demonstrates that the mean-square-error performance of the estimator is quite insensitive to the choice of prior model parameters used in the recovery of the signal. Such robustness is shown to hold even when the underlying process is not of the 1/f variety. We then introduce an expectation-maximization technique for jointly extracting the best parameters for use in an inversion along with the reconstructed signal. Here, Monte Carlo and Cramer-Rao bound results demonstrate that we are able to determine accurate model parameters exactly in those situations where the model mismatch analysis shows that such fidelity is required to ensure low mean square error in the recovery of the underlying signal. (C) 1999 Elsevier Science B.V. All rights reserved.
Mixed effects models are often used for estimating fixed effects and variance components in longitudinal studies of continuous data. When the outcome being modelled is a laboratory measurement, however, it may be subject to lower and upper detection limits (i.e., censoring). In this paper, the usual EM estimation procedure for mixed effects models is modified to account for left and/or right censoring.
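To show how censoring enters an EM update, here is a minimal sketch for a much simpler setting than the paper's: a single normal sample (no random effects) with a known lower detection limit, where censored values are replaced in the E-step by their conditional first and second moments under a truncated normal. The function `em_left_censored` and its arguments are hypothetical names for this illustration.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def em_left_censored(observed, n_censored, limit, n_iter=200):
    """EM estimates of (mu, sigma) for a normal sample left-censored at `limit`.

    observed:   values measured at or above the detection limit.
    n_censored: number of values known only to lie below the limit.
    Simplified illustration: one normal population, no random effects.
    """
    n = len(observed) + n_censored
    s1 = sum(observed)
    s2 = sum(y * y for y in observed)
    # Start from the naive estimates based on the observed values only.
    mu = s1 / len(observed)
    sigma = max(math.sqrt(max(s2 / len(observed) - mu * mu, 0.0)), 1e-6)

    for _ in range(n_iter):
        # E-step: moments of a normal truncated to (-inf, limit).
        a = (limit - mu) / sigma
        lam = norm_pdf(a) / norm_cdf(a)
        e1 = mu - sigma * lam                               # E[y | y < limit]
        e2 = mu * mu + sigma * sigma - sigma * (limit + mu) * lam  # E[y^2 | y < limit]
        # M-step: complete-data maximum likelihood updates.
        mu_new = (s1 + n_censored * e1) / n
        var_new = (s2 + n_censored * e2) / n - mu_new * mu_new
        mu, sigma = mu_new, math.sqrt(max(var_new, 1e-12))
    return mu, sigma
```

A right detection limit could be handled symmetrically with the moments of a normal truncated from below; the mixed-effects version described in the abstract additionally requires expectations over the random effects.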
In earlier work, my colleagues and I described a loglinear model for genetic data from triads composed of affected probands and their parents. This model allows detection of and discrimination between effects of an inherited haplotype versus effects of the maternal haplotype, which presumably would be mediated by prenatal factors. Like the transmission disequilibrium test (TDT), the likelihood-ratio test (LRT) based on this model is not sensitive to associations that are due to genetic admixture. When used as a method for testing for linkage disequilibrium, the LRT can be regarded as an alternative to the TDT. When one or both parents are missing, the resulting incomplete triad must be discarded to ensure validity of the TDT, thereby sacrificing information. By contrast, when the problem is set in a likelihood framework, the expectation-maximization algorithm allows the incomplete triads to contribute their information to the LRT without invalidation of the analysis. Simulations demonstrate that much of the lost statistical power can be recaptured by means of this missing-data technique. In fact, power is reasonably good even when no triad is complete, for example, when a study is designed to include only mothers of cases. Information from siblings also can be incorporated to further improve the statistical power when genetic data from parents or probands are missing.
Authors: Marchette, DJ; Poston, WL
USN, Computat Stat Grp, Ctr Surface Warfare, Dahlgren, VA 22448 USA
USN, Adv Processors Grp, Ctr Surface Warfare, Dahlgren, VA 22448 USA
In automatic pattern recognition applications, numerous features that describe the classes are obtained in an attempt to ensure accurate classification of unknown observations. These features or dimensions must be reduced to a smaller number before classification schemes can be applied, because classifiers become computationally and analytically unmanageable in high dimensions. Principal components and Fisher's Linear Discriminant offer global dimensionality reduction within the framework of linear algebra applied to covariance matrices. This report describes local methods that use both mixture models and nearest neighbor calculations to construct local versions of these methods. These new versions for local dimensionality reduction will provide increased classification accuracy in lower dimensions.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.
Purpose. To develop a pharmacokinetic model for tenidap and to identify important relationships between the pharmacokinetic parameters and available covariates. Methods. Plasma concentration data from several phase I and phase II studies were used to develop a pharmacokinetic model for tenidap, a novel anti-rheumatic drug. An appropriate pharmacokinetic model was selected on the basis of individual nonlinear regression analyses, and an EM algorithm was used to perform a nonlinear mixed-effects analysis. Scatter plots of posterior individual pharmacokinetic parameters were used to identify possible covariate effects. Results. Predicted responses were in good agreement with the observed data. A biexponential model with zero order absorption was subsequently used to develop the mixed-effects model. Covariate relationships selected on the basis of differences in the objective function, although statistically significant, were not particularly strong. Conclusions. The pharmacokinetics of tenidap can be described by a biexponential model with zero order absorption. Based on differences in the log-likelihood, significant covariate-parameter relationships were identified between smoking and CL, and between gender and Vss and CLd. Simulated sparse data analyses indicated that the model would be robust for the analysis of sparse data generated in observational studies.