Statistical methods for mapping quantitative trait loci relative to genetic markers are now well established for continuous traits with normal distributions. However, many traits of economic importance are recorded on a discrete, ordinal scale. Here we describe a model developed for the analysis of ordinal traits, such as degree of difficulty in calving or categories of plant disease resistance. The model estimates the distance from the quantitative trait locus to neighbouring genetic markers, and also genetic parameters, either as gene effects on an underlying continuous scale or as probabilities of the observed categories. The model is tested on simulated data and is compared with an analysis based on mixtures of normal distributions. The ordinal model is found to estimate the parameters more accurately, especially when the number of categories is small or when only one linked marker is available.
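The link between "gene effects on an underlying continuous scale" and "probabilities of the observed categories" can be illustrated with a standard threshold model. The sketch below is not the authors' implementation: it assumes a normally distributed underlying scale with unit residual standard deviation, and the names `category_probabilities`, `genotype_mean`, and `thresholds` are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(genotype_mean, thresholds):
    """Probabilities of each ordinal category under a threshold model.

    Assumption: the underlying trait is N(genotype_mean, 1) and an
    individual falls in category k when the trait lies between
    thresholds k-1 and k (thresholds given in ascending order).
    """
    cuts = [normal_cdf(t - genotype_mean) for t in thresholds]
    lower = [0.0] + cuts
    upper = cuts + [1.0]
    return [u - l for l, u in zip(lower, upper)]


# Three categories (two thresholds) for two hypothetical QTL genotypes.
print(category_probabilities(0.0, [-1.0, 1.0]))
print(category_probabilities(0.5, [-1.0, 1.0]))
```

Shifting the genotype mean upward moves probability mass toward the higher categories, which is how a gene effect on the underlying scale shows up in the observed scores.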
A mixture model is an attractive approach for analyzing failure time data in which there are thought to be two groups of subjects, those who could eventually develop the endpoint and those who could not develop the endpoint. The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982). A logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part of the model. The estimator arises naturally out of the EM algorithm approach for fitting failure time mixture models as described by Larson and Dinse (1985). The procedure is applied to some experimental data from radiation biology and is evaluated in a Monte Carlo simulation study. The simulation study suggests the semi-parametric procedure is almost as efficient as the correct fully parametric procedure for estimating the regression coefficient in the incidence, but less efficient for estimating the latency distribution.
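The two-part structure (incidence and latency) can be sketched as a population survival function. This is an illustration only: the abstract estimates the latency part with a Kaplan-Meier type approach, whereas the sketch plugs in an exponential latency purely as an assumption, and all function and parameter names are hypothetical.

```python
import math


def incidence(x, b0, b1):
    """Logistic probability that a subject with covariate x is susceptible."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def population_survival(t, x, b0, b1, latency_survival):
    """Survival for the whole population: non-susceptible subjects never
    fail, susceptible subjects follow the latency survival function."""
    p = incidence(x, b0, b1)
    return (1.0 - p) + p * latency_survival(t)


def exponential_latency(rate):
    """Illustrative parametric latency (an assumption, not the paper's choice)."""
    return lambda t: math.exp(-rate * t)


latency = exponential_latency(0.5)
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, population_survival(t, x=1.0, b0=0.0, b1=1.0, latency_survival=latency))
```

As t grows, the population survival levels off at one minus the incidence probability, which is the plateau that motivates this kind of mixture model.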
Our purpose is to estimate the joint probability distribution of a pair of random variables when only one of these variables is observed. In other words, there are observed data and missing data. Our estimation method is an iterative procedure which can be seen as a stochastic version of the EM algorithm. At each iteration, we simulate the non-observed variable from its posterior distribution and estimate the joint distribution. We deal with the case of Markov random fields indexed by Z^p and study a convolution model. With this example, we show that the method can address a wide class of models that are widely used in signal or image processing. In fact, we estimate a convolution filter and a noise variance as well as the parameters of a mixture of populations and a Gibbs distribution. Finally, we show that a non-parametric estimation of the probability density of the non-observed variables can be performed. Simulations and applications to real data give very satisfactory results.
Sperm and ova are sensitive to numerous toxicants in animal studies; however, human vulnerability has been far more difficult to assess, due in part to a lack of methods for measuring the viable survival of human gametes in vivo. We present a parametric model for fertility, which assumes that the viable lifetime of the ovum is fixed while that of sperm is exponentially distributed. By reducing the number of parameters that must be estimated, compared to a previous approach, the model leads to improved tests for differences in sperm and egg survival between exposed and unexposed couples. Since it assumes that batches of sperm introduced on different days present independent competing "risks" (of fertilization) to the ovum, the model also provides for estimation of the age distribution, in days, of the sperm which actually fertilized the ova. This allows us to consider whether older sperm are more likely to produce defective embryos. We apply this model to data from a group of women who were intensively studied, beginning when they discontinued contraception in order to start a pregnancy. Participants kept daily records of intercourse. Daily urine specimens allowed the day of ovulation to be estimated and conceptions to be identified, based on assays of excreted hormones. Applying the parametric model to these data, the estimated mean viable lifetime for sperm is 1.4 days, while the lifetime of the ovum appears to be less than a day. The age distributions for the fertilizing sperm are remarkably similar for pregnancies ending in very early loss and pregnancies surviving long enough to be clinically recognized, suggesting that age of the fertilizing sperm is irrelevant to viability of the conceptus.
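To give a feel for what an exponentially distributed sperm lifetime with a mean of 1.4 days implies, the sketch below computes the probability that sperm are still viable a given number of days after intercourse. It covers only this one ingredient of the model, not the full fertility likelihood, and the function name is hypothetical.

```python
import math

MEAN_SPERM_LIFETIME_DAYS = 1.4  # point estimate reported in the abstract


def sperm_viable_probability(days, mean_lifetime=MEAN_SPERM_LIFETIME_DAYS):
    """P(lifetime > days) for an exponential lifetime with the given mean."""
    return math.exp(-days / mean_lifetime)


for d in range(0, 6):
    print(d, round(sperm_viable_probability(d), 3))
```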
A new method of feature selection based on the approximation of class conditional densities by a mixture of parameterized densities of a special type, suitable especially for multimodal data, is presented. No search procedure is needed when using the proposed method. Its performance is tested both on real and simulated data.
The paper is devoted to the problem of statistical estimation of a multivariate distribution density, which is a discrete mixture of Gaussian distributions. A heuristic approach is considered, based on the use of the EM algorithm and nonparametric density estimation with a sequential increase in the number of components of the mixture. Criteria for testing model adequacy are discussed.
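The basic EM iteration for a Gaussian mixture can be sketched as follows. This is a simplified, univariate illustration with a fixed number of components. It does not reproduce the paper's multivariate setting, its nonparametric density step, or the sequential addition of components, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, means, variances, weights, n_iter=100):
    """Run n_iter EM iterations from the given starting values."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means, variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against degenerate components
    return means, variances, weights
```

The variance floor is a pragmatic safeguard chosen for this sketch; starting values matter in practice because EM converges only to a local maximum of the likelihood.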
Restricted parameter spaces for covariance matrices, such as Sigma = sigma^2 I or Sigma = alpha I + beta J, are often used to simplify estimation. In addition, fixed upper and/or lower bounds may be needed to ensure that estimates satisfy a priori hypotheses. With multivariate variance components models, several covariance matrices need to be simultaneously estimated and, even with a reduced parameter space, estimation can be difficult. In earlier work we have discussed estimation for a widely-used class of models where the variance components matrices need only be nonnegative definite. In this article we extend these results to handle a wide class of restricted parameter spaces. We state the conditions required for a parameterization to be a member of the class, discuss the implementation of the results for several different members of the class, and discuss estimation with both balanced and unbalanced data. We give several examples to demonstrate the results.
Specifying a record-linkage procedure requires both (1) a method for measuring closeness of agreement between records, typically a scalar weight, and (2) a rule for deciding when to classify records as matches or nonmatches based on the weights. Here we outline a general strategy for the second problem, that is, for accurately estimating false-match rates for each possible cutoff weight. The strategy uses a model in which the distribution of observed weights is viewed as a mixture of weights for true matches and weights for false matches. An EM algorithm for fitting mixtures of transformed-normal distributions is used to find posterior modes; associated posterior variability is due to uncertainty about specific normalizing transformations as well as uncertainty in the parameters of the mixture model, the latter being calculated using the SEM algorithm. This mixture-model calibration method is shown to perform well in an applied setting with census data. Further, a simulation experiment reveals that, across a wide variety of settings not satisfying the model's assumptions, the procedure is slightly conservative on average in the sense of overstating false-match rates, and the one-sided confidence coverage (i.e., the proportion of times that these interval estimates cover or overstate the actual false-match rate) is very close to the nominal rate.
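Once a two-component mixture has been fitted, the false-match rate at a cutoff follows directly from the component tail probabilities. The sketch below assumes plain normal components on the weight scale (the abstract uses transformed-normal components) and declares every pair with weight at or above the cutoff a match. All names and the numeric values are hypothetical.

```python
import math


def normal_sf(x, mean, sd):
    """Upper-tail probability P(X >= x) for a normal distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def false_match_rate(cutoff, p_true, true_mean, true_sd, false_mean, false_sd):
    """Share of declared matches (weight >= cutoff) that are false matches."""
    true_above = p_true * normal_sf(cutoff, true_mean, true_sd)
    false_above = (1.0 - p_true) * normal_sf(cutoff, false_mean, false_sd)
    return false_above / (true_above + false_above)


# Illustrative mixture: 30% true matches centred at 5, false matches centred at 0.
for cutoff in (0.0, 2.0, 4.0):
    print(cutoff, false_match_rate(cutoff, 0.3, 5.0, 1.0, 0.0, 1.0))
```

Raising the cutoff lowers the false-match rate at the cost of declaring fewer true matches, which is the trade-off the calibration is meant to quantify.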
We consider regression analysis when incomplete or auxiliary covariate data are available for all study subjects and, in addition, for a subset called the validation sample, true covariate data of interest have been ascertained. The term auxiliary data refers to data not in the regression model, but thought to be informative about the true missing covariate data of interest. We discuss a method which is nonparametric with respect to the association between available and missing data, allows missingness to depend on available response and covariate values, and is applicable to both cohort and case-control study designs. The method previously proposed by Flanders & Greenland (1991) and by Zhao & Lipsitz (1992) is generalised and asymptotic theory is derived. Our expression for the asymptotic variance of the estimator provides intuition regarding performance of the method. Optimal sampling strategies for the validation set are also suggested by the asymptotic results.
Maximum likelihood and least-squares-type estimation of the linear failure rate (LFR) distribution are studied for type II censored samples. It is shown that a particular structural property of the LFR greatly facilitates application of the EM algorithm for computing the MLEs. Also, an apparently ad hoc method, which rests on maximization of a pseudo-likelihood, is shown to produce the MLEs. A finite-sample exact confidence procedure is developed as an alternative to the existing MLE-based large-sample procedure that suffers from poor coverage probabilities even in moderately large samples. Also, asymptotic efficiencies of the least-squares-type estimates relative to the MLEs are derived. Numerical computations show that these can be quite low in many cases.
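For orientation, the LFR distribution has hazard a + b t, so its survival function is exp(-(a t + b t^2 / 2)). The sketch below writes down the survival function, its inverse, and the log-likelihood of a type II censored sample (only the r smallest of n failure times observed), up to an additive constant. It is a plain restatement of these standard formulas, not the paper's EM or exact confidence procedure, and the function names are hypothetical.

```python
import math


def lfr_survival(t, a, b):
    """Survival function of the LFR distribution with hazard a + b*t."""
    return math.exp(-(a * t + 0.5 * b * t * t))


def lfr_quantile(u, a, b):
    """Time t at which the survival function equals u (0 < u <= 1, b > 0)."""
    e = -math.log(u)
    return (-a + math.sqrt(a * a + 2.0 * b * e)) / b


def type2_censored_loglik(a, b, observed, n):
    """Log-likelihood (constant omitted) given the r smallest failure times.

    observed: sorted list of the r observed failure times
    n: total number of units on test; the n - r unobserved units are
       censored at the largest observed time.
    """
    def cumulative_hazard(t):
        return a * t + 0.5 * b * t * t

    r = len(observed)
    loglik = sum(math.log(a + b * t) - cumulative_hazard(t) for t in observed)
    loglik -= (n - r) * cumulative_hazard(observed[-1])
    return loglik


print(type2_censored_loglik(1.0, 0.5, [0.1, 0.2, 0.3], n=5))
```

Setting b = 0 reduces the log-likelihood to the familiar exponential case, which is a convenient sanity check when experimenting with the formula.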