Zero-inflated Poisson (ZIP) regression is a model for count data with excess zeros. It assumes that with probability p the only possible observation is 0, and with probability 1 - p a Poisson(lambda) random variable is observed. For example, when manufacturing equipment is properly aligned, defects may be nearly impossible; but when it is misaligned, defects may occur according to a Poisson(lambda) distribution. Both the probability p of the perfect, zero-defect state and the mean number of defects, lambda, in the imperfect state may depend on covariates. Sometimes p and lambda are unrelated; other times p is a simple function of lambda, such as p = 1/(1 + lambda^tau) for an unknown constant tau. In either case, ZIP regression models are easy to fit. The maximum likelihood estimates (MLEs) are approximately normal in large samples, and confidence intervals can be constructed by inverting likelihood ratio tests or by using the approximate normality of the MLEs; simulations suggest, however, that the confidence intervals based on likelihood ratio tests are better. Finally, ZIP regression models are not only easy to interpret, but they can also lead to more refined data analyses. For example, in an experiment concerning soldering defects on printed wiring boards, two sets of conditions gave about the same mean number of defects, but the perfect state was more likely under one set of conditions while the mean number of defects in the imperfect state was smaller under the other; that is, ZIP regression can show not only which conditions give a lower mean number of defects but also why the means are lower.
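As a minimal sketch of the mixture likelihood described above (intercept-only, without the paper's covariate regression; the simulated values p = 0.3 and lambda = 2 are illustrative), the ZIP MLE can be computed by direct numerical maximization:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)
n = 2000
# simulate ZIP data: perfect state with probability p = 0.3, else Poisson(lambda = 2)
perfect = rng.random(n) < 0.3
y = np.where(perfect, 0, rng.poisson(2.0, size=n))

def zip_nll(theta, y):
    # unconstrained parameterization: theta = (logit p, log lambda)
    p = 1.0 / (1.0 + np.exp(-theta[0]))
    lam = np.exp(theta[1])
    pois = -lam + y * np.log(lam) - gammaln(y + 1)        # Poisson log-pmf
    ll = np.where(y == 0,
                  np.log(p + (1 - p) * np.exp(-lam)),     # zero from either state
                  np.log1p(-p) + pois)                    # positive count: imperfect state
    return -ll.sum()

res = minimize(zip_nll, x0=[0.0, 0.0], args=(y,), method="BFGS")
p_hat = 1.0 / (1.0 + np.exp(-res.x[0]))
lam_hat = np.exp(res.x[1])
```

The logit/log parameterization keeps the optimization unconstrained; with covariates, theta would become two regression coefficient vectors.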
A new class of fast maximum likelihood estimation (MLE) algorithms for emission computed tomography (ECT) is developed. In these cyclic iterative algorithms, vector extrapolation techniques are integrated with the iterations in gradient-based MLE algorithms, with the objective of accelerating the convergence of the base iterations. This results in a substantial reduction in the effective number of base iterations required for obtaining an emission density estimate of specified quality. The mathematical theory behind the minimal polynomial and reduced rank vector extrapolation techniques, in the context of emission tomography, is presented. With the EM and EM search algorithms in the base iterations, these extrapolation techniques are implemented in a positron emission tomography system. Using computer experiments, with measurements taken from simulated phantoms, the new algorithms are evaluated. It is shown that, with minimal additional computations, the proposed approach results in substantial improvement in image reconstruction, in terms of both qualitative visual performance and quantitative measures of likelihood and residual error.
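The acceleration idea can be illustrated generically, outside tomography: reduced rank extrapolation (RRE) combines a few iterates of a fixed-point iteration into a much better estimate. The toy linear iteration below (not an ECT reconstruction) is a setting where RRE with enough iterates recovers the fixed point exactly:

```python
import numpy as np

def rre(X):
    """Reduced rank extrapolation from a sequence of iterates.
    X: array of shape (k+2, n) holding iterates x_0 .. x_{k+1}."""
    U = np.diff(X, axis=0).T                 # n x (k+1) matrix of differences
    k1 = U.shape[1]
    # minimize ||U g|| subject to sum(g) = 1, via least squares on a stacked system
    A = np.vstack([U, np.ones((1, k1))])
    rhs = np.zeros(A.shape[0]); rhs[-1] = 1.0
    g, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return X[:k1].T @ g                      # weighted combination of x_0 .. x_k

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
M *= 0.5 / np.abs(np.linalg.eigvals(M)).max()   # contraction: spectral radius 0.5
b = rng.standard_normal(n)
x_true = np.linalg.solve(np.eye(n) - M, b)      # fixed point of x -> Mx + b

X = [np.zeros(n)]
for _ in range(n + 1):                          # n + 1 base iterations
    X.append(M @ X[-1] + b)
x_rre = rre(np.array(X))
```

For a linear base iteration, the errors satisfy a minimal-polynomial recurrence, so with enough iterates the extrapolated point coincides with the fixed point; for nonlinear iterations such as EM, the same construction is applied cyclically as an accelerator.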
The standard hidden Markov model (HMM) and the hidden filter model assume local or state-conditioned stationarity for the modeled signal. In this work we generalize these models and develop the 'trended HMM' to allow both local and global (via a Markov chain) non-stationarity to be represented in the model. The mathematical structure of the trended HMM can be described by a discrete-time Markov process with its states associated with distinct regression functions on time, or alternatively by a 'deterministic trend plus stationary residual' time series with its parameters governed by the evolution of a Markov chain. The EM algorithm is applied to obtain closed-form re-estimation formulas for the model parameters. Compared with the types of HMMs developed in the past, the trended HMM is a more faithful and more structured representation of many classes of speech sounds whose production involves strong articulatory dynamics. As such, it is expected to be a more suitable model for use in speech processing applications.
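A toy decoding example conveys the 'states as regression functions on time' structure: two states with linear time trends and Gaussian residuals, segmented by Viterbi. All parameter values are invented for illustration, and the paper's EM re-estimation formulas are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
B = np.array([[0.0, 0.05],     # state 0: mean 0.0 + 0.05 t  (illustrative trend)
              [10.0, -0.05]])  # state 1: mean 10.0 - 0.05 t
sigma = 0.5
true_path = np.array([0] * 50 + [1] * 50)
t_idx = np.arange(T)
y = B[true_path, 0] + B[true_path, 1] * t_idx + sigma * rng.standard_normal(T)

A = np.array([[0.95, 0.05],    # left-to-right transition matrix
              [0.00, 1.00]])
pi = np.array([1.0, 0.0])

def viterbi_trended(y, A, pi, B, sigma):
    T, S = len(y), len(pi)
    mu = B[:, [0]] + B[:, [1]] * np.arange(T)   # S x T state-dependent trend means
    logb = -0.5 * ((y - mu) / sigma) ** 2       # Gaussian log-density (up to a constant)
    logA = np.log(A + 1e-300)
    delta = np.log(pi + 1e-300) + logb[:, 0]
    psi = np.zeros((S, T), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA          # scores[i, j]: best path ending i -> j
        psi[:, t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logb[:, t]
    path = np.empty(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):
        path[t - 1] = psi[path[t], t]
    return path

path = viterbi_trended(y, A, pi, B, sigma)
```

Because the emission mean depends on both the state and the time index, the standard Viterbi recursion applies unchanged; only the emission scores become time-varying.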
This article is concerned with quantifying and representing group differences when there are more variables than observations. In particular, canonical variate analysis when the data consist of curves sampled at many grid points is considered. A new method is proposed that involves replacing the usually singular within-groups variation matrix by a fitted matrix that is positive-definite. To obtain the fitted matrix, a class of models, along with associated estimation and model-selection procedures, is presented. The results are applied to experimental data designed to assess the usefulness of data from a portable field spectrometer for discriminating between usable farmland and farmland affected by salinity.
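A generic version of the key step, with a simple shrinkage fit standing in for the paper's more structured model class (all simulation values are illustrative): replace the singular within-groups matrix by a positive-definite fitted matrix, then compute the canonical direction:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 50, 15                          # more variables than observations per group
shift = np.zeros(p); shift[:5] = 2.0   # groups differ in the first 5 variables
X = np.vstack([rng.standard_normal((n, p)),
               rng.standard_normal((n, p)) + shift])
g = np.repeat([0, 1], n)

means = np.array([X[g == k].mean(axis=0) for k in (0, 1)])
# pooled within-groups covariance: singular here, since 2n - 2 = 28 < p = 50
W = sum((X[g == k] - means[k]).T @ (X[g == k] - means[k]) for k in (0, 1)) / (2 * n - 2)
# positive-definite fitted matrix: shrinkage toward a scaled identity (one simple choice)
alpha = 0.5
W_fit = (1 - alpha) * W + alpha * (np.trace(W) / p) * np.eye(p)

# leading canonical direction for two groups: W_fit^{-1} (difference of group means)
a = np.linalg.solve(W_fit, means[1] - means[0])
scores = X @ a
thr = 0.5 * (scores[g == 0].mean() + scores[g == 1].mean())
accuracy = ((scores > thr).astype(int) == g).mean()
```

Without the fitted matrix, W is singular and the canonical variate is not defined; with it, the discriminant scores separate the groups despite p exceeding the sample size.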
The mixed-Weibull distribution provides a good model for the lives of electrical and mechanical components (or systems) when failure is caused by more than one failure mode. Due to the lack of an efficient parameter estimation method, the mixed-Weibull model has not been used as widely by reliability practitioners as the single-population Weibull distribution. This paper presents a new algorithm for estimating the parameters of mixed-Weibull distributions from censored data. The algorithm follows the principle of maximum likelihood estimation (MLE) through the EM (expectation-maximization) algorithm, and it is derived for both postmortem and non-postmortem time-to-failure data. The following conclusions are drawn: 1) The concept of the EM algorithm is easy to understand and apply (only elementary statistics and calculus are required). 2) The log-likelihood function cannot decrease after an EM sequence; this important feature was observed in all of the numerical calculations. 3) The MLEs for non-postmortem data were obtained successfully for mixed-Weibull distributions with up to 14 parameters in a 5-subpopulation mixed-Weibull distribution. This has not been seen, numerically, in the literature even for 3-subpopulation Weibull mixtures. We believe that there are no further difficulties in obtaining the MLEs of mixed-Weibull distributions even with more than 5 subpopulations. 4) The algorithm for the MLEs of postmortem data is a special case of our algorithm for the MLEs of non-postmortem data. 5) Numerical examples indicate that some of the log-likelihood functions of the mixed-Weibull distributions have multiple local maxima; therefore, the algorithm should be started from several initial guesses of the parameter set. The search for the largest local maximum can stop when a good fit has been found. 6) The EM algorithm is very efficient. On average, for 2-Weibull mixtures with a sample size of 200, the CPU time (on VAX 8650
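A minimal EM sketch for a 2-subpopulation mixture with complete (uncensored) data; the M-step here uses a generic numerical weighted MLE rather than the paper's derivation, and the mixture parameters are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import weibull_min

rng = np.random.default_rng(4)
n = 3000
# mixture: 60% Weibull(shape 0.8, scale 1) and 40% Weibull(shape 3, scale 10)
z = rng.random(n) < 0.6
t = np.where(z,
             weibull_min.rvs(0.8, scale=1.0, size=n, random_state=rng),
             weibull_min.rvs(3.0, scale=10.0, size=n, random_state=rng))

def wnll(theta, t, w):
    shape, scale = np.exp(theta)        # log-parameterization keeps both positive
    return -(w * weibull_min.logpdf(t, shape, scale=scale)).sum()

mix = 0.5
params = np.log(np.array([[1.0, 0.5], [2.0, 5.0]]))   # initial (log shape, log scale)
for _ in range(50):
    logpdf = np.array([weibull_min.logpdf(t, np.exp(a), scale=np.exp(b))
                       for a, b in params])
    logw = np.log([mix, 1 - mix])[:, None] + logpdf
    r = np.exp(logw - logsumexp(logw, axis=0))        # E-step: responsibilities
    mix = r[0].mean()                                  # M-step: mixing weight
    params = np.array([minimize(wnll, params[j], args=(t, r[j]),
                                method="Nelder-Mead").x for j in range(2)])

shapes, scales = np.exp(params).T
```

Each EM sweep cannot decrease the observed-data log-likelihood, which matches conclusion 2) above; multiple restarts guard against the local maxima noted in conclusion 5).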
Threshold models may be useful for analyzing binary and ordinal data. They provide a link between the binary or ordinal measurement scale and an underlying, linear scale on which treatments are assumed to act. In many experiments some form of stratification is present. This paper is concerned with situations in which there are nested strata, as, for example, in the practically important split-plot design. A threshold model is defined in which two nested errors appear on the linear scale. It is shown that maximum likelihood estimates can be obtained by iterative weighted least squares. Maximum likelihood estimation involves integration; integrals are approximated by means of Gauss-Hermite quadrature formulae. Practical applications are used to illustrate the methods.
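The quadrature step can be shown in isolation. For a probit threshold model with one normal random effect, the marginal success probability is an integral with a known closed form, so a Gauss-Hermite approximation (here with assumed illustrative values of the linear predictor and random-effect scale) can be checked exactly:

```python
import numpy as np
from scipy.special import ndtr                 # standard normal CDF

def marginal_prob(eta, sigma, n_nodes=20):
    """E_u[Phi(eta + sigma*u)] for u ~ N(0,1), by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # change of variables u = sqrt(2) x turns the N(0,1) average into a GH sum
    return (w * ndtr(eta + sigma * np.sqrt(2.0) * x)).sum() / np.sqrt(np.pi)

eta, sigma = 0.7, 1.3                          # illustrative values
approx = marginal_prob(eta, sigma)
exact = ndtr(eta / np.sqrt(1.0 + sigma ** 2))  # closed form for this probit integral
```

In the nested-strata model of the paper the integrand involves products over sub-plots, and with two nested errors the quadrature is applied at each level, but the mechanics are those of this one-dimensional rule.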
A method of sieves using splines is proposed for regularizing maximum-likelihood estimates of power spectra. This method has several important properties, including the flexibility to be used at multiple resolution levels. The resolution level is defined in terms of the support of the polynomial B-splines used. Using a discrepancy measure derived from the Kullback-Leibler divergence of parameterized density functions, an expression for the optimal rate of growth of the sieve is derived. While the sieves may be defined on nonuniform grids, in the case of uniform grids the optimal sieve size corresponds to an optimal resolution. Iterative algorithms for obtaining the maximum-likelihood sieve estimates are derived. Applications to spectrum estimation and radar imaging are proposed.
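A rough illustration of a spline sieve at a fixed resolution, assuming a simple least-squares fit to the log-periodogram (the paper's sieve-constrained maximum-likelihood iterations are not reproduced; the AR(1) test signal and knot count are illustrative):

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(5)
n = 1024
x = np.zeros(n)                              # AR(1) series with phi = 0.7
e = rng.standard_normal(n)
for i in range(1, n):
    x[i] = 0.7 * x[i - 1] + e[i]

per = np.abs(np.fft.rfft(x)) ** 2 / n        # periodogram on frequencies 0 .. 0.5
f = np.linspace(0.0, 0.5, len(per))

# cubic B-spline sieve on [0, 0.5]; the interior-knot spacing sets the resolution
k_int = 8
knots = np.r_[[0.0] * 4, np.linspace(0.0, 0.5, k_int + 2)[1:-1], [0.5] * 4]
nbasis = len(knots) - 4
D = BSpline(knots, np.eye(nbasis), 3)(f)     # design matrix of B-spline basis functions
coef, *_ = np.linalg.lstsq(D, np.log(per + 1e-12), rcond=None)
log_s_hat = D @ coef + np.euler_gamma        # Euler-gamma bias correction for log-periodogram

true_log_s = -np.log(np.abs(1 - 0.7 * np.exp(-2j * np.pi * f)) ** 2)
corr = np.corrcoef(log_s_hat, true_log_s)[0, 1]
```

Letting the number of knots grow with the sample size, at the rate the paper derives, is what makes this a sieve rather than a fixed parametric model.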
A model is proposed for the analysis of censored data which combines a logistic formulation for the probability of occurrence of an event with a proportional hazards specification for the time of occurrence of the event. The proposed model is a semiparametric generalization of a parametric model due to Farewell (1982). Estimates of the regression parameters are obtained by maximizing a Monte Carlo approximation of a marginal likelihood, and the EM algorithm is used to estimate the baseline survivor function. We present some simulation results to verify the validity of the suggested estimation procedure. It appears that the semiparametric estimates are reasonably efficient with acceptable bias, whereas the parametric estimates can be highly dependent on the parametric assumptions.
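The parametric special case is easy to sketch: a Farewell-type cure model with an exponential latency, fitted by direct MLE (not the paper's semiparametric Monte Carlo/EM estimator; cure fraction, rate, and censoring time are illustrative). A censored subject is either "cured" or simply not yet failed, which gives the mixture term in the likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n, p_cure, rate, c = 4000, 0.4, 1.0, 3.0
cured = rng.random(n) < p_cure                # cured subjects never experience the event
t_lat = rng.exponential(1.0 / rate, n)        # latency time for the uncured
time = np.where(cured, c, np.minimum(t_lat, c))
event = (~cured) & (t_lat < c)

def nll(theta):
    p = 1.0 / (1.0 + np.exp(-theta[0]))       # cure probability (logistic part)
    lam = np.exp(theta[1])                    # exponential hazard (PH part, no covariates)
    ll_event = np.log1p(-p) + np.log(lam) - lam * time     # uncured and failed
    ll_cens = np.log(p + (1 - p) * np.exp(-lam * time))    # cured OR uncured-but-censored
    return -np.where(event, ll_event, ll_cens).sum()

res = minimize(nll, [0.0, 0.0], method="BFGS")
p_hat = 1.0 / (1.0 + np.exp(-res.x[0]))
lam_hat = np.exp(res.x[1])
```

The semiparametric version replaces the exponential survivor function with an unspecified baseline, which is where the EM step of the paper enters.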
Monitoring clinical trials in nonfatal diseases, where ethical considerations do not dictate early termination upon demonstration of efficacy, often requires examining the interim findings to assure that the protocol-specified sample size will provide sufficient power against the null hypothesis when the alternative hypothesis is true. The sample size may be increased, if necessary, to assure adequate power. This paper presents a new method for carrying out such interim power evaluations for observations from normal distributions without unblinding the treatment assignments or discernibly affecting the Type I error rate. Simulation studies confirm the expected performance of the method.
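One common blinded adjustment, sketched with invented numbers and not necessarily the paper's exact procedure: estimate the within-group variance from the pooled (label-free) interim data by subtracting the variance inflation that the assumed treatment difference would induce, then recompute the per-arm sample size:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
# pooled interim responses from both arms, with treatment labels withheld (blinded)
n_int = 60
y = np.concatenate([rng.normal(0.0, 1.0, n_int // 2),
                    rng.normal(0.5, 1.0, n_int // 2)])

delta = 0.5                                   # design alternative (assumed difference)
# blinded variance estimate: pooled total variance minus delta^2 / 4 (equal allocation)
s2_blind = y.var(ddof=1) - delta ** 2 / 4
alpha, power = 0.05, 0.90
n_arm = 2 * (norm.ppf(1 - alpha / 2) + norm.ppf(power)) ** 2 * s2_blind / delta ** 2
n_arm = int(np.ceil(n_arm))
```

Because no treatment labels are used, the interim look does not involve a comparison of the arms, which is why such procedures leave the Type I error rate essentially untouched.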
A new solution is proposed for a sparse data problem arising in nonparametric estimation of a bivariate survival function. Prior information, if available, can be used to obtain initial values for the EM algorithm. Initial values will completely determine estimates of portions of the distribution which are not identifiable from the data, while having a minimal effect on estimates of portions of the distribution for which the data provide sufficient information. Methods are applied to the distribution of women's age at first marriage and age at birth of first child, using data from the Current Population Surveys of 1975 and 1986.
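A toy discrete analogue makes the identifiability point concrete. Each observation is known only to lie in a set of grid cells (a crude stand-in for bivariate censoring); two cells that are never distinguished by any observation keep the split implied by the EM initial values, while the data-identified masses are unaffected. The grid and observation sets are invented for illustration:

```python
import numpy as np

def em_sets(obs_sets, p0, iters=200):
    """EM for cell masses when each observation is only known to lie in a set of cells."""
    p = np.asarray(p0, dtype=float).copy()
    for _ in range(iters):
        counts = np.zeros_like(p)
        for s in obs_sets:
            counts[s] += p[s] / p[s].sum()    # E-step: split each obs over its set
        p = counts / counts.sum()             # M-step: renormalize expected counts
    return p

# cells 2 and 3 are never separated by the data: {2, 3} always appears as a block
obs_sets = [np.array([0]), np.array([0]), np.array([1]),
            np.array([1]), np.array([2, 3]), np.array([2, 3])]

p_uniform = em_sets(obs_sets, [0.25, 0.25, 0.25, 0.25])
p_prior = em_sets(obs_sets, [0.25, 0.25, 0.40, 0.10])   # prior-informed initial values
```

In both runs the identified masses converge to the empirical fractions (1/3 each for cell 0, cell 1, and the block {2, 3}), but the ratio of mass between cells 2 and 3 is exactly the ratio of the initial values, which is the behavior exploited in the paper.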