In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence-only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation-maximization algorithm to estimate the underlying presence-absence logistic model for presence-only data. This algorithm can be used with any off-the-shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence-absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is only identifiable when there is some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided.
This article tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the expectation-maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches such as those based on k-nearest neighbors or on multiple imputations by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data are contaminated by outliers or have non-Gaussian distributions. To overcome this issue, a new EM algorithm is investigated for mixtures of elliptical distributions that can handle missing data. This paper shows that this problem reduces to the estimation of a mixture of angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which may differ from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate t-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.
We describe a general method for analyzing aggregated Bernoulli outcomes. The research is motivated by an epidemiological reproductive study where the outcome was whether pregnancy was detected in a woman's particular menstrual cycle and the Bernoulli "trials" corresponded to days with intercourse during the cycle. Each cycle is either "viable" or not, i.e., is or is not susceptible to conception. We develop an EM algorithm approach to maximizing the observed-data pseudo-likelihood, based on a set of unobservable latent outcomes linked to the specific days with intercourse. This method is flexible in that it allows one to model effects of covariates on the susceptibility factor and on the latent outcomes. Application of the method to fertility studies enables one to investigate covariates with a long-term or transient effect on the daily conception probability. A complication is that most couples contribute more than one cycle in a prospective study. A generalized estimating equation approach adjusts for the dependency among outcomes within individual couples. The method can be applied in any setting where dependency among Bernoulli trials is induced by a susceptibility factor and the trial outcomes are only observable in the aggregate.
The expectation-maximization (EM) algorithm was first introduced in the statistics literature as an iterative procedure that under some conditions produces maximum-likelihood (ML) parameter estimates. In this paper we investigate the application of the EM algorithm to sequence estimation in the presence of random disturbances and additive white Gaussian noise. As examples of the use of the EM algorithm, we look at the random-phase and fading channels, and show that a formulation of the sequence estimation problem based on the EM algorithm can provide a means of obtaining ML sequence estimates, a task that has previously been too complex to perform.
The semiparametric proportional odds (PO) model is a popular alternative to Cox's proportional hazards model for analyzing survival data. Although many approaches have been proposed for this topic in the literature, most of the existing approaches have been found to be computationally expensive and difficult to implement. In this article, a novel and easy-to-implement approach based on an expectation-maximization (EM) algorithm is proposed for analyzing right-censored data. The EM algorithm involves only solving a low-dimensional estimating equation for the regression parameters and then updating the spline coefficients in simple closed form at each iteration. Our method is robust to initial values, converges quickly, and provides the variance estimates in closed form. Simulation studies suggest that the proposed method has excellent performance in estimating both regression parameters and the baseline survival function, even when the right censoring rate is very high. The method is applied to a large dataset about breast cancer survival extracted from the Surveillance, Epidemiology, and End Results (SEER) database maintained by the U.S. National Cancer Institute. This method is now available in R package regPOr for public use.
In this article, we revisit the problem of fitting a mixture model under the assumption that the mixture components are symmetric and log-concave. To this end, we first study the nonparametric maximum likelihood estimation (MLE) of a monotone log-concave probability density. To fit the mixture model, we propose a semiparametric EM (SEM) algorithm, which can be adapted to other semiparametric mixture models. In our numerical experiments, we compare our algorithm to that of Balabdaoui and Doss (2018, Inference for a two-component mixture of symmetric distributions under log-concavity. Bernoulli 24 (2):1053-71) and other mixture models both on simulated and real-world datasets.
The rapid advance in molecular biology has made feasible systematic studies of mapping quantitative trait loci (QTL) in experimental organisms. The method of multiple interval mapping provides an appropriate way of mapping QTL using genetic markers. However, crossover interference has not been considered sufficiently in current QTL mapping, in which no crossover interference is assumed and the length of the marker interval is always kept fixed. In this article, we consider the issue of statistical inference in multiple interval mapping for QTL when crossover interference is present. The marker interval can be chosen appropriately in our method without keeping the marker interval lengths fixed in advance, and the asymptotic variance-covariance matrix of the MLEs is also derived. Two simulations are performed to evaluate the proposed method and show the impact of crossover interference on QTL mapping.
In this paper, we present a novel competitive EM (CEM) algorithm for finite mixture models to overcome the two main drawbacks of the EM algorithm: often getting trapped at local maxima and sometimes converging to the boundary of the parameter space. The proposed algorithm is capable of automatically choosing the number of clusters and selecting the "split" or "merge" operations efficiently based on the new competitive mechanism we propose. It is insensitive to the initial configuration of the mixture component number and model parameters. Experiments on synthetic data show that our algorithm has very promising performance for the parameter estimation of mixture models. The algorithm is also applied to the structure analysis of complicated Chinese characters. The results show that the proposed algorithm performs much better than previous methods, at a slightly heavier computational cost. (C) 2003 Published by Elsevier Ltd on behalf of Pattern Recognition Society.
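For readers unfamiliar with the baseline that this abstract sets out to improve, the following is a minimal sketch of the standard EM iteration for a two-component, one-dimensional Gaussian mixture. It is not the competitive EM algorithm of the paper: there is no split or merge step, the number of components is fixed at two, and the function names (`em_two_gaussians`, `normal_pdf`, `log_likelihood`), the min/max initialisation, and the variance floor are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weight, means, variances):
    """Observed-data log-likelihood of a two-component Gaussian mixture."""
    total = 0.0
    for x in data:
        total += math.log(
            weight * normal_pdf(x, means[0], variances[0])
            + (1.0 - weight) * normal_pdf(x, means[1], variances[1])
        )
    return total


def em_two_gaussians(data, n_iter=50):
    """Plain EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    # Assumed initialisation: equal weights, means at the data extremes.
    weight = 0.5
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    history = [log_likelihood(data, weight, means, variances)]

    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        resp = []
        for x in data:
            a = weight * normal_pdf(x, means[0], variances[0])
            b = (1.0 - weight) * normal_pdf(x, means[1], variances[1])
            resp.append(a / (a + b))

        # M-step: responsibility-weighted updates of weight, means, variances.
        n0 = sum(resp)
        n1 = n - n0
        weight = n0 / n
        m0 = sum(r * x for r, x in zip(resp, data)) / n0
        m1 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
        v0 = sum(r * (x - m0) ** 2 for r, x in zip(resp, data)) / n0
        v1 = sum((1.0 - r) * (x - m1) ** 2 for r, x in zip(resp, data)) / n1
        means = [m0, m1]
        # Small floor (an assumption) to keep a component from collapsing.
        variances = [max(v0, 1e-6), max(v1, 1e-6)]

        history.append(log_likelihood(data, weight, means, variances))

    return weight, means, variances, history


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    sample += [random.gauss(5.0, 1.0) for _ in range(200)]
    w, mu, var, ll = em_two_gaussians(sample)
    print("weight:", w, "means:", mu, "variances:", var)
```

On well-separated data such as the demo sample, the log-likelihood stored in `history` should not decrease from one iteration to the next. With a poor initialisation the same loop can stall at a local maximum, and without the variance floor a component can collapse onto a single point; these are the two drawbacks the abstract names.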
ISBN: 9781479949755 (print)
In the standard minimum-variance filter recursions it is routinely assumed that the noises are zero-mean and white. In image restoration applications, the data can be contaminated with (non-zero-mean) Poisson noise. This paper introduces the minimum-variance filter for the case where the measurement noise includes a Poisson-distributed component. An EM algorithm for estimating the Poisson noise intensity is described. Conditions for the convergence of the algorithms are also investigated. An image restoration example is presented which demonstrates the efficacy of the described method.
ISBN: 9783319086729; 9783319086712 (print)
Multitype branching processes (MTBP) model branching structures, where the nodes of the resulting tree are objects of different types. One field of application of such models in biology is in studies of cell proliferation. A sampling scheme that appears frequently is observing the cell count in several independent colonies at discrete time points (sometimes only one). Thus, the process is not observable in the sense of the whole tree, but only as the "generation" at a given moment in time, which consists of the number of cells of every type. This requires an EM-type algorithm to obtain a maximum likelihood (ML) estimate of the parameters of the branching process. A computational approach for obtaining such an estimate of the offspring distribution is presented in the class of Markov branching processes with terminal types.