The analysis of biological networks is an important task in the life sciences. Most biological interactions can be modeled using graphical networks in which arcs represent probabilistic relationships between nodes or variables. Such models help scientists analyze their complex data sets, test candidate interaction networks, and understand the studied relationships. These studies face two major problems: selecting the most probable interaction topologies and clustering the associated peculiar data. In this paper, we model biological interactions with a mixture of multivariate Gaussian distributions. We then introduce a new algorithm for parameter estimation and data clustering. This algorithm, called Graphical Expectation Maximization (GEM), extends the EM algorithm by taking into account several decomposable graph structures and using an original initialization technique. Applying this algorithm, we propose a model selection procedure based on the Bayesian Information Criterion. The accuracy of the proposed method is demonstrated in a simulation study of a signal transduction network of the epidermal growth factor receptor (EGFR) protein. Moreover, we apply the proposed model selection procedure to choose the most appropriate interaction graphs for the microbial community in the infant gut using a real data set. (C) 2019 Elsevier B.V. All rights reserved.
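As a rough illustration of EM-based clustering followed by BIC model selection, the sketch below fits one-dimensional Gaussian mixtures with plain EM and compares the BIC of a one-component and a two-component model. It is not the graphical algorithm of the abstract: the univariate setting, the quantile-based initialization, the synthetic data, and all function names are assumptions made for this example. BIC is taken here as -2 log L + p log n, so lower values are preferred.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (log-likelihood, (weights, means, variances)).
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: place the means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    spread = sum((x - centre) ** 2 for x in data) / n
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: weighted updates of weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid degenerate components
    loglik = sum(
        math.log(sum(weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)))
        for x in data
    )
    return loglik, (weights, means, variances)


def bic(loglik, n_params, n):
    """Bayesian Information Criterion; lower is better under this sign convention."""
    return -2.0 * loglik + n_params * math.log(n)


# Synthetic data (assumption): two well-separated clusters.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(6.0, 1.0) for _ in range(100)]

bic_by_k = {}
params_by_k = {}
for k in (1, 2):
    loglik, params = fit_gmm_1d(data, k)
    # A k-component 1-D mixture has (k - 1) weights + k means + k variances.
    bic_by_k[k] = bic(loglik, 3 * k - 1, len(data))
    params_by_k[k] = params

best_k = min(bic_by_k, key=bic_by_k.get)
```

On data like this, the two-component model should have the lower BIC; the same comparison loop extends to any family of candidate models whose log-likelihood and parameter count are available.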
This paper proposes a new approach for the joint processing of signal detection and channel estimation based on the expectation-maximization (EM) algorithm in orthogonal frequency division multiplexing (OFDM) mobile communications. Conventional schemes based on the EM algorithm estimate a channel impulse response using a Kalman filter, and employ the random walk model or the first-order autoregressive (AR) model to derive the process equation for the filter. Since these models assume that the time-variation of the impulse response is white noise without considering any autocorrelation property, the accuracy of the channel estimation deteriorates under fast-fading conditions, resulting in an increased packet error rate (PER). To improve the accuracy of the estimation of fast-fading channels, the proposed scheme employs a differential model that allows the correlated time-variation to be considered by introducing the first- and higher-order time differentials of the channel impulse response. In addition, this paper derives a forward recursive form of the channel estimation along both the frequency and time axes in order to reduce the computational complexity. Computer simulations of channels under fast multipath fading conditions demonstrate that the proposed method is superior in PER to the conventional schemes that employ the random walk model.
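For orientation, the conventional random walk model mentioned in the abstract can be sketched as a scalar Kalman filter. This is a minimal illustration, not the proposed differential model: it assumes a single real-valued channel tap h_t = h_{t-1} + w_t observed through known pilot symbols as y_t = x_t h_t + v_t, and the function name, noise variances, and simulated data are hypothetical.

```python
import random


def kalman_random_walk(pilots, observations, q, r, h0=0.0, p0=1.0):
    """Track a scalar channel tap under a random walk process model.

    q is the process-noise variance, r the observation-noise variance.
    Returns the filtered estimate after each observation.
    """
    h, p = h0, p0
    estimates = []
    for x, y in zip(pilots, observations):
        # Predict: a random walk keeps the mean and inflates the variance.
        p = p + q
        # Update: standard Kalman gain for the observation y = x * h + noise.
        gain = p * x / (x * x * p + r)
        h = h + gain * (y - x * h)
        p = (1.0 - gain * x) * p
        estimates.append(h)
    return estimates


# Simulated example (assumption): a constant tap observed through +/-1 pilots.
rng = random.Random(0)
true_h = 0.8
pilots = [rng.choice([-1.0, 1.0]) for _ in range(200)]
observations = [true_h * x + rng.gauss(0.0, 0.1) for x in pilots]
estimates = kalman_random_walk(pilots, observations, q=1e-4, r=0.01)
```

Because the prediction step carries no information about the direction of change, a filter of this form can only lag behind a rapidly varying tap, which is the limitation the abstract attributes to the random walk and first-order AR models.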
An iterative method to recover perfectly focused images from a set of light microscopic images is proposed. The method is based on the EM algorithm, and it assumes prior knowledge of the Point Spread Function of the optical system, as well as of the optical parameter settings of the acquisition system. The method is applied to the visualization of integrated circuit samples through an optical microscope and to the recovery of their depth information.
Using the expression for the unnormalized nonlinear filter for a hidden Markov model, we develop a dynamic-programming-like backward recursion for the filter. This is combined with some ideas from reinforcement learning and a conditional version of importance sampling in order to develop a scheme based on stochastic approximation for estimating the desired conditional expectation. This is then extended to a smoothing problem. Applying these ideas to the EM algorithm, a reinforcement learning scheme is developed for estimating the partially observed log-likelihood function. A stochastic approximation scheme maximizes this function over the unknown parameter. The two procedures are performed on two different time scales, emulating the alternating 'expectation' and 'maximization' operations of the EM algorithm. We also extend this to a continuous state space problem. Numerical results are presented in support of our schemes.
Consider a system which is made up of multiple components connected in series. In this case, the failure of the whole system is caused by the earliest failure of any of the components, which is commonly referred to as competing risks. In certain situations, determining the cause of failure may be expensive, or the cause may be very difficult to observe due to the lack of appropriate diagnostics. Therefore, it might be the case that the failure time is observed, but its corresponding cause of failure is not fully investigated. This is known as masking. Moreover, this competing risks problem is further complicated by possible censoring. In practice, censoring is very common because of time and cost considerations in experiments. In this paper, we deal with parameter estimation for incomplete lifetime data in competing risks using the EM algorithm, where incompleteness arises due to censoring and masking. Several studies have been carried out, but parameter estimation for incomplete data has mainly focused on exponential models. We provide the general likelihood method, and the parameter estimation of a variety of models including exponential, s-normal, and lognormal models. This method can be easily implemented to find the MLE of other models. Exponential and lognormal examples are illustrated with parameter estimation, and a graphical technique for checking model validity.
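The exponential case gives a compact view of how EM handles masking and censoring together. The sketch below assumes two independent exponential causes, masking that does not depend on the true cause, and non-informative censoring; the labels, function name, and data are hypothetical and are not taken from the paper. In the E-step each masked failure is split between the causes in proportion to the current rates, and in the M-step each rate is the expected number of failures from that cause divided by the total time on test.

```python
MASKED = "masked"      # failure observed, cause not identified
CENSORED = "censored"  # unit removed before failure


def em_exponential_competing_risks(times, causes, n_causes=2, n_iter=100):
    """EM estimates of cause-specific exponential failure rates.

    causes[i] is an integer cause index, MASKED, or CENSORED.
    """
    total_time = sum(times)  # every unit contributes its time on test
    known = [sum(1 for c in causes if c == j) for j in range(n_causes)]
    n_masked = sum(1 for c in causes if c == MASKED)
    rates = [1.0 / n_causes] * n_causes  # arbitrary positive starting values
    for _ in range(n_iter):
        # E-step: expected failure count per cause, splitting masked failures
        # in proportion to the current rates.
        total_rate = sum(rates)
        expected = [known[j] + n_masked * rates[j] / total_rate for j in range(n_causes)]
        # M-step: exponential MLE given the expected counts.
        rates = [e / total_time for e in expected]
    return rates


# Hypothetical data: 3 failures from cause 0, 2 from cause 1, 3 masked, 2 censored.
times = [2.0, 3.0, 1.5, 4.0, 2.5, 3.5, 1.0, 5.0, 6.0, 6.0]
causes = [0, 0, 0, 1, 1, MASKED, MASKED, MASKED, CENSORED, CENSORED]
rates = em_exponential_competing_risks(times, causes)
```

Under these assumptions the iteration has a closed-form fixed point, rate_j = n_j (N + m) / (N T), where n_j is the number of failures with known cause j, N the number with any known cause, m the number of masked failures, and T the total time on test. That makes the sketch easy to check; models without such a closed form would need a numerical M-step.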
The servo turret is a complex electromechanical hydraulic component and the one most likely to fail in a numerical control lathe. Reliability evaluation is used to make statistical inferences about the reliability characteristics of products according to all the information related to product reliability. Failure data are the basis of reliability evaluation; however, it is very difficult to collect enough accurate failure data for reliability evaluation. In this paper, the reliability of the servo turret is evaluated based on failure data that contain both accurate failure data and interval-censored data. First, a mixture Weibull distribution is chosen for fitting the reliability model. Then, the expectation-maximization algorithm is used for estimating the parameters of the distribution, which contains a hidden variable, and the confidence intervals of the parameters are constructed using the delta method. In the simulation, different percentages of accurate data and interval data are used and compared with data containing only accurate data. The accuracy of this method is evaluated by mean square error. Finally, the method is applied to the failure data of the servo turret and the parameters of the mixture Weibull distribution are determined. To possibly simplify the mixture Weibull distribution, the hypothesis that the shape or scale parameters are equal is tested. The hazard property and mean time between failures are then estimated and the associated 95% confidence intervals are obtained.
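To make the data structure concrete, the sketch below evaluates the observed-data log-likelihood of a Weibull mixture when some failure times are exact and others are known only to lie in an interval. It shows the objective that an EM procedure would increase, not the estimation steps or the delta-method intervals of the paper; the function names and the parameterization by shape and scale are assumptions for this example. An exact time contributes the mixture density, and an interval (lo, hi) contributes the mixture probability F(hi) - F(lo).

```python
import math


def weibull_pdf(t, shape, scale):
    """Weibull density with the given shape and scale."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def weibull_cdf(t, shape, scale):
    """Weibull cumulative distribution function."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def mixture_loglik(exact_times, intervals, weights, shapes, scales):
    """Log-likelihood of a Weibull mixture for exact and interval-censored data."""
    components = list(zip(weights, shapes, scales))
    total = 0.0
    for t in exact_times:
        # Exact observation: mixture density at t.
        total += math.log(sum(w * weibull_pdf(t, k, s) for w, k, s in components))
    for lo, hi in intervals:
        # Interval-censored observation: mixture probability of (lo, hi].
        total += math.log(
            sum(w * (weibull_cdf(hi, k, s) - weibull_cdf(lo, k, s)) for w, k, s in components)
        )
    return total


# Hypothetical two-component example with made-up parameters and data.
example = mixture_loglik(
    exact_times=[120.0, 340.0, 560.0],
    intervals=[(100.0, 200.0), (400.0, 800.0)],
    weights=[0.4, 0.6],
    shapes=[1.2, 2.5],
    scales=[200.0, 600.0],
)
```

With a single component of shape 1 and scale 1 the function reduces to the exponential case, which gives a quick sanity check: an exact time of 1 contributes -1, and the interval (0, ln 2) contributes log 0.5.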
The gene classification problem is studied considering the ratio of gene expression levels, X, in two-channel microarrays and a non-observed categorical variable indicating how differentially expressed the gene is: non-differentially expressed, down-regulated, or up-regulated. Assuming that X follows a mixture of Gamma distributions, two methods are proposed and their results are compared. The first method is based on a hierarchical Bayesian model. The conditional predictive probability that a gene belongs to each group is calculated, and the gene is assigned to the group for which this conditional probability is highest. The second method uses the EM algorithm to estimate the most likely group label for each gene, that is, to assign the gene to the group with the highest estimated probability of containing it.
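The assignment rule shared by both methods, choosing the group with the highest posterior probability, can be sketched for a Gamma mixture whose parameters are taken as already estimated. Everything numeric below is an assumption for illustration: the mixing weights, shapes, and scales are made-up values, and the code does not show how either method estimates them.

```python
import math

GROUPS = ["down-regulated", "non-differentially expressed", "up-regulated"]


def gamma_logpdf(x, shape, scale):
    """Log-density of a Gamma distribution with the given shape and scale."""
    return (shape - 1.0) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale)


def posterior_group_probs(x, weights, shapes, scales):
    """Posterior probability of each mixture component given the ratio x."""
    logs = [math.log(w) + gamma_logpdf(x, a, s) for w, a, s in zip(weights, shapes, scales)]
    peak = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(value - peak) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def classify(x, weights, shapes, scales):
    """Assign x to the group with the highest posterior probability."""
    probs = posterior_group_probs(x, weights, shapes, scales)
    return GROUPS[probs.index(max(probs))]


# Hypothetical parameters: component means (shape * scale) of 0.4, 1.0, and 2.5.
weights = [0.1, 0.8, 0.1]
shapes = [20.0, 20.0, 20.0]
scales = [0.02, 0.05, 0.125]

labels = [classify(x, weights, shapes, scales) for x in (0.35, 1.0, 3.0)]
```

In an EM fit, the same posterior probabilities are the E-step responsibilities, so the final classification comes at no extra cost once the iterations have converged.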
This article tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the Expectation Maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches such as those based on k-nearest neighbors or on multiple imputations by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data are contaminated by outliers or have non-Gaussian distributions. To overcome this issue, a new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. This paper shows that this problem reduces to the estimation of a mixture of angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which may differ from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate t-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.
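The classical Gaussian approach that the abstract takes as its baseline imputes a missing coordinate from its conditional distribution given the observed ones. The sketch below shows that step for a single bivariate Gaussian with known parameters. It is a deliberately reduced illustration, not the elliptical-mixture algorithm of the paper: the function name and the numbers are hypothetical, and a mixture version would weight such conditional means by component responsibilities.

```python
def impute_second_coordinate(x1, mean, cov):
    """Conditional mean and variance of x2 given x1 under a bivariate Gaussian.

    mean is (mu1, mu2) and cov is a 2x2 covariance matrix as nested lists.
    """
    mu1, mu2 = mean
    s11, s12, s22 = cov[0][0], cov[0][1], cov[1][1]
    # E[x2 | x1] = mu2 + (s12 / s11) * (x1 - mu1)
    cond_mean = mu2 + s12 / s11 * (x1 - mu1)
    # Var[x2 | x1] = s22 - s12^2 / s11
    cond_var = s22 - s12 * s12 / s11
    return cond_mean, cond_var


# Hypothetical parameters: zero means, unit variances, correlation 0.8.
imputed, uncertainty = impute_second_coordinate(
    2.0, mean=(0.0, 0.0), cov=[[1.0, 0.8], [0.8, 1.0]]
)
```

The conditional variance is what an EM imputation scheme carries into the covariance update, so that imputed values are not treated as if they had been observed exactly.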
In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence-only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation-maximization algorithm to estimate the underlying presence-absence logistic model for presence-only data. This algorithm can be used with any off-the-shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence-absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is only identifiable when there is some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided.
The expectation-maximization (EM) algorithm was first introduced in the statistics literature as an iterative procedure that under some conditions produces maximum-likelihood (ML) parameter estimates. In this paper we investigate the application of the EM algorithm to sequence estimation in the presence of random disturbances and additive white Gaussian noise. As examples of the use of the EM algorithm, we look at the random-phase and fading channels, and show that a formulation of the sequence estimation problem based on the EM algorithm can provide a means of obtaining ML sequence estimates, a task that has previously been too complex to perform.