Context: Although independent imputation techniques have been studied comprehensively in software effort prediction, few studies address embedded methods for handling missing data in this setting. Objective: We propose the BRem (Bayesian Regression and Expectation Maximization) algorithm for software effort prediction, together with two embedded strategies for handling missing data. Method: The MDT (Missing Data Toleration) strategy ignores the missing data when using BRem for software effort prediction, whereas the MDI (Missing Data Imputation) strategy uses observed data to impute missing data iteratively while building the predictive model. Results: Experiments on the ISBSG and CSBSG datasets demonstrate that, when the historical dataset has no missing values, BRem outperforms LR (Linear Regression), BR (Bayesian Regression), SVR (Support Vector Regression) and the M5' regression tree in software effort prediction, provided that the test set is no larger than 30% of the whole historical dataset for ISBSG and 25% for CSBSG. When the historical datasets contain missing values, BRem with the MDT and MDI strategies significantly outperforms the independent imputation techniques, including MI, BMI, CMI, MINI and M5'. Moreover, the MDI strategy gives BRem more accurate imputations of the missing values than the independent imputation techniques do, provided that the level of missing data in the training set is no larger than 10% for both the ISBSG and CSBSG datasets. Conclusion: The experimental results suggest that BRem is promising for software effort prediction. When there are missing values, MDI is the preferred strategy to embed in BRem. (C) 2014 Elsevier B.V. All rights reserved.
We consider a class of degradation processes that can consist of distinct phases of behavior. In particular, the degradation rate may increase or decrease in a non-smooth manner at the point in time when the underlying degradation process changes phase. To model the degradation path of a given device, we use an independent-increments stochastic process with a single unobserved change-point. Furthermore, we assume that the change-point varies randomly from device to device. The likelihood functions for such a model are analytically intractable, so in this paper we develop an EM algorithm for this model to obtain the maximum likelihood estimators efficiently. We demonstrate the applicability of the method using two different models, and present some computational results from our implementation.
Standard survival models assume independence between survival times. Frailty models provide a useful extension of these models by introducing a random effect (frailty) when the survival data are correlated. Several estimation methods have been proposed to find the parameters of shared frailty models. Among them, the EM algorithm (Survival Analysis-Techniques for Censored and Truncated Data, 1997) and the penalized likelihood method (Penalized Survival Models and Frailty, Technical Report No. 66, Mayo Foundation, 2000) are two popular ones. However, the variance estimates require computing a matrix inverse, so the current methods cannot handle data with a large number of clusters. This paper provides a modified EM algorithm for shared frailty models. The new method uses standard statistical procedures to find the maximum likelihood estimates (MLE), and it can handle data sets with large numbers of clusters and distinct event times. Confidence intervals for the parameters can be constructed by multiple imputation. Simulation studies were carried out to compare different approaches for the frailty models. (c) 2004 Elsevier B.V. All rights reserved.
Among recent methods designed for accelerating the EM algorithm without any modification of the structure of EM or of the statistical model, the parabolic acceleration (P-EM) has proved its efficiency. It does not involve any computation of a gradient or Hessian matrix and can be used as an additional software component of any fixed-point algorithm maximizing some objective function. The vector epsilon algorithm was introduced to reach the same goals. Through geometric considerations, the relationships between the outputs of an improved version of P-EM and those of the vector epsilon algorithm are established. This sheds some light on their different behaviours and explains why the parabolic acceleration of EM outperforms its competitor in most numerical experiments. A detailed analysis of its trajectories on a variety of real and simulated data shows the ability of P-EM to choose the most efficient paths to the global maximum of the likelihood. (C) 2012 Elsevier B.V. All rights reserved.
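As a rough illustration of accelerating a fixed-point iteration without gradients or Hessians, the sketch below applies Aitken's delta-squared extrapolation to a slowly converging scalar map. This is a classical sequence-extrapolation scheme swapped in for illustration; it is neither the parabolic acceleration nor the vector epsilon algorithm discussed in the abstract. The map `g` is a hypothetical stand-in for one EM update.

```python
import math

def aitken(x0, x1, x2):
    """Aitken's delta-squared extrapolation from three successive iterates."""
    denom = x2 - 2.0 * x1 + x0
    if denom == 0.0:
        # Sequence has numerically stalled; nothing to extrapolate.
        return x2
    return x0 - (x1 - x0) ** 2 / denom

def g(x):
    # Hypothetical slow, linearly convergent fixed-point map (assumption:
    # stands in for a single EM update). Its fixed point solves x = cos(x).
    return 0.9 * x + 0.1 * math.cos(x)

FIXED_POINT = 0.7390851332151607  # solution of x = cos(x)

# Run ten plain fixed-point updates starting from 0.
iterates = [0.0]
for _ in range(10):
    iterates.append(g(iterates[-1]))

plain = iterates[-1]
# Extrapolate from the last three iterates only; no derivatives are used.
accelerated = aitken(*iterates[-3:])

print("plain error:      ", abs(plain - FIXED_POINT))
print("accelerated error:", abs(accelerated - FIXED_POINT))
```

The extrapolation only reuses iterates that the base algorithm already produced, which is the sense in which such schemes can be bolted onto any fixed-point method.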
Singularities in the parameter spaces of hierarchical learning machines are known to be a main cause of slow convergence of gradient descent learning. The EM algorithm, another learning algorithm that yields a maximum likelihood estimator, also suffers from slow convergence, which often appears when the component overlap is large. We analyze the dynamics of the EM algorithm for Gaussian mixtures around singularities and show that there exists a slow manifold caused by a singular structure, which is closely related to the slow convergence of the EM algorithm. We also conduct numerical simulations to confirm the theoretical analysis. Through the simulations, we compare the dynamics of the EM algorithm with those of the gradient descent algorithm, and show that their slow dynamics are caused by the same singular structure, and thus they behave in the same way around singularities.
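For readers who want to see the iteration being analyzed, here is a minimal sketch of textbook EM for a one-dimensional Gaussian mixture, run on two heavily overlapping components. The data, starting values, and function names are assumptions made for illustration and do not reproduce the paper's experiments.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances):
    """One EM update for a 1-D Gaussian mixture."""
    k = len(weights)
    # E-step: posterior probability of each component for each point.
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: weighted maximum-likelihood updates.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(var)
    return new_w, new_m, new_v

# Assumed toy data: two unit-variance components whose means are close,
# so the components overlap heavily.
rng = random.Random(0)
data = [rng.gauss(-1.0, 1.0) for _ in range(200)] + \
       [rng.gauss(1.0, 1.0) for _ in range(200)]

weights, means, variances = [0.5, 0.5], [-0.5, 0.5], [1.0, 1.0]
history = [log_likelihood(data, weights, means, variances)]
for _ in range(50):
    weights, means, variances = em_step(data, weights, means, variances)
    history.append(log_likelihood(data, weights, means, variances))

# Per-iteration gains in log-likelihood; small gains indicate slow progress.
gains = [b - a for a, b in zip(history, history[1:])]
print("first gain:", gains[0], "last gain:", gains[-1])
```

Inspecting `gains` for different amounts of overlap is one simple way to observe how quickly or slowly the iteration progresses on a given data set.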
A novel and efficient mixture model fitting technique, called the penalized minimum matching distance-guided expectation-maximization (EM) algorithm, is proposed. The penalized minimum matching distance is used to find the number of mixture components very accurately. We illustrate the excellent performance of the penalized minimum matching distance-guided EM algorithm with experiments involving Gaussian mixtures. (c) 2005 Elsevier GmbH. All rights reserved.
The application of the Bayesian Structural EM algorithm to learn Bayesian networks (BNs) for clustering implies a search over the space of BN structures alternating between two steps: an optimization of the BN parameters (usually by means of the EM algorithm) and a structural search for model selection. In this paper, we propose to perform the optimization of the BN parameters using an alternative approach to the EM algorithm: the BC + EM method. We provide experimental results to show that our proposal results in a more effective and efficient version of the Bayesian Structural EM algorithm for learning BNs for clustering. (C) 2000 Elsevier Science B.V. All rights reserved.
Distributed estimation of Gaussian mixtures has many applications in wireless sensor networks (WSNs), and an energy-efficient solution remains challenging. This paper presents a novel diffusion-based EM algorithm for this problem. A diffusion strategy is introduced for acquiring the global statistics in the EM algorithm, in which each sensor node only needs to communicate its local statistics to its neighboring nodes at each iteration. This improves on the existing consensus-based distributed EM algorithm, which may need much more communication overhead to reach consensus, especially in large-scale networks. The robustness and scalability of the proposed approach are achieved by distributed processing in the network. In addition, we show that the proposed approach can be regarded as a stochastic approximation method for finding the maximum likelihood estimate for Gaussian mixtures. Simulation results show the efficiency of this approach.
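The neighbor-only communication pattern can be sketched as a single combine step in which each node averages its local statistic with those of its direct neighbors. This is a simplified illustration under assumed choices (a ring topology, uniform combination weights, one scalar statistic per node); the abstract does not specify the paper's actual update rule, so the function names and values below are hypothetical.

```python
def ring_neighbors(n):
    """Assumed topology: each node talks only to its two ring neighbors."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]

def diffusion_step(values, neighbors):
    """Each node averages its own value with its neighbors' values.

    Uniform weights over the closed neighborhood are an assumption made
    for this sketch.
    """
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]

# Hypothetical local sufficient statistics, one scalar per sensor node.
local_stats = [4.0, 9.0, 1.0, 6.0, 5.0, 11.0]
neighbors = ring_neighbors(len(local_stats))

# One round of neighbor-only communication.
after_one = diffusion_step(local_stats, neighbors)

spread_before = max(local_stats) - min(local_stats)
spread_after = max(after_one) - min(after_one)
print("spread before:", spread_before, "after one step:", spread_after)
```

On this regular graph the uniform weights keep the network-wide sum unchanged, so each node's value moves toward the global average while only exchanging messages with its immediate neighbors.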
Facial skin detection is an important step in facial surgical planning, as in many other applications. It poses many problems. One of them is that image features can be severely corrupted by illumination, noise, and occlusion, and shadows can cause numerous strong edges. Hence, in this paper, we present an automatic Expectation-Maximization (EM) algorithm for facial skin color segmentation that uses knowledge of chromatic space and varying illumination conditions to correct and segment frontal and lateral facial color images simultaneously. The proposed EM algorithm leads to a method that allows more robust and accurate segmentation of facial images. The initialization of the model parameters is very important for the convergence of the algorithm; for this purpose, we use a method for robust parameter estimation of the Gaussian mixture components. We also add an extra class, modeled by a uniform probability density, that includes all pixels not modeled explicitly by a Gaussian with small variance, and amend the EM algorithm accordingly, in order to obtain significantly better results. Experimental results on facial color images show that accurate estimates of the Gaussian mixture parameters are computed. Further results on images presenting a wide range of variations in lighting conditions demonstrate the efficiency of the proposed skin color segmentation algorithm compared with the conventional EM algorithm.
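The idea of an extra uniform class can be sketched as an E-step in which a constant-density component competes with the Gaussian components for each value. The parameter values, the one-dimensional intensity range [0, 1], and the function names are assumptions for illustration only; the paper's actual feature space and amendments are not given in the abstract.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def responsibilities(x, weights, means, variances, uniform_weight, lo, hi):
    """E-step for one value x (assumed to lie in [lo, hi]).

    Returns posterior probabilities for each Gaussian component, followed
    by the posterior for the extra uniform class as the last entry.
    """
    dens = [w * normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    # Uniform class: constant density over the assumed value range.
    dens.append(uniform_weight / (hi - lo))
    total = sum(dens)
    return [d / total for d in dens]

# Hypothetical mixture: two narrow Gaussians plus a uniform outlier class.
weights, means, variances = [0.45, 0.45], [0.3, 0.7], [0.01, 0.01]
uniform_weight, lo, hi = 0.10, 0.0, 1.0

typical = responsibilities(0.3, weights, means, variances, uniform_weight, lo, hi)
outlier = responsibilities(0.0, weights, means, variances, uniform_weight, lo, hi)

print("value at a component mean:", typical)
print("value far from both means:", outlier)
```

Values that no narrow Gaussian explains well are absorbed mostly by the uniform class, so they pull less on the Gaussian means and variances in the subsequent M-step.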
New acceleration schemes and restarting procedures are defined and studied with a view to their application to the EM algorithm. In most cases the introduced algorithms circumvent the problems of stagnation and degeneracy. Their behavior is analyzed on real data sets. (c) 2007 Elsevier B.V. All rights reserved.