ISBN: (print) 9781538669563
In data mining, the EM algorithm is widely applied to incomplete data because of its numerical stability, simplicity of implementation, and reliable global convergence. Its main disadvantages are slow convergence and strong dependence on the choice of initial values. In this paper, the clustering result of the K-means algorithm is used as the initial value of the EM algorithm, with the initialization chosen according to the characteristics of the mining task; the incremental EM algorithm (IEM) then refines the estimate step by step through repeated EM iterations, obtaining optimal values for filling missing data quickly and efficiently. Experimental results show that the proposed algorithm speeds up the convergence rate, strengthens the stability of clustering, and achieves a remarkable data-filling effect.
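The scheme described in this abstract can be sketched roughly as follows. The data set, cluster count, missingness rate, and the use of scikit-learn's `KMeans` and `GaussianMixture` are illustrative assumptions, not details taken from the paper: K-means centers seed the EM fit, and missing entries are re-imputed from the fitted mixture on each pass.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters (illustrative data)
X_true = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
                    rng.normal(5.0, 0.5, (100, 2))])

# Knock out ~10% of entries to simulate incomplete data
X = X_true.copy()
mask = rng.random(X.shape) < 0.10
X[mask] = np.nan

# Start from global-mean imputation, then alternate:
# K-means -> EM (Gaussian mixture) -> re-impute with the
# assigned component's mean
X_filled = np.where(mask, np.nanmean(X, axis=0), X)
for _ in range(5):
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_filled)
    gmm = GaussianMixture(n_components=2,
                          means_init=km.cluster_centers_,  # K-means seeds EM
                          random_state=0).fit(X_filled)
    comp = gmm.predict(X_filled)            # most likely component per row
    X_filled = np.where(mask, gmm.means_[comp], X)
```

On data like this, imputing from the assigned component's mean recovers the masked entries far better than a single global-mean fill, which is the intuition behind using the clustering structure to drive the imputation.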
The EM algorithm is a popular method for parameter estimation in situations where the data can be viewed as incomplete. Because each E-step visits every data point on a given iteration, the EM algorithm requires considerable computation time when applied to large data sets. Two variants, the incremental EM (IEM) algorithm and a sparse version of the EM algorithm, were proposed by Neal R.M. and Hinton G.E. (in Jordan M.I. (Ed.), Learning in Graphical Models, Kluwer, Dordrecht, 1998, pp. 355-368) to reduce the computational cost of applying the EM algorithm. With the IEM algorithm, the available n observations are divided into B (B ≤ n) blocks, and the E-step is implemented for only one block of observations at a time before the next M-step is performed. With the sparse version of the EM algorithm for fitting mixture models, only those posterior probabilities of component membership that are above a specified threshold are updated; the remaining component-posterior probabilities are held fixed. In this paper, simulations are performed to assess the relative performance of the IEM algorithm with various numbers of blocks against the standard EM algorithm. In particular, we propose a simple rule for choosing the number of blocks for the IEM algorithm. For the IEM algorithm in the extreme case of one observation per block, we provide efficient updating formulas that avoid the direct calculation of the inverses and determinants of the component-covariance matrices. Moreover, a sparse version of the IEM algorithm (SPIEM) is formulated by combining the sparse E-step of the EM algorithm with the partial E-step of the IEM algorithm. This SPIEM algorithm further reduces the computation time of the IEM algorithm.
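The block-wise IEM idea can be illustrated with a minimal sketch for a one-dimensional two-component Gaussian mixture; the data, block count B, and starting values here are illustrative assumptions, not the paper's simulation setup. Each partial E-step recomputes responsibilities for one block only and swaps that block's contribution into running sufficient statistics, after which an M-step updates the parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two-component 1-D Gaussian mixture data (illustrative)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.0, 150)])
n, K, B = len(x), 2, 5
blocks = np.array_split(np.arange(n), B)   # the B blocks of observations

w = np.full(K, 1.0 / K)                    # mixing weights
mu = np.array([-1.0, 1.0])                 # initial component means
var = np.array([1.0, 1.0])                 # initial component variances

def resp(xi):
    """Posterior component-membership probabilities (E-step quantities)."""
    dens = np.exp(-0.5 * (xi[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    p = w * dens
    return p / p.sum(axis=1, keepdims=True)

# One full E-step to initialise the sufficient statistics
R = resp(x)                                # n x K responsibilities
S0 = R.sum(axis=0)                         # sum of responsibilities
S1 = (R * x[:, None]).sum(axis=0)          # weighted sum of x
S2 = (R * x[:, None] ** 2).sum(axis=0)     # weighted sum of x^2

for sweep in range(10):
    for idx in blocks:
        xb = x[idx]
        Rb = resp(xb)                      # partial E-step: this block only
        # Swap the block's old contribution for the new one
        S0 += Rb.sum(axis=0) - R[idx].sum(axis=0)
        S1 += (Rb * xb[:, None]).sum(axis=0) - (R[idx] * xb[:, None]).sum(axis=0)
        S2 += (Rb * xb[:, None] ** 2).sum(axis=0) - (R[idx] * xb[:, None] ** 2).sum(axis=0)
        R[idx] = Rb
        # M-step after every block, using the up-to-date statistics
        w, mu, var = S0 / n, S1 / S0, S2 / S0 - (S1 / S0) ** 2
```

A sparse SPIEM-style variant would additionally freeze responsibilities below a threshold inside the partial E-step; that refinement is omitted from this sketch.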