Clustering high-dimensional data has become a challenge in data mining due to the curse of dimensionality. To address this problem, subspace clustering has been defined as an extension of traditional clustering that seeks to find clusters in subspaces spanned by different combinations of dimensions within a dataset. This paper presents a new subspace clustering algorithm that calculates local feature weights automatically in an EM-based clustering process. In the algorithm, features are locally weighted using a new unsupervised weighting method, as a means of minimizing a proposed clustering criterion that takes into account both the average intra-cluster compactness and the average inter-cluster separation for subspace clustering. To capture accurate subspace information, an additional outlier detection process is presented to identify possible local outliers of subspace clusters; it is embedded between the E-step and M-step of the algorithm. The method has been evaluated on clustering real-world gene expression data and high-dimensional artificial data with outliers, and the experimental results show its effectiveness.
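The abstract does not give the weighting formula, so the sketch below swaps in an entropy-weighting rule (in the style of entropy-weighted k-means) purely to illustrate what locally weighted features look like inside an EM-style loop: given soft assignments from an E-step, each cluster gets its own weight vector that favours features with small within-cluster dispersion. It assumes NumPy; the function name `local_feature_weights` and the temperature `gamma` are hypothetical, and the outlier-detection step is not shown.

```python
import numpy as np

def local_feature_weights(X, resp, gamma=1.0):
    """Return cluster centers (K, d) and per-cluster feature weights (K, d).

    X: (n, d) data matrix; resp: (n, K) soft assignments from an E-step.
    Assumed weighting rule (entropy-weighting style, not the paper's own):
    W[k, j] is proportional to exp(-D[k, j] / gamma), where D[k, j] is the
    weighted within-cluster dispersion of feature j in cluster k.
    """
    nk = resp.sum(axis=0) + 1e-12                    # soft cluster sizes
    centers = (resp.T @ X) / nk[:, None]             # weighted cluster means
    sq = (X[:, None, :] - centers[None, :, :]) ** 2  # (n, K, d) squared deviations
    D = np.einsum("nk,nkd->kd", resp, sq) / nk[:, None]
    logits = -D / gamma
    logits -= logits.max(axis=1, keepdims=True)      # stabilise the softmax
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)                # each row sums to 1
    return centers, W

# Toy data: cluster 0 is compact in feature 0, cluster 1 is compact in feature 2.
rng = np.random.default_rng(0)
a = np.column_stack([rng.normal(0, 0.05, 100), rng.normal(0, 1, 100), rng.normal(0, 1, 100)])
b = np.column_stack([rng.normal(5, 1, 100), rng.normal(0, 1, 100), rng.normal(5, 0.05, 100)])
X = np.vstack([a, b])
resp = np.zeros((200, 2))
resp[:100, 0] = 1.0
resp[100:, 1] = 1.0
centers, W = local_feature_weights(X, resp)
print(W.argmax(axis=1))  # most heavily weighted feature for each cluster
```

In a full algorithm these weights would enter the E-step distance (or likelihood) on the next iteration, so each cluster is effectively measured in its own subspace.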
Learning a Gaussian mixture with a local algorithm like EM can be difficult because (i) the true number of mixing components is usually unknown, (ii) there is no generally accepted method for parameter initialization, and (iii) the algorithm can get trapped in one of the many local maxima of the likelihood function. In this paper we propose a greedy algorithm for learning a Gaussian mixture that tries to overcome these limitations. In particular, starting with a single component and adding components sequentially up to a maximum number k, the algorithm is capable of achieving solutions superior to EM with k components in terms of the likelihood of a test set. The algorithm is based on recent theoretical results on incremental mixture density estimation, and uses a combination of global and local search each time a new component is added to the mixture.
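A simplified 1-D sketch of the greedy idea, assuming NumPy: start from one Gaussian, then at each step try several candidate components centred on randomly chosen data points (global search), run a short EM from each (local search), and keep the candidate with the highest log-likelihood. The insertion weight 1/k, the candidate variance, and all function names are illustrative assumptions rather than the authors' insertion procedure, and the sketch scores candidates on the training data, not a test set.

```python
import numpy as np

def log_gauss(x, mu, var):
    """(n, k) matrix of log N(x_n | mu_k, var_k) for 1-D data."""
    return -0.5 * (np.log(2 * np.pi * var) + (x[:, None] - mu) ** 2 / var)

def log_lik(x, w, mu, var):
    lp = np.log(w) + log_gauss(x, mu, var)
    m = lp.max(axis=1, keepdims=True)                 # log-sum-exp over components
    return float(np.sum(m[:, 0] + np.log(np.exp(lp - m).sum(axis=1))))

def em(x, w, mu, var, iters=50, min_var=1e-6):
    """Plain EM for a 1-D Gaussian mixture (the local search)."""
    for _ in range(iters):
        lp = np.log(w) + log_gauss(x, mu, var)
        lp -= lp.max(axis=1, keepdims=True)
        r = np.exp(lp)
        r /= r.sum(axis=1, keepdims=True)             # E-step: responsibilities
        nk = r.sum(axis=0) + 1e-12
        w = nk / len(x)                               # M-step: weights, means, variances
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = np.maximum((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk, min_var)
    return w, mu, var

def greedy_gmm(x, k_max, n_candidates=10, seed=0):
    """Grow a mixture from 1 to k_max components, keeping the best insertion."""
    rng = np.random.default_rng(seed)
    w, mu, var = np.array([1.0]), np.array([x.mean()]), np.array([x.var()])
    for k in range(2, k_max + 1):
        a = 1.0 / k                                   # assumed weight of the new component
        best_ll, best = -np.inf, None
        for m in rng.choice(x, size=n_candidates, replace=False):  # global search
            cand = em(x, np.append((1 - a) * w, a), np.append(mu, m),
                      np.append(var, x.var() / k ** 2), iters=20)
            ll = log_lik(x, *cand)
            if ll > best_ll:
                best_ll, best = ll, cand
        w, mu, var = em(x, *best)                     # refine the winning mixture
    return w, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(c, 0.5, 200) for c in (-5.0, 0.0, 5.0)])
w, mu, var = greedy_gmm(x, k_max=3)
print(np.sort(mu))
```

Because every k-component model starts from the best (k-1)-component model, no global random initialisation is needed, which is the practical appeal of the incremental approach.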
Standard survival models assume independence between survival times; frailty models provide a useful extension of these models by introducing a random effect (frailty) when the survival data are correlated. Several estimation methods have been proposed to find the parameters of shared frailty models. Among them, the EM algorithm (Survival Analysis-Techniques for Censored and Truncated Data, 1997) and the penalized likelihood method (Penalized Survival Models and Frailty, Technical Report No. 66, Mayo Foundation, 2000) are two popular ones. However, their variance estimates involve calculating a matrix inverse, so current methods cannot handle data with a large number of clusters. This paper provides a modified EM algorithm for shared frailty models. The new method uses standard statistical procedures to find the maximum likelihood estimates (MLE) and can handle data sets with large numbers of clusters and distinct event times. Confidence intervals for the parameters can be constructed by multiple imputation. Simulation studies were carried out to compare different approaches for frailty models. (c) 2004 Elsevier B.V. All rights reserved.
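For orientation, here is a sketch of the E-step of the conventional EM algorithm for a shared gamma frailty model (not the paper's modified algorithm), assuming NumPy and frailties w_i drawn from a Gamma distribution with mean 1 and variance theta. Conjugacy gives the posterior of w_i as Gamma(shape = 1/theta + D_i, rate = 1/theta + sum_j Lambda0(t_ij) exp(beta' x_ij)), where D_i is the number of observed events in cluster i. The function and argument names are hypothetical.

```python
import numpy as np

def gamma_frailty_e_step(cluster, event, cum_hazard, lin_pred, theta):
    """E-step for a shared gamma frailty model (frailty mean 1, variance theta).

    cluster:    (n,) integer cluster labels 0..G-1
    event:      (n,) 1.0 if the event was observed, 0.0 if censored
    cum_hazard: (n,) baseline cumulative hazard Lambda0(t_ij) at each subject's time
    lin_pred:   (n,) linear predictor beta' x_ij
    Returns E[w_i | data] plus the Gamma posterior shape and rate per cluster.
    """
    G = int(cluster.max()) + 1
    D = np.bincount(cluster, weights=event, minlength=G)  # observed events per cluster
    H = np.bincount(cluster, weights=cum_hazard * np.exp(lin_pred), minlength=G)
    shape = 1.0 / theta + D
    rate = 1.0 / theta + H
    return shape / rate, shape, rate

cluster = np.array([0, 0, 0, 1, 1])
event = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
cum_hazard = np.array([0.5, 0.5, 0.5, 1.0, 1.0])
lin_pred = np.zeros(5)
Ew, shape, rate = gamma_frailty_e_step(cluster, event, cum_hazard, lin_pred, theta=0.5)
print(Ew)  # cluster 0 (three events) gets a frailty above 1, cluster 1 below 1
```

The work per iteration grows only linearly with the number of clusters; in the M-step one common choice is to refit beta and the baseline hazard with a Cox model using log E[w_i] as an offset, which is an assumption about the surrounding algorithm rather than something the abstract states.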
ISBN (print): 0780374029
This paper concerns the application of the EM algorithm to estimating the parameters of non-stationary, noisy-phase, monocomponent signals buried in additive noise. Noisy-phase signals are appropriate for modeling real-world signals such as radar and communications signals. The maximum likelihood estimator of the signal parameters is given explicitly. The problem is then formulated as a missing-data problem, which enables a natural use of the Expectation-Maximization algorithm. A robust initialization scheme based on non-parametric time-frequency distributions is presented. Experimental results with both real and simulated data show the efficiency of the procedure.
Authors: Tsai, A; Wells, WM; Warfield, SK; Willsky, AS
Harvard Univ Sch Med, Brigham & Womens Hosp, Dept Radiol, Boston, MA 02115 USA
MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA
MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Harvard Univ Sch Med, Boston Childrens Hosp, Dept Radiol, Boston, MA 02115 USA
In this paper, we propose an expectation-maximization (EM) approach to separate a shape database into different shape classes, while simultaneously estimating the shape contours that best exemplify each class. We begin our formulation by employing the level set function as the shape descriptor. Next, for each shape class we assume that there exists an unknown underlying level set function whose zero level set describes the contour that best represents the shapes within that class. The level set function for each example shape in the database is modeled as a noisy measurement of the appropriate shape class's unknown underlying level set function. Based on this measurement model and the judicious introduction of the class labels as the hidden data, our EM formulation calculates the labels for shape classification and estimates the shape contours that best typify the different shape classes. The resulting iterative algorithm is computationally efficient, simple, and accurate. We demonstrate the utility and performance of this algorithm by applying it to two medical applications. (c) 2005 Elsevier B.V. All rights reserved.
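A minimal sketch of this kind of EM formulation, assuming NumPy, pre-aligned shapes, signed distance functions as the level set functions, and isotropic Gaussian pixel noise with one shared variance (the abstract does not state the noise model). The E-step computes class posteriors for each shape; the M-step averages the level set functions with those posteriors, and the zero level set of each `means[c]` then plays the role of the class's representative contour. The farthest-point initialisation and all names are hypothetical.

```python
import numpy as np

def sq_dists(phis, means):
    """(N, C) squared L2 distances between level set functions and class means."""
    return ((phis[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)

def em_level_sets(phis, n_classes, iters=50, seed=0):
    """phis: (N, P) flattened signed distance functions of pre-aligned shapes."""
    rng = np.random.default_rng(seed)
    N, P = phis.shape
    # Farthest-point initialisation (an assumption, not from the paper).
    idx = [int(rng.integers(N))]
    for _ in range(1, n_classes):
        dmin = sq_dists(phis, phis[idx]).min(axis=1)
        idx.append(int(dmin.argmax()))
    means = phis[idx].copy()
    pi = np.full(n_classes, 1.0 / n_classes)
    sigma2 = float(phis.var())
    d2 = sq_dists(phis, means)
    for _ in range(iters):
        # E-step: class posteriors (hidden labels) under the Gaussian noise model.
        logr = np.log(pi) - d2 / (2 * sigma2)
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: class level set functions, class priors, and noise variance.
        nk = r.sum(axis=0) + 1e-12
        means = (r.T @ phis) / nk[:, None]
        pi = nk / N
        d2 = sq_dists(phis, means)
        sigma2 = max(float((r * d2).sum()) / (N * P), 1e-12)
    return r.argmax(axis=1), means

# Toy database: signed distance functions of circles with radii near 5 and near 10.
yy, xx = np.mgrid[-16:17, -16:17]
rho = np.sqrt(xx ** 2 + yy ** 2)
rng = np.random.default_rng(1)
radii = np.concatenate([rng.normal(5, 0.3, 10), rng.normal(10, 0.3, 10)])
phis = np.stack([(rho - r).ravel() for r in radii])   # negative inside each circle
labels, means = em_level_sets(phis, 2)
print(labels)
```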
We consider the problem of estimating the parameters of the Marshall-Olkin bivariate Weibull distribution in the presence of random censoring. Since the maximum likelihood estimators of the parameters cannot be expressed in closed form, we suggest an EM algorithm to compute them. Extensive simulations indicate that the estimators perform efficiently under random censoring. (C) 2010 Elsevier B.V. All rights reserved.
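One standard way to see why EM suits this problem: a Marshall-Olkin bivariate Weibull pair can be generated from three latent Weibull variables U1, U2, U3 with a common shape alpha, as X = min(U1, U3) and Y = min(U2, U3). When X < Y we know X = U1, but Y may equal U2 or U3; that ambiguity (together with censoring) is the kind of missing information an EM algorithm treats as hidden data. The sketch below assumes NumPy and the survival parameterisation exp(-lambda u^alpha); it only simulates from this construction and checks that P(X = Y) = lambda3 / (lambda1 + lambda2 + lambda3). It is not the paper's estimator, and `sample_mobw` is a hypothetical name.

```python
import numpy as np

def sample_mobw(n, alpha, lam1, lam2, lam3, rng):
    """Draw (X, Y) from MOBW(alpha, lam1, lam2, lam3) via its latent construction.

    U_k has survival function exp(-lam_k * u**alpha), so U_k = (E_k / lam_k)**(1/alpha)
    with E_k ~ Exp(1). Then X = min(U1, U3) and Y = min(U2, U3).
    """
    e = rng.exponential(size=(n, 3))
    u = (e / np.array([lam1, lam2, lam3])) ** (1.0 / alpha)
    x = np.minimum(u[:, 0], u[:, 2])
    y = np.minimum(u[:, 1], u[:, 2])
    return x, y, u

rng = np.random.default_rng(0)
x, y, u = sample_mobw(100_000, alpha=2.0, lam1=1.0, lam2=1.0, lam3=2.0, rng=rng)
tie_frac = np.mean(x == y)   # close to lam3 / (lam1 + lam2 + lam3) = 0.5
print(tie_frac)
```

Ties X = Y occur exactly when U3 is the smallest latent variable, which is why the distribution has a singular component on the diagonal.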
This paper investigates the effectiveness of the DAEM (Deterministic Annealing EM) algorithm in acoustic modeling for speaker and speech recognition. Although the EM algorithm has been widely used to approximate ML estimates, its results depend on initialization. To relax this problem, the DAEM algorithm has been proposed, and its effectiveness has been confirmed on small artificial tasks. In this paper, we apply the DAEM algorithm to practical speech recognition tasks: speaker recognition based on GMMs and continuous speech recognition based on HMMs. Experimental results show that the DAEM algorithm can improve recognition performance compared with the standard EM algorithm with conventional initialization algorithms, especially in flat-start training for continuous speech recognition.
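What distinguishes DAEM from EM is the E-step: posteriors are computed from the joint probabilities raised to a power beta (an inverse temperature) and renormalised, with beta raised gradually to 1 during training. The sketch below assumes NumPy and an input array of log joint probabilities; it shows only the tempered E-step, leaving the unchanged M-step and the annealing schedule to the caller. The function name is hypothetical.

```python
import numpy as np

def tempered_responsibilities(log_joint, beta):
    """DAEM-style E-step: posterior proportional to p(x_n, z = k) ** beta.

    log_joint: (n, K) array of log pi_k + log p(x_n | component k).
    beta in [0, 1]; beta = 1 recovers the ordinary EM E-step.
    """
    z = beta * log_joint
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    r = np.exp(z)
    return r / r.sum(axis=1, keepdims=True)

log_joint = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
for beta in (0.1, 0.5, 1.0):
    print(beta, tempered_responsibilities(log_joint, beta)[0])
```

At beta near 0 every component receives almost equal responsibility, which smooths the likelihood surface and reduces sensitivity to the starting point; at beta = 1 the ordinary E-step is recovered.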
This paper proposes an iterative maximum a posteriori probability (MAP) receiver for multiple-input multiple-output (MIMO) and orthogonal frequency-division multiplexing (OFDM) mobile communications. To exploit space, time, and frequency diversity, a low-density parity-check (LDPC) code with a built-in interleaver is used for channel coding. The receiver employs the expectation-maximization (EM) algorithm to perform MAP symbol detection with reasonable computational complexity. The minimum mean square error (MMSE), recursive least squares (RLS), and least mean square (LMS) algorithms are theoretically derived for channel estimation within this framework. Furthermore, the proposed receiver performs a new scheme called backward symbol detection (BSD), in which signal detection uses the channel impulse response estimated one OFDM symbol later. The advantage of BSD, which is explained from the viewpoint of the message passing algorithm, is that it can exploit information on both the preceding and subsequent OFDM symbols, similarly to RLS with smoothing and removing (SR-RLS) [25]. Compared with SR-RLS, BSD reduces complexity at the cost of packet error rate (PER) performance. Computer simulations show that the receiver employing RLS for channel estimation outperforms those employing MMSE or LMS, and that BSD can improve the PER performance of receivers employing RLS or LMS.
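Of the three channel estimators named above, RLS is the easiest to show in isolation. The sketch below is a plain, real-valued RLS estimator of FIR channel taps from known BPSK pilot symbols, assuming NumPy; it omits the OFDM structure, MIMO, and the EM-based soft symbol feedback of the proposed receiver. The forgetting factor `lam`, the initialisation `delta`, and the function name are illustrative assumptions.

```python
import numpy as np

def rls_channel_estimate(x, y, n_taps, lam=0.99, delta=100.0):
    """Estimate FIR taps h from known inputs x and outputs y = conv(x, h) + noise."""
    h = np.zeros(n_taps)
    P = delta * np.eye(n_taps)              # inverse correlation matrix estimate
    for n in range(len(y)):
        # Regressor [x[n], x[n-1], ..., x[n-L+1]], with zeros before time 0.
        u = np.array([x[n - i] if n - i >= 0 else 0.0 for i in range(n_taps)])
        Pu = P @ u
        k = Pu / (lam + u @ Pu)             # gain vector
        e = y[n] - u @ h                    # a priori error
        h = h + k * e
        P = (P - np.outer(k, Pu)) / lam     # P is symmetric, so u^T P = (P u)^T
    return h

rng = np.random.default_rng(0)
h_true = np.array([0.8, -0.4, 0.2])
x = rng.choice([-1.0, 1.0], size=500)       # known BPSK pilot symbols
y = np.convolve(x, h_true)[:500] + 0.01 * rng.normal(size=500)
h_est = rls_channel_estimate(x, y, n_taps=3)
print(np.round(h_est, 3))
```

The forgetting factor trades tracking speed against noise averaging; values just below 1 suit slowly time-varying channels.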
EM algorithms for multivariate normal mixture decomposition have recently been proposed to maximize the likelihood function in a constrained parameter space that has no singularities and a reduced number of spurious local maxima. However, such approaches require some a priori information about the eigenvalues of the covariance matrices. The behavior of the EM algorithm near a degenerate solution is investigated. The theoretical results suggest a new kind of constraint based on the dissimilarity between two consecutive updates of the eigenvalues of each covariance matrix. The performance of such a "dynamic" constraint is evaluated through numerical experiments. (C) 2010 Elsevier B.V. All rights reserved.
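The abstract does not give the exact form of the "dynamic" constraint, so the following is only one hypothetical reading: after each M-step, every eigenvalue of the updated covariance matrix is clipped so that its ratio to the corresponding eigenvalue from the previous update stays within [1/c, c]. It assumes NumPy, pairs eigenvalues by ascending order, and uses an invented parameter `c`; it illustrates how such a constraint keeps a covariance matrix away from a degenerate (singular) solution without fixed a priori eigenvalue bounds.

```python
import numpy as np

def constrain_covariance(new_cov, old_cov, c=2.0):
    """Hypothetical 'dynamic' constraint: each eigenvalue may change by at most
    a factor c between consecutive EM updates (pairing by ascending order)."""
    new_vals, vecs = np.linalg.eigh(new_cov)          # ascending eigenvalues, eigenvectors
    old_vals = np.linalg.eigvalsh(old_cov)            # ascending eigenvalues
    clipped = np.clip(new_vals, old_vals / c, old_vals * c)
    return (vecs * clipped) @ vecs.T                  # V diag(clipped) V^T

old_cov = np.eye(2)
new_cov = np.diag([1e-8, 1.5])                        # one eigenvalue collapsing toward 0
out = constrain_covariance(new_cov, old_cov, c=2.0)
print(np.linalg.eigvalsh(out))
```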
This paper describes how relational graph matching can be effected using the expectation and maximisation algorithm. From this viewpoint, matching is realised as a two-step iterative EM-like process. First, updated symbolic matches are located so as to minimise the divergence between the model and data graphs. Second, with the updated matches to hand, probabilities describing the affinity between nodes in the model and data graphs may be computed. The probability distributions underpinning this study are computed using a simple model of uniform matching errors. As a result, the expected likelihood function is defined over a family of exponential distributions of Hamming distance. We evaluate our matching method and offer comparisons with both mean-field annealing and quadratic assignment. (C) 1998 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
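Under a uniform matching-error model with error probability P_e, a neighbourhood with H mismatches out of |N| has probability (1 - P_e)^(|N| - H) P_e^H, which is proportional to exp(-k_e H) with k_e = ln((1 - P_e) / P_e); this gives an exponential distribution of Hamming distance. The sketch below, assuming NumPy, uses a simplified Hamming distance (the number of a data node's neighbours whose current matches are not adjacent to the candidate model node) to turn the current matches into normalised affinities. The distance definition and the function name are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def match_probabilities(A_data, A_model, match, p_err=0.1):
    """Affinities between data nodes and model nodes under uniform matching errors.

    A_data: (n, n) 0/1 adjacency of the data graph; A_model: (m, m) of the model graph.
    match[i] is the model node currently assigned to data node i.
    """
    k_e = np.log((1 - p_err) / p_err)
    n, m = A_data.shape[0], A_model.shape[0]
    H = np.zeros((n, m))
    for i in range(n):
        nbrs = np.flatnonzero(A_data[i])
        for j in range(m):
            # Neighbours of i whose matches are not neighbours of model node j.
            H[i, j] = np.sum(A_model[j, match[nbrs]] == 0)
    logp = -k_e * H                          # log of exp(-k_e * H), up to a constant
    logp -= logp.max(axis=1, keepdims=True)
    P = np.exp(logp)
    return P / P.sum(axis=1, keepdims=True), H

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path graph 0-1-2
match = np.array([0, 1, 2])
P, H = match_probabilities(A, A, match, p_err=0.1)
print(H)
```

Because |N| is the same for every candidate model node j, the factor (1 - P_e)^|N| cancels in the normalisation, leaving only the Hamming-distance term.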