Cluster analysis faces two problems in high dimensions: the "curse of dimensionality" that can lead to overfitting and poor generalization performance and the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of spike sorting for next-generation, high-channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a "masked EM" algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data and to real-world high-channel-count spike sorting data.
Motivation: We have witnessed an enormous increase in ChIP-Seq data for histone modifications in the past few years. Discovering significant patterns in these data is an important problem for understanding biological mechanisms. Results: We propose probabilistic partitioning methods to discover significant patterns in ChIP-Seq data. Our methods take into account signal magnitude, shape, strand orientation and shifts. We compare our methods with some current methods and demonstrate significant improvements, especially with sparse data. Besides pattern discovery and classification, probabilistic partitioning can serve other purposes in ChIP-Seq data analysis. Specifically, we exemplify its merits in the context of peak finding and partitioning of nucleosome positioning patterns in human promoters.
The article presents a letter to the editor in response to the article 'The EM algorithm and medical studies: a historical link' by X.L. Meng that was published in a 1997 issue.
ISBN: 9781479928934 (print)
While retinal images (RI) assist in the diagnosis of various eye conditions and diseases such as glaucoma and diabetic retinopathy, their innate features, including low-contrast, homogeneous and non-uniformly illuminated regions, present a particular challenge for retinal image registration (RIR). Recently, the hybrid similarity measure expectation-maximization for Principal Component Analysis with Mutual Information (EMPCA-MI) has been proposed for RIR. This paper investigates incorporating various fixed and adaptive bin size selection strategies to estimate the probability distribution in the mutual information (MI) stage of EMPCA-MI, and analyses their corresponding effect upon RIR performance. Experimental results using a clinical mono-modal RI dataset confirm that adaptive bin size selection consistently provides both lower RIR errors and superior robustness compared to the empirically determined fixed bin sizes.
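To illustrate why bin size matters in the MI stage, here is a minimal sketch of a histogram-based mutual information estimate whose bin count is a parameter. It is not the EMPCA-MI measure from the abstract. The function names are hypothetical, and Sturges' rule stands in as one simple data-driven bin count only as an assumption, since the abstract does not say which adaptive strategies the paper evaluates.

```python
import math
from collections import Counter


def mutual_information(x, y, bins):
    """Histogram estimate of mutual information (in nats) for two equal-length samples."""
    def to_bin(value, lo, hi):
        if hi == lo:
            return 0  # constant signal: everything lands in one bin
        # Clamp so that the maximum value falls into the last bin.
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    lo_x, hi_x = min(x), max(x)
    lo_y, hi_y = min(y), max(y)
    pairs = [(to_bin(a, lo_x, hi_x), to_bin(b, lo_y, hi_y)) for a, b in zip(x, y)]
    n = len(pairs)
    joint = Counter(pairs)                # joint histogram counts
    px = Counter(i for i, _ in pairs)     # marginal counts for x
    py = Counter(j for _, j in pairs)     # marginal counts for y
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log( p(i,j) / (p(i) * p(j)) ), written in terms of counts
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi


def sturges_bins(n):
    """Sturges' rule, used here only as an illustrative data-driven bin count."""
    return int(math.ceil(math.log2(n))) + 1
```

Calling `mutual_information(a, b, 8)` and `mutual_information(a, b, sturges_bins(len(a)))` on the same pair of intensity samples shows how the estimate moves with the bin count, which is the sensitivity the abstract is concerned with.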
ISBN: 9781479974702 (print)
We present a unified framework to evaluate the error rate performance of wireless networks over generalized fading channels. In particular, we propose a new approach to represent different fading distributions by a mixture of Gamma distributions. The new approach relies on the expectation-maximization (EM) algorithm in conjunction with the Newton-Raphson maximization algorithm. We show that our model provides similar performance to other existing state-of-the-art models in both accuracy and simplicity, where accuracy is analyzed by means of mean square error (MSE). In addition, we demonstrate that this algorithm may potentially approximate any fading channel, and thus we utilize it to model both composite and non-composite fading models. We derive novel closed-form expressions for the raw moments of a dual-hop fixed-gain cooperative network. We also study the effective capacity of the end-to-end SNR in such networks. Numerical simulation results are provided to corroborate the analytical findings.
ISBN: 9781479958931 (print)
In the context of satellite communications, random access methods can significantly increase throughput and reduce latency over the network. Recent random access methods are based on multi-user multiple access transmission at the same time and frequency, followed by iterative interference cancellation and decoding at the receiver. Generally, it is assumed that perfect knowledge of the interference is available at the receiver. In practice, the interference term has to be accurately estimated to avoid performance degradation. Several estimation techniques have been proposed lately for the case of superimposed signals. In this paper, we present an overview of existing channel estimation methods and propose an improved channel estimation technique that combines estimation using an autocorrelation-based method and the expectation-maximization algorithm, and uses pilot symbol assisted modulation to further improve the performance and achieve optimal interference cancellation.
Fraud activities have contributed to heavy losses suffered by telecommunication companies. In this paper, we attempt to use the Gaussian mixed model, a probabilistic model normally used in speech recognition, to identify fraud calls in the telecommunication industry. We look at several issues encountered when calculating the maximum likelihood estimates of the Gaussian mixed model using an expectation-maximization algorithm. Firstly, we look at a mechanism for determining the initial number of Gaussian components and choosing the initial values of the algorithm using the kernel method. We show via simulation that the technique improves the performance of the algorithm. Secondly, we develop a procedure for determining the order of the Gaussian mixed model using the log-likelihood function and the Akaike information criterion. Finally, for illustration, we apply the improved algorithm to real telecommunication data. The modified method will pave the way to introduce a comprehensive method for detecting fraud calls in future work.
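The order-selection idea in this abstract can be sketched as fitting a one-dimensional Gaussian mixture by EM for several component counts and comparing them by AIC. This is a minimal illustration and not the authors' procedure. All function names are hypothetical, and the quantile-based initial means are an assumption that stands in for the kernel method the abstract mentions but does not specify.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    total = 0.0
    for x in data:
        mix = sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(mix, 1e-300))  # guard against log(0) on underflow
    return total


def fit_gmm_1d(data, k, iters=100):
    """EM for a k-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    var_floor = 1e-6 * grand_var + 1e-12   # keeps components from collapsing
    # Assumption for this sketch: initial means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(grand_var, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = max(sum(p), 1e-300)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)
    return weights, means, variances


def aic(data, k, params):
    """AIC = 2 * (number of free parameters) - 2 * log-likelihood."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return 2.0 * n_params - 2.0 * log_likelihood(data, *params)


def select_order(data, max_k):
    """Return the k in 1..max_k with the lowest AIC, plus all scores."""
    scores = {k: aic(data, k, fit_gmm_1d(data, k)) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


def two_cluster_sample(seed=1, size=300):
    """Synthetic demo data (hypothetical): two well-separated Gaussian clusters."""
    rng = random.Random(seed)
    return [rng.gauss(-5, 1) for _ in range(size)] + [rng.gauss(5, 1) for _ in range(size)]
```

For example, `select_order(two_cluster_sample(), 3)` returns the component count with the lowest AIC together with the score for each candidate. AIC penalizes complexity only mildly, so on real data it can favor more components than a stricter criterion would.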
Areal interpolation transforms data for a variable of interest from a set of source zones to estimate the same variable's distribution over a set of target zones. One common practice has been to guide interpolation by using ancillary control zones that are related to the variable of interest's spatial distribution. This guidance typically involves using source zone data to estimate the density of the variable of interest within each control zone. This article introduces a novel approach to density estimation, the geographically weighted expectation-maximization (GWEM), which combines features of two previously used techniques, the expectation-maximization (EM) algorithm and geographically weighted regression. The EM algorithm provides a framework for incorporating proper constraints on data distributions, and using geographical weighting allows estimated control-zone density ratios to vary spatially. We assess the accuracy of GWEM by applying it with land use/land cover (LULC) ancillary data to population counts from a nationwide sample of 1980 U.S. census tract pairs. We find that GWEM generally is more accurate in this setting than several previously studied methods. Because target-density weighting (TDW), which uses 1970 tract densities to guide interpolation, outperforms GWEM in many cases, we also consider two GWEM-TDW hybrid approaches and find them to improve estimates substantially.