Motivation: Genes with indispensable functions are identified as essential; however, traditional gene-level studies of essentiality have several limitations. In this study, we characterized gene essentiality from a new perspective: protein domains, the independent structural or functional units of a polypeptide chain. Results: To identify such essential domains, we developed an expectation-maximization (EM) algorithm-based Essential Domain Prediction (EDP) model. On simulated datasets, the model converged to the same results from different initial values and made accurate predictions even in the presence of noise. We then applied the EDP model to six microbial species and predicted 1879 domains to be essential in at least one species, with 10-23% of domains essential in each species. The predicted essential domains were more conserved than either non-essential domains or essential genes. Comparing essential domains between prokaryotes and eukaryotes revealed an evolutionary distance consistent with that inferred from ribosomal RNA. When these essential domains were used to reproduce the annotation of essential genes, the results were accurate, suggesting that protein domains are more basic units of gene essentiality. Furthermore, we present several examples illustrating how combinations of essential and non-essential domains can lead to genes with divergent essentiality. In summary, we describe the first systematic analysis of gene essentiality at the level of protein domains.
This correspondence describes a procedure for determining the exact maximum likelihood (ML) estimates of the parameters of a harmonic series (i.e., the fundamental frequency, and the amplitude and phase of each harmonic). Existing ML methods are only approximate in the sense that terms arising from mixing between the harmonics are ignored; these terms reduce to zero asymptotically as the sample size increases to infinity. The note argues that these terms can be significant for short signal lengths. Applying the expectation-maximization algorithm yields an iterative procedure that converges to a stationary point of the true parameter likelihood surface. If convergence is global, this point yields the exact ML estimates. Simulation studies illustrate the advantages of the method when short data lengths are used.
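The paper's exact EM procedure is not reproduced here, but the underlying ML problem can be sketched directly: for a fixed fundamental frequency, the harmonic amplitudes and phases enter linearly through cosine/sine coefficients and have a closed-form least-squares solution, so an ML estimate of the fundamental under white gaussian noise can be found by minimizing the residual over a frequency grid. The signal parameters, sample rate, and grid below are illustrative assumptions, not values from the correspondence.

```python
import numpy as np

def fit_harmonics(y, t, f0, K):
    """Least-squares amplitudes/phases of K harmonics at fundamental f0."""
    # Design matrix of cos/sin columns, one pair per harmonic.
    cols = []
    for k in range(1, K + 1):
        cols.append(np.cos(2 * np.pi * k * f0 * t))
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    A = np.stack(cols, axis=1)
    coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return coef, rss

def ml_fundamental(y, t, K, grid):
    """Grid-search ML estimate of f0 (gaussian noise => minimize RSS)."""
    rss = [fit_harmonics(y, t, f0, K)[1] for f0 in grid]
    return grid[int(np.argmin(rss))]

# Short record: f0 = 10 Hz with two harmonics plus mild noise (illustrative).
rng = np.random.default_rng(0)
t = np.arange(64) / 200.0           # 64 samples at 200 Hz
y = (1.0 * np.cos(2 * np.pi * 10 * t + 0.3)
     + 0.5 * np.cos(2 * np.pi * 20 * t - 1.0)
     + 0.05 * rng.standard_normal(t.size))
grid = np.linspace(8.0, 12.0, 401)
f0_hat = ml_fundamental(y, t, K=2, grid=grid)
print(round(float(f0_hat), 2))
```

The grid search stands in for the paper's iterative EM refinement; both target the same likelihood surface.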
It is well known that the convergence rate of the expectation-maximization (EM) algorithm can be faster than those of conventional first-order iterative algorithms when the overlap in the given mixture is small, but this argument has not been mathematically proved. This article studies the problem asymptotically in the setting of gaussian mixtures under the theoretical framework of Xu and Jordan (1996). It is proved that the asymptotic convergence rate of the EM algorithm for gaussian mixtures, locally around the true solution Theta*, is o(e(Theta*)^(0.5 - epsilon)), where epsilon > 0 is an arbitrarily small number, o(x) denotes a higher-order infinitesimal as x -> 0, and e(Theta*) is a measure of the average overlap of the gaussians in the mixture. In other words, the large-sample local convergence rate of the EM algorithm tends to be asymptotically superlinear as e(Theta*) tends to zero.
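To make the setting concrete, here is a minimal EM iteration for a two-component one-dimensional gaussian mixture. The data, component parameters, and quantile-based initialization are illustrative assumptions; the well-separated (small-overlap) mixture is exactly the regime where the result above predicts fast local convergence.

```python
import numpy as np

def em_gmm_1d(x, n_iter=200):
    """EM for a two-component 1-D gaussian mixture."""
    # Crude initialization from data quantiles (an assumption, not from the paper).
    mu = np.array([np.quantile(x, 0.25), np.quantile(x, 0.75)])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        d = x[:, None] - mu[None, :]
        dens = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum likelihood updates.
        nk = r.sum(axis=0)
        pi = nk / x.size
        mu = (r * x[:, None]).sum(axis=0) / nk
        d = x[:, None] - mu[None, :]
        var = (r * d**2).sum(axis=0) / nk
    return pi, mu, var

rng = np.random.default_rng(1)
# Well-separated mixture: small overlap e(Theta*), hence fast EM convergence.
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
pi, mu, var = em_gmm_1d(x)
print(np.round(np.sort(mu), 1))
```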
We introduce a novel way of performing independent component analysis using a constrained version of the expectation-maximization (EM) algorithm. The source distributions are modeled as D one-dimensional mixtures of gaussians. The observed data are modeled as linear mixtures of the sources with additive, isotropic noise. This generative model is fit to the data using constrained EM. The simpler "soft-switching" approach is introduced, which uses only one parameter to decide on the sub- or supergaussian nature of the sources. We explain how our approach relates to independent factor analysis.
The article analyzes latent variable models with misclassified polytomous outcome variables. Modeling misclassification patterns raises identification difficulties. The assumption of monotone misclassification is reasonable in many social and behavioral science studies in which data are based on interview questions, and modifications of the proposed monotone misclassification pattern can also be considered. Maximum likelihood estimation is performed with an EM algorithm whose E-step is computed via simple Monte Carlo integration and whose M-step is computed using an iterative procedure.
Slow convergence is often observed in the EM algorithm for linear state-space models. We propose to circumvent the problem by applying any off-the-shelf quasi-Newton-type optimizer that operates on the gradient of the log-likelihood function. Such an algorithm is a practical alternative because the exact gradient of the log-likelihood can be computed by recycling components of the expectation-maximization (EM) algorithm. We demonstrate the efficiency of the proposed method in three relevant instances of the linear state-space model. At high signal-to-noise ratios, where EM is particularly prone to slow convergence, gradient-based learning yields a sizable reduction in computation time.
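The key fact exploited here, that the exact gradient of the observed-data log-likelihood falls out of EM quantities (Fisher's identity), can be checked numerically on a model far simpler than a state-space model. The sketch below uses a two-component gaussian mixture with one free mean, an assumption chosen purely for brevity, and compares the responsibility-weighted complete-data score with a finite-difference gradient.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 300)])

def loglik(mu1):
    # Two-component unit-variance mixture; only the second mean is free.
    dens = 0.5 * np.exp(-0.5 * x**2) + 0.5 * np.exp(-0.5 * (x - mu1)**2)
    return np.log(dens / np.sqrt(2 * np.pi)).sum()

def em_gradient(mu1):
    # Fisher's identity: d/dmu log p(x) = E[d/dmu log p(x, z) | x],
    # i.e. the responsibility-weighted complete-data score from the E-step.
    d0 = 0.5 * np.exp(-0.5 * x**2)
    d1 = 0.5 * np.exp(-0.5 * (x - mu1)**2)
    r1 = d1 / (d0 + d1)                 # E-step responsibilities
    return (r1 * (x - mu1)).sum()       # expected complete-data score

mu1 = 3.5
g_em = em_gradient(mu1)
eps = 1e-5
g_num = (loglik(mu1 + eps) - loglik(mu1 - eps)) / (2 * eps)
print(abs(g_em - g_num))
```

The same identity is what lets a quasi-Newton optimizer reuse the E-step computations of a state-space EM at no extra cost.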
The goal of hyperspectral unmixing is to decompose an electromagnetic spectral dataset measured over M spectral bands and T pixels into N constituent material spectra (or "endmembers") with corresponding spatial abundances. In this paper, we propose a novel approach to hyperspectral unmixing based on loopy belief propagation (BP) that enables the exploitation of spectral coherence in the endmembers and spatial coherence in the abundances. In particular, we partition the factor graph into spectral coherence, spatial coherence, and bilinear subgraphs, and pass messages between them using a "turbo" approach. To perform message passing within the bilinear subgraph, we employ the bilinear generalized approximate message passing algorithm (BiG-AMP), a recently proposed belief-propagation-based approach to matrix factorization. Furthermore, we propose an expectation-maximization (EM) strategy to tune the prior parameters and a model-order selection strategy to select the number of materials N. Numerical experiments conducted with both synthetic and real-world data show favorable unmixing performance relative to existing methods.
Multivariate extensions of the Poisson distribution are plausible models for multivariate discrete data. The lack of estimation and inferential procedures reduces the applicability of such models. In this paper, an EM algorithm for Maximum Likelihood estimation of the parameters of the Multivariate Poisson distribution is described. The algorithm is based on the multivariate reduction technique that generates the Multivariate Poisson distribution. Illustrative examples are also provided. Extension to other models, generated via multivariate reduction, is discussed.
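A minimal sketch of the multivariate reduction in the bivariate case: a shared Poisson shock Y0 added to two independent Poisson variables yields a bivariate Poisson pair whose covariance equals the shock rate, which immediately suggests moment estimators. The rates and sample size below are illustrative assumptions; the paper's EM algorithm for maximum likelihood estimation is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
lam0, lam1, lam2 = 2.0, 3.0, 1.0   # illustrative rates (not from the paper)
n = 200_000

# Multivariate reduction: the common Poisson shock Y0 induces the correlation.
y0 = rng.poisson(lam0, n)
x1 = y0 + rng.poisson(lam1, n)
x2 = y0 + rng.poisson(lam2, n)

# Moment identities: E[X1] = lam0 + lam1, E[X2] = lam0 + lam2,
# and Cov(X1, X2) = Var(Y0) = lam0.
lam0_hat = np.cov(x1, x2)[0, 1]
lam1_hat = x1.mean() - lam0_hat
lam2_hat = x2.mean() - lam0_hat
print(np.round([lam0_hat, lam1_hat, lam2_hat], 2))
```

The same latent decomposition (Y0, Y1, Y2) is what the EM algorithm treats as missing data in the E-step.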
A fundamental problem in biological and machine vision is visual invariance: How are objects perceived to be the same despite transformations such as translations, rotations, and scaling? In this letter, we describe a new, unsupervised approach to learning invariances based on Lie group theory. Unlike traditional approaches that sacrifice information about transformations to achieve invariance, the Lie group approach explicitly models the effects of transformations in images. As a result, estimates of transformations are available for other purposes, such as pose estimation and visuomotor control. Previous approaches based on first-order Taylor series expansions of images can be regarded as special cases of the Lie group approach, which utilizes a matrix-exponential-based generative model of images and can handle arbitrarily large transformations. We present an unsupervised expectation-maximization algorithm for learning Lie transformation operators directly from image data containing examples of transformations. Our experimental results show that the Lie operators learned by the algorithm from an artificial data set containing six types of affine transformations closely match the analytically predicted affine operators. We then demonstrate that the algorithm can also recover novel transformation operators from natural image sequences. We conclude by showing that the learned operators can be used to both generate and estimate transformations in images, thereby providing a basis for achieving visual invariance.
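The matrix-exponential generative model can be illustrated with the simplest Lie transformation operator: the generator of 2-D rotations, whose exponential reproduces an exact rotation matrix for arbitrarily large angles. The truncated-series matrix exponential below is a self-contained sketch, not the paper's learned-operator algorithm.

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential via truncated Taylor series (adequate for small matrices)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

# Lie generator of 2-D rotation; exp(theta * G) rotates by theta, so a single
# operator handles arbitrarily large transformations, unlike first-order
# Taylor-series models of image transformations.
G = np.array([[0.0, -1.0],
              [1.0,  0.0]])
theta = 0.7
R = expm(theta * G)
expected = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
print(np.allclose(R, expected, atol=1e-8))
```

In the paper's setting, the EM algorithm learns generators like G directly from image data rather than assuming them analytically.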
Motivation: Researchers worldwide have generated a huge volume of genomic data, including thousands of genome-wide association studies (GWAS) and massive amounts of gene expression data from different tissues. How to perform a joint analysis of these data to gain new biological insights has become a critical step in understanding the etiology of complex diseases. Due to the polygenic architecture of complex diseases, the identification of risk genes remains challenging. Motivated by the shared risk genes found in complex diseases and tissue-specific gene expression patterns, we propose EPS, an Empirical Bayes approach to integrating Pleiotropy and Tissue-Specific information for prioritizing risk genes. Results: As demonstrated by extensive simulation studies, EPS greatly improves the power to identify disease-risk genes. EPS enables rigorous hypothesis testing of pleiotropy and of tissue-specific risk gene expression patterns, and all of the model parameters can be estimated adaptively with the developed expectation-maximization (EM) algorithm. We applied EPS to the bipolar disorder and schizophrenia GWAS from the Psychiatric Genomics Consortium, along with gene expression data for multiple tissues from the Genotype-Tissue Expression project. The results of the real data analysis demonstrate many advantages of EPS.