The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences (e.g. transcription factor binding sites in a genome). The most widely used algorithms for finding motifs obtain a generative probabilistic representation of these over-represented signals and try to discover profiles that maximize the information content score. Although these profiles form a very powerful representation of the signals, the major difficulty arises from the fact that the best motif corresponds to the global maximum of a non-convex continuous function. Popular algorithms like expectation-maximization (EM) and Gibbs sampling tend to be very sensitive to the initial guesses and are known to converge to the nearest local maximum very quickly. In order to improve the quality of the results, EM is used with multiple random starts or with other powerful stochastic global methods that might yield promising initial guesses (such as projection algorithms). Global methods do not necessarily give initial guesses in the convergence region of the best local maximum but rather suggest that a promising solution is in the neighborhood region. In this paper, we introduce a novel optimization framework that searches the neighborhood regions of the initial alignment in a systematic manner to explore the multiple local optimal solutions. This effective search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. Our results show that the popularly used EM algorithm often converges to suboptimal solutions, which can be significantly improved by the proposed neighborhood profile search. Based on experiments using both synthetic and real datasets, our method demonstrates significant improvements in the information content scores of the probabilistic models. The proposed method also gives the flexibility in using different local solvers and global
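As an illustration of the score mentioned in this abstract, the sketch below computes an information content value for a profile built from aligned sites. It assumes the common relative-entropy form (each column's letter frequencies compared against a background distribution, in bits); the abstract itself does not give the formula, and the function names `profile_from_alignment` and `information_content` are hypothetical.

```python
import math


def profile_from_alignment(sites, alphabet="ACGT", pseudocount=0.0):
    """Return per-column letter frequencies for equal-length aligned sites."""
    length = len(sites[0])
    profile = []
    for j in range(length):
        counts = {a: pseudocount for a in alphabet}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        profile.append({a: counts[a] / total for a in alphabet})
    return profile


def information_content(profile, background=None):
    """Sum over columns of p * log2(p / b), in bits.

    Assumption: a uniform background is used when none is supplied.
    Terms with p == 0 contribute nothing, following the usual convention.
    """
    score = 0.0
    for column in profile:
        for letter, p in column.items():
            b = background[letter] if background else 1.0 / len(column)
            if p > 0:
                score += p * math.log2(p / b)
    return score


# Hypothetical toy alignment of three identical sites.
sites = ["ACGT", "ACGT", "ACGT"]
print(information_content(profile_from_alignment(sites)))
```

A local search such as EM would adjust the alignment or profile to increase this value, which is why the choice of starting point matters.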
A reassessment of the dynamic characteristics of the 542 m cable-stayed Bayview Bridge in Quincy, Illinois, is presented using a newly developed output-only system identification technique. The technique is applied to an extensive set of ambient vibration response data acquired from the bridge in 1987. Vertical, torsional and transverse modal frequencies of the deck are identified, and uncertainty in damping values is estimated using an automated procedure on several redundant measurements at four locations. Important practical issues associated with the implementation of the procedure and the selection of algorithm design parameters for stochastic subspace identification techniques are discussed. An overall mean and standard deviation of damping of 1.0 +/- 0.8% is estimated considering all identified vertical, torsional and transverse modes in the 0-2 Hz band. The mean damping for the fundamental vertical mode (0.37 Hz) is identified as 1.4 +/- 0.5%, and that for the first coupled torsion-transverse mode (0.56 Hz) as 1.1 +/- 0.8%. Variability in the damping estimates is shown to decrease as estimated modal RMS acceleration levels increase. Standard deviations on estimated damping range from 0.05% to 2%. The results are shown to be a substantial improvement in the evaluation of damping compared to earlier spectral analysis conducted on the same data set. Copyright (c) 2005 John Wiley & Sons, Ltd.
We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a standard HMM, observation vectors are represented by a mixture of Gaussians (MoG) that depends on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of observation vectors, where latent vectors in one level are acoustic-unit dependent and latent vectors in a higher level are acoustic-unit independent. An expectation-maximization (EM) algorithm is derived for maximum likelihood estimation of the model. We verify through a set of experiments the potential of the GFA-HMM as an alternative acoustic modeling technique. In one experiment, by varying the latent dimension and the number of mixture components in the latent spaces, the GFA-HMM attained a more compact representation than the standard HMM. In other experiments with various noise types and speaking styles, the GFA-HMM achieved a statistically significant improvement over the standard HMM. (c) 2005 Elsevier B.V. All rights reserved.
In spatial clustering, in addition to the object similarity in the normal attribute space, similarity in the spatial space needs to be considered, and objects assigned to the same cluster should usually be close to one another in the spatial space. The conventional expectation-maximization (EM) algorithm is not suited for spatial clustering because it does not consider spatial information. Although the neighborhood EM (NEM) algorithm incorporates a spatial penalty term into the criterion function, it requires many more iterations in every E-step. In this paper, we propose a Hybrid EM (HEM) approach that combines EM and NEM. Its computational complexity for every pass is between that of EM and that of NEM. Experiments also show that its clustering quality is better than that of EM and comparable to that of NEM.
We compared seven different tagging single-nucleotide polymorphism (SNP) programs in 10 regions with varied amounts of linkage disequilibrium (LD) and physical distance. We used the Collaborative Studies on the Genetics of Alcoholism dataset, part of the Genetic Analysis Workshop 14. We show that in regions with moderate to strong LD these programs are relatively consistent, despite different parameters and methods. In addition, we compared the selected SNPs in a multipoint linkage analysis for one region with strong LD. As the number of selected SNPs increased, the LOD score, mean information content, and type I error also increased.
The syntagmatic paradigmatic model is a distributed, memory-based account of verbal processing. Built on a Bayesian interpretation of string edit theory, it characterizes the control of verbal cognition as the retrieval of sets of syntagmatic and paradigmatic constraints from sequential and relational long-term memory and the resolution of these constraints in working memory. Lexical information is extracted directly from text using a version of the expectation maximization algorithm. In this article, the model is described and then illustrated on a number of phenomena, including sentence processing, semantic categorization and rating, short-term serial recall, and analogical and logical inference. Subsequently, the model is used to answer questions about a corpus of tennis news articles taken from the Internet. The model's success demonstrates that it is possible to extract propositional information from naturally occurring text without employing a grammar, defining a set of heuristics, or specifying a priori a set of semantic roles.
Applying the noisy channel model to search query spelling correction requires an error model and a language model. Typically, the error model relies on a weighted string edit distance measure. The weights can be learn...
In this article, a cluster validity index and its fuzzification are described, which can provide a measure of goodness of clustering on different partitions of a data set. The maximum value of this index, called the PBM-index, across the hierarchy provides the best partitioning. The index is defined as a product of three factors, maximization of which ensures the formation of a small number of compact clusters with large separation between at least two clusters. We have used both the k-means and the expectation-maximization algorithms as underlying crisp clustering techniques. For fuzzy clustering, we have utilized the well-known fuzzy c-means algorithm. Results demonstrating the superiority of the PBM-index in appropriately determining the number of clusters, as compared to three other well-known measures, the Davies-Bouldin index, Dunn's index and the Xie-Beni index, are provided for several artificial and real-life data sets. (C) 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
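The three-factor product described in this abstract can be sketched for the crisp case as follows. The abstract does not state the formula, so this assumes the form in which the PBM-index is commonly cited: the square of (1/K) x (E_1/E_K) x D_K, where E_1 is the summed distance of all points to the overall centroid, E_K is the summed distance of points to their own cluster centroids, and D_K is the largest distance between two cluster centroids. The function name `pbm_index` and the toy data are hypothetical.

```python
import math


def pbm_index(points, labels):
    """Crisp PBM-index under the assumed form ((1/K) * (E_1/E_K) * D_K) ** 2."""
    dim = len(points[0])

    def centroid(pts):
        return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))

    # Group points by cluster label and compute one centroid per cluster.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids = {label: centroid(pts) for label, pts in clusters.items()}
    k = len(clusters)

    grand = centroid(points)
    e_1 = sum(math.dist(p, grand) for p in points)
    e_k = sum(math.dist(p, centroids[l]) for p, l in zip(points, labels))
    d_k = max(math.dist(a, b)
              for a in centroids.values() for b in centroids.values())

    if e_k == 0:
        # Degenerate partition (every point sits on its centroid).
        return math.inf
    return ((1.0 / k) * (e_1 / e_k) * d_k) ** 2


# Hypothetical toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = pbm_index(points, [0, 0, 1, 1])  # groups match the geometry
bad = pbm_index(points, [0, 1, 0, 1])   # groups cut across the geometry
print(good > bad)
```

In use, the index would be evaluated for partitions with different numbers of clusters (for example from k-means or EM), and the partition with the largest value would be kept.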
A maximum a posteriori (MAP) algorithm, which incorporates correlated magnetic resonance images into the processing of positron emission tomography reconstruction with the aim of improving image quality, was developed. The line site map from the MRI prior is constructed using a modified Markov random field or a Canny edge detector with a Gaussian smoothing filter. It is used in the MAP algorithm by a weighted line site method. We evaluate and compare the performance of these reconstruction methods. The results show that the Bayesian methods produce reconstructed images with less noise and better spatial resolution than those produced by the maximum likelihood expectation-maximization method. (C) 2004 Elsevier Ltd. All rights reserved.
This paper presents a new identification technique for the extraction of modal parameters of structural systems subjected to base excitation. The technique uses output-only measurements of the structural response. A combined subspace-maximum likelihood algorithm is developed and applied to a three-degree-of-freedom simulation model. Five ensembles of synthetically generated input signals, representing varying input characteristics, are employed in Monte Carlo simulations to illustrate the applicability of the method. The technique is able to circumvent some of the difficulties arising from short data sets by employing the expectation-maximization (EM) algorithm to refine the subspace state estimates. This approach is motivated by its successful application to speech signals by previous authors. Results indicate that, for certain system characteristics, more accurate pole estimates can be identified using the combined subspace-EM formulation. In general, the damping ratios of the system are difficult to identify accurately due to limitations on data set length. The applicability of the technique to structural vibration signals is illustrated through the identification of seismic response data from the Vincent Thomas Bridge. Copyright (C) 2003 John Wiley & Sons, Ltd.