In this article we introduce two procedures for variable selection in cluster analysis and classification rules. One is mainly aimed at detecting the "noisy" noninformative variables, while the other also deals with multicollinearity and general dependence. Both methods are designed to be used after a "satisfactory" grouping procedure has been carried out. A forward-backward algorithm is proposed to make such procedures feasible in large datasets. A small simulation is performed and some real data examples are analyzed.
Extending previous work on asset-based style factor models, this paper proposes a model that allows for the presence of structural breaks in hedge fund return series. We consider a Bayesian approach to detecting structural breaks occurring at unknown times and identifying relevant risk factors to explain the monthly return variation. Exact and efficient Bayesian inference for the unknown number and positions of the breaks is performed by using filtering recursions similar to those of the forward-backward algorithm. Existing methods of testing for structural breaks are also used for comparison. We investigate the presence of structural breaks in several hedge fund indices: our results are consistent with market events and episodes that caused substantial volatility in hedge fund returns during the last decade. (C) 2008 Elsevier B.V. All rights reserved.
A Markov-modulated Poisson process is a Poisson process whose intensity varies according to a Markov process. We present a novel technique for simulating from the exact distribution of a continuous time Markov chain over an interval given the start and end states and the infinitesimal generator, and we use this to create a Gibbs sampler which samples from the exact distribution of the hidden Markov chain in a Markov-modulated Poisson process. We apply the Gibbs sampler to modelling the occurrence of a rare DNA motif (the Chi site) and to inferring regions of the genome with evidence of high or low intensities for occurrences of this site.
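To make the definition in the abstract above concrete, here is a minimal Python sketch that simulates a Markov-modulated Poisson process forward in time. It is not the conditional sampler or the Gibbs sampler the abstract describes; it only illustrates the model. The function name `simulate_mmpp`, the two-state generator, and the rates are assumptions chosen for illustration.

```python
import random


def simulate_mmpp(generator, rates, t_end, rng, state=0):
    """Simulate an MMPP on [0, t_end).

    generator[i][j] : transition rate i -> j (diagonal = minus the exit rate)
    rates[i]        : Poisson event intensity while the chain is in state i
    Returns (path, events): path is a list of (jump_time, new_state),
    events is the sorted list of event times.
    """
    t = 0.0
    path = [(0.0, state)]
    events = []
    while t < t_end:
        exit_rate = -generator[state][state]
        # Exponential holding time in the current state (infinite if absorbing).
        hold = rng.expovariate(exit_rate) if exit_rate > 0 else float("inf")
        leave = min(t + hold, t_end)
        # Within the sojourn, events form a homogeneous Poisson process.
        if rates[state] > 0:
            s = t + rng.expovariate(rates[state])
            while s < leave:
                events.append(s)
                s += rng.expovariate(rates[state])
        t = leave
        if t < t_end:
            # Jump to another state with probability proportional to its rate.
            others = [j for j in range(len(rates)) if j != state]
            weights = [generator[state][j] for j in others]
            state = rng.choices(others, weights=weights)[0]
            path.append((t, state))
    return path, events


# Hypothetical two-state example: a quiet state and a bursty state.
Q = [[-1.0, 1.0], [2.0, -2.0]]
path, events = simulate_mmpp(Q, rates=[0.5, 5.0], t_end=10.0, rng=random.Random(0))
```

Events cluster in the sojourns of the high-intensity state, which is the kind of structure the hidden-chain inference in the abstract tries to recover from the event times alone.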
We demonstrate how to perform direct simulation from the posterior distribution of a class of multiple changepoint models where the number of changepoints is unknown. The class of models assumes independence between the posterior distribution of the parameters associated with segments of data between successive changepoints. This approach is based on the use of recursions, and is related to work on product partition models. The computational complexity of the approach is quadratic in the number of observations, but an approximate version, which introduces negligible error, and whose computational cost is roughly linear in the number of observations, is also possible. Our approach can be useful, for example within an MCMC algorithm, even when the independence assumptions do not hold. We demonstrate our approach on coal-mining disaster data and on well-log data. Our method can cope with a range of models, and exact simulation from the posterior distribution is possible in a matter of minutes.
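The quadratic-cost recursion mentioned in the abstract above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: observations are Poisson counts with a conjugate Gamma(a, b) prior on each segment's rate, and each gap between observations is a changepoint independently with probability `p`. The function names and the example data are hypothetical.

```python
import math
import random


def changepoint_recursion(y, p, a=1.0, b=1.0):
    """Backward recursion: Q[t] = Pr(y[t:] | a new segment starts at index t)."""
    n = len(y)
    # Prefix sums make every segment marginal an O(1) lookup, so the
    # whole recursion costs O(n^2).
    S = [0] * (n + 1)
    L = [0.0] * (n + 1)
    for i, v in enumerate(y):
        S[i + 1] = S[i] + v
        L[i + 1] = L[i] + math.lgamma(v + 1)
    Q = [0.0] * (n + 1)

    def seg(t, s):
        # Marginal likelihood of y[t..s] (inclusive) with the rate integrated out.
        tot, m = S[s + 1] - S[t], s - t + 1
        return math.exp(a * math.log(b) - math.lgamma(a) + math.lgamma(a + tot)
                        - (a + tot) * math.log(b + m) - (L[s + 1] - L[t]))

    def weight(t, s):
        # Joint weight of "segment covers t..s" together with the data after it.
        stay = (1 - p) ** (s - t)
        if s == n - 1:
            return seg(t, s) * stay          # final segment, no further changepoint
        return seg(t, s) * stay * p * Q[s + 1]

    for t in range(n - 1, -1, -1):
        Q[t] = sum(weight(t, s) for s in range(t, n))
    return Q, weight


def sample_changepoints(y, p, rng, a=1.0, b=1.0):
    """Draw one exact posterior sample of segment start indices (excluding 0)."""
    Q, weight = changepoint_recursion(y, p, a, b)
    n, t, cps = len(y), 0, []
    while t < n:
        u, acc, s = rng.random() * Q[t], 0.0, t
        for s in range(t, n):
            acc += weight(t, s)
            if u < acc:
                break
        if s < n - 1:
            cps.append(s + 1)
        t = s + 1
    return cps


counts = [4, 5, 3, 0, 1, 0]  # made-up data with an apparent drop in rate
Q, _ = changepoint_recursion(counts, p=0.3)
draw = sample_changepoints(counts, p=0.3, rng=random.Random(1))
```

`Q[0]` is the marginal likelihood of the data with the changepoints summed out. For long series the raw probabilities underflow, so a practical version would work in log space; this sketch omits that for brevity.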
Prediction of transmembrane (TM) segments of amino acid sequences of membrane proteins is a well-known and very important problem. The accuracy of its solution can be improved for approaches that do not use a homology search in an additional data bank. There is a lack of test data in this area of research, because information on the structure of membrane proteins is scarce. In this work we created a test sample of structural alignments for membrane proteins. The TM segments of these proteins were mapped according to aligned 3D structures resolved for these proteins. A method for predicting TM segments in an alignment was developed on the basis of the forward-backward algorithm from hidden Markov model (HMM) theory. This method allows a user not only to predict TM segments, but also to create a probabilistic membrane profile, which can be employed in multiple alignment procedures taking the secondary structure of proteins into account. The method was implemented in a computer program available at http://***/fwdbck/. It provides better results than the MEMSAT method, which is nearly the only tool predicting TM segments in multiple alignments without a homology search.
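For readers unfamiliar with the forward-backward algorithm from HMM theory mentioned in the abstract above, the following Python sketch computes per-position state posteriors for a discrete HMM, using the standard scaling trick for numerical stability. It is a generic textbook version, not the program the abstract refers to. The two states ("loop" and "membrane"), the two observation symbols ("polar" and "hydrophobic"), and all probabilities are invented for illustration.

```python
import math


def forward_backward(init, trans, emit, obs):
    """Return (posteriors, log_likelihood) for a discrete HMM.

    init[i]     : Pr(first state = i)
    trans[i][j] : Pr(next state = j | current state = i)
    emit[i][o]  : Pr(observation = o | state = i)
    posteriors[t][i] = Pr(state at t = i | all observations)
    """
    k, n = len(init), len(obs)
    alpha, scale = [], []
    # Forward pass: alpha[t] is the filtered distribution Pr(state_t | obs[:t+1]).
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[i] * emit[i][o] for i in range(k)]
        else:
            row = [emit[j][o] * sum(alpha[-1][i] * trans[i][j] for i in range(k))
                   for j in range(k)]
        c = sum(row)
        scale.append(c)
        alpha.append([v / c for v in row])
    # Backward pass, scaled with the same constants as the forward pass.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        o = obs[t + 1]
        beta[t] = [sum(trans[i][j] * emit[j][o] * beta[t + 1][j] for j in range(k))
                   / scale[t + 1] for i in range(k)]
    post = [[alpha[t][i] * beta[t][i] for i in range(k)] for t in range(n)]
    return post, sum(math.log(c) for c in scale)


# Toy model (all numbers hypothetical): state 0 = loop, state 1 = membrane;
# symbol 0 = polar residue, symbol 1 = hydrophobic residue.
init = [0.8, 0.2]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.7, 0.3], [0.2, 0.8]]
profile, loglik = forward_backward(init, trans, emit, [0, 0, 1, 1, 1, 1, 0, 0])
membrane_profile = [row[1] for row in profile]  # Pr(membrane) at each position
```

The list `membrane_profile` is the kind of per-position probability that the abstract calls a probabilistic membrane profile, here for a single toy sequence rather than an alignment.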
We show that various inverse problems in signal recovery can be formulated as the generic problem of minimizing the sum of two convex functions with certain regularity properties. This formulation makes it possible to derive existence, uniqueness, characterization, and stability results in a unified and standardized fashion for a large class of apparently disparate problems. Recent results on monotone operator splitting methods are applied to establish the convergence of a forward-backward algorithm to solve the generic problem. In turn, we recover, extend, and provide a simplified analysis for a variety of existing iterative methods. Applications to geometry/texture image decomposition schemes are also discussed. A novelty of our framework is to use extensively the notion of a proximity operator, which was introduced by Moreau in the 1960s.
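The forward-backward algorithm in the abstract above is the operator-splitting method, unrelated to the HMM algorithm of the same name. To minimize f1 + f2 with f2 smooth, it alternates a gradient ("forward") step on f2 with a proximity-operator ("backward") step on f1. The sketch below is a minimal illustration under assumed choices: f2(x) = ½‖x − y‖² and f1(x) = λ‖x‖₁, whose proximity operator is soft thresholding. The function names and data are hypothetical.

```python
import math


def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1, applied componentwise."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def forward_backward_splitting(grad_f2, prox_f1, x0, step, n_iter=200):
    """Iterate x <- prox_{step*f1}(x - step * grad_f2(x)).

    grad_f2(x)       : gradient of the smooth term
    prox_f1(v, step) : proximity operator of step * f1 evaluated at v
    step             : assumed to lie in (0, 2/L), L = Lipschitz constant of grad_f2
    """
    x = list(x0)
    for _ in range(n_iter):
        g = grad_f2(x)                                   # forward (gradient) step
        x = prox_f1([xi - step * gi for xi, gi in zip(x, g)], step)  # backward step
    return x


# Toy denoising problem: minimize 0.5*||x - y||^2 + lam*||x||_1.
y = [3.0, -0.5, 1.2, -2.0]
lam = 1.0
x_hat = forward_backward_splitting(
    grad_f2=lambda x: [xi - yi for xi, yi in zip(x, y)],   # L = 1 here
    prox_f1=lambda v, step: soft_threshold(v, step * lam),
    x0=[0.0] * len(y),
    step=0.5,
)
closed_form = soft_threshold(y, lam)  # known minimizer for this particular problem
```

For this problem the minimizer is available in closed form, so `x_hat` can be checked against `closed_form`. The iteration becomes useful when f2 involves a linear operator and no closed form exists.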
Models that combine Markovian states with implicit geometric state occupancy distributions and semi-Markovian states with explicit state occupancy distributions are investigated. This type of model retains the flexibility of hidden semi-Markov chains for the modeling of short or medium size homogeneous zones along sequences but also enables the modeling of long zones with Markovian states. The forward-backward algorithm, which in particular makes it possible to implement the E-step of the EM algorithm efficiently, and the Viterbi algorithm for the restoration of the most likely state sequence are derived. It is also shown that macro-states, i.e. series-parallel networks of states with common observation distribution, are not a valid alternative to semi-Markovian states but may be useful at a more macroscopic level to combine Markovian states with semi-Markovian states. This statistical modeling approach is illustrated by the analysis of branching and flowering patterns in plants. (c) 2004 Elsevier B.V. All rights reserved.
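As a point of reference for the Viterbi algorithm mentioned in the abstract above, here is a minimal log-space sketch for an ordinary hidden Markov chain. It covers only the purely Markovian case, where state occupancy is implicitly geometric; the semi-Markovian extension derived in the paper needs explicit occupancy distributions and is not shown. The function name and the example parameters are assumptions.

```python
import math


def viterbi(init, trans, emit, obs):
    """Return (most likely state path, its joint log-probability)."""
    k = len(init)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[i] = best log-probability of any path ending in state i so far.
    score = [log(init[i]) + log(emit[i][obs[0]]) for i in range(k)]
    back = []
    for o in obs[1:]:
        prev = score
        # ptr[j] = best predecessor of state j at this step.
        ptr = [max(range(k), key=lambda i: prev[i] + log(trans[i][j]))
               for j in range(k)]
        score = [prev[ptr[j]] + log(trans[ptr[j]][j]) + log(emit[j][o])
                 for j in range(k)]
        back.append(ptr)
    # Trace the back-pointers from the best final state.
    last = max(range(k), key=lambda i: score[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical two-state chain with sticky states and noisy binary observations.
init = [0.5, 0.5]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [[0.8, 0.2], [0.3, 0.7]]
states, logp = viterbi(init, trans, emit, [0, 0, 1, 0, 1, 1, 1, 0, 1, 1])
```

Because the self-transition probabilities are high, the restored path tends to smooth over isolated observations that disagree with their neighbours.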
We consider regression models where the underlying functional relationship between the response and the explanatory variable is modeled as independent linear regressions on disjoint segments. We present an algorithm for perfect simulation from the posterior distribution of such a model, even allowing for an unknown number of segments and an unknown model order for the linear regressions within each segment. The algorithm is simple, can scale well to large data sets, and avoids the problem of diagnosing convergence that is present with Markov chain Monte Carlo (MCMC) approaches to this problem. We demonstrate our algorithm on standard denoising problems, on a piecewise constant AR model, and on a speech segmentation problem.
This work addresses the mitigation of channel errors by means of efficient minimum mean-square-error (MMSE) estimation. Although powerful model-based implementations have been recently proposed, the computational burden involved can make them impractical. We propose two new approaches that maintain a good level of performance with a low computational complexity. These approaches keep the simple structure and complexity of a raw MMSE estimation, although they enhance it with additional source a priori knowledge. The proposed techniques are built on a distributed speech recognition system. Different degrees of tradeoff between recognition performance and computational complexity are obtained.
We demonstrate how to perform direct simulation for discrete mixture models. The approach is based on directly calculating the posterior distribution using a set of recursions which are similar to those of the forward-backward algorithm. Our approach is more practicable than existing perfect simulation methods for mixtures. For example, we analyse 1096 observations from a 2-component Poisson mixture, and 240 observations under a 3-component Poisson mixture (with unknown mixture proportions and Poisson means in each case). Simulating samples of 10,000 perfect realisations took about 17 minutes and an hour respectively on a 900 MHz UltraSPARC computer. Our method can also be used to perform perfect simulation from Markov-dependent mixture models. A byproduct of our approach is that the evidence of our assumed models can be calculated, which enables different models to be compared.