The Birnbaum-Saunders distribution is a useful model for describing fatigue and reliability data. This model allows us to relate the total time until failure to some type of cumulative damage. Most models based on the Birnbaum-Saunders distribution assume fixed effects, and only a few have been investigated for correlated data. In this work, we introduce Birnbaum-Saunders mixed models for censored data. Specifically, we estimate their parameters by means of the Gauss-Hermite quadrature approximation, carry out a residual analysis for these models, and conduct an application using real censored reliability data. This application illustrates the utility of a Birnbaum-Saunders random intercept model.
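As a rough illustration of the quadrature step mentioned in this abstract, the sketch below approximates an expectation over a normally distributed random intercept with a hard-coded 3-point Gauss-Hermite rule. It is not the authors' estimation code: the function name `normal_expectation`, the choice of three nodes, and the test integrands are assumptions made for illustration. In a random intercept model, the integrand would be the conditional likelihood of one cluster given the intercept.

```python
import math

# 3-point Gauss-Hermite rule for the weight function exp(-x^2).
# Nodes are 0 and +/- sqrt(3/2); weights are 2*sqrt(pi)/3 and sqrt(pi)/6.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6,
    2 * math.sqrt(math.pi) / 3,
    math.sqrt(math.pi) / 6,
)


def normal_expectation(f, sigma):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) (hypothetical helper).

    The change of variable b = sqrt(2) * sigma * x turns the normal
    density into the Gauss-Hermite weight exp(-x^2) / sqrt(pi).
    """
    total = sum(
        w * f(math.sqrt(2) * sigma * x)
        for x, w in zip(GH_NODES, GH_WEIGHTS)
    )
    return total / math.sqrt(math.pi)


# The 3-point rule is exact for polynomials up to degree 5,
# so the second and fourth moments are recovered up to rounding.
second_moment = normal_expectation(lambda b: b ** 2, sigma=2.0)
fourth_moment = normal_expectation(lambda b: b ** 4, sigma=2.0)
```

More nodes would be needed in practice for non-polynomial integrands such as a censored-data likelihood; three nodes are used here only to keep the rule in closed form.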
Objects in many application domains can be characterized as link-based data, having both network (graph) information as well as structured information describing the nodes. Discovery of frequent patterns in this setting is vulnerable to problems that cannot occur in pattern mining on conventional data without network information. While patterns may appear to reflect novel characteristics of a combination of graph and node information, they may be expected based on patterns that could be found using conventional data mining techniques. We introduce a significance measure that identifies patterns that are unexpected based on node attributes in isolation and neighbor correlations. A statistical log-linear model is extended for this purpose and the structural symmetry of the link-based data is accounted for. Eliminating insignificant results reduces the output quantity by orders of magnitude. Efficiency is achieved by designing the pattern mining algorithm as a hybrid of conventional pattern mining and graph data mining. We demonstrate effectiveness and efficiency of the approach for yeast and for movie data. (C) 2011 Elsevier B.V. All rights reserved.
Multivariate abundance data are commonly collected in ecology, and used to explore questions of "community composition": how relative abundance of different taxa changes with environmental conditions. In this paper, we propose a log-linear marginal modeling approach for analyzing such compositional count data, via generalized estimating equations. This method exploits the multiplicative nature of log-linear models for counts, by reparameterizing models that describe marginal effects on mean abundance. This allows partitioning into "main effects" and compositional effects, which is appealing for interpretation. We apply the proposed approach to reanalyze compositional counts of benthic invertebrates from Delaware Bay, and data on invertebrate communities inhabiting Acacia plants in eastern Australia. In both cases we resort to a resampling approach to make inferences about regression parameters, because the number of clusters was not large compared to cluster size.
In this paper we study a new class of statistical models for contingency tables. We define this class of models through a subset of the binomial equations of the classical independence model. We prove that they are log-linear and we use some notions from Algebraic Statistics to compute their sufficient statistic and their parametric representation. Moreover, we show how to compute maximum likelihood estimates and to perform exact inference through the Diaconis-Sturmfels algorithm. Examples show that these models can be useful in a wide range of applications.
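For orientation, the Diaconis-Sturmfels algorithm named in this abstract can be sketched for the classical two-way independence model, where the moves add and subtract 1 on the corners of a 2x2 minor so that row and column sums are preserved. This is a minimal sketch under that assumption, not the models proposed in the paper (which use only a subset of the binomials); the function name `diaconis_sturmfels_walk` and the example table are hypothetical.

```python
import random


def diaconis_sturmfels_walk(table, steps, rng):
    """Metropolis walk over tables sharing the margins of `table`.

    The target is the conditional (hypergeometric) distribution given
    the margins, which is proportional to 1 / prod(n_ij!).
    Requires at least two rows and two columns.
    """
    t = [row[:] for row in table]
    n_rows, n_cols = len(t), len(t[0])
    for _ in range(steps):
        i, k = rng.sample(range(n_rows), 2)
        j, l = rng.sample(range(n_cols), 2)
        # Proposed move: +1 at (i, j) and (k, l), -1 at (i, l) and (k, j).
        if t[i][l] == 0 or t[k][j] == 0:
            continue  # the move would create a negative cell; stay put
        # Ratio of target probabilities, new table over current table.
        ratio = (t[i][l] * t[k][j]) / ((t[i][j] + 1) * (t[k][l] + 1))
        if rng.random() < min(1.0, ratio):
            t[i][j] += 1
            t[k][l] += 1
            t[i][l] -= 1
            t[k][j] -= 1
    return t


# Hypothetical 3x3 table of counts.
start = [[4, 1, 2], [0, 3, 5], [2, 2, 1]]
end = diaconis_sturmfels_walk(start, steps=1000, rng=random.Random(0))
row_sums_kept = [sum(r) for r in end] == [sum(r) for r in start]
col_sums_kept = [sum(c) for c in zip(*end)] == [sum(c) for c in zip(*start)]
```

An exact test would evaluate a statistic on many tables visited by such a walk and compare it with the observed value; that step is omitted here.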
Two methods of bootstrap resampling are discussed with log-linear models for count data. The first involves the resampling of observations and the second involves the resampling of Pearson residuals, taking into account changes in the distribution of residuals associated with the expected values of counts. The use of both methods is illustrated on two data sets: one concerns the number of ear infections of swimmers, related to whether they are frequent swimmers or not and three other variables, and the other concerns the number of visits to a doctor made in the last 2 weeks, related to the age of subjects and 10 other variables. A third data set on the number of marine mammal interactions in different years and fishing areas is also used as an example. In this case only the second bootstrap method can be used, because the nature of the data allows the bootstrap resampling of observations to produce sets of data that could not have occurred in practice. Simulation results indicate that the bootstrap results are slightly better than the results from a conventional analysis for the first data set, and much better for the second data set. For the third data set, however, a conventional analysis works well while the bootstrap analyses have problems.
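The first method described above, resampling of observations, can be sketched for the simplest log-linear model, log(mu) = b0 + b1*x with a single binary covariate x. In that special case the maximum likelihood estimate of b1 has a closed form, the log of the ratio of group means, so no model-fitting library is needed. The data, function names, and the percentile interval are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def log_rate_ratio(pairs):
    """Closed-form MLE of b1 in log(mu) = b0 + b1*x for binary x."""
    y0 = [y for x, y in pairs if x == 0]
    y1 = [y for x, y in pairs if x == 1]
    if not y0 or not y1 or sum(y0) == 0 or sum(y1) == 0:
        return None  # estimate undefined for this sample
    return math.log((sum(y1) / len(y1)) / (sum(y0) / len(y0)))


def bootstrap_percentile_interval(pairs, n_boot, rng, level=0.95):
    """Resample whole (x, y) observations with replacement."""
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]
        est = log_rate_ratio(resample)
        if est is not None:  # skip resamples where b1 is undefined
            estimates.append(est)
    estimates.sort()
    alpha = (1 - level) / 2
    lo = estimates[int(alpha * (len(estimates) - 1))]
    hi = estimates[int((1 - alpha) * (len(estimates) - 1))]
    return lo, hi


# Hypothetical counts: x = 0 or 1 is a group indicator, y is a count.
data = [(0, 0), (0, 1), (0, 2), (0, 1), (0, 0), (0, 3),
        (1, 2), (1, 4), (1, 1), (1, 3), (1, 5), (1, 2)]
estimate = log_rate_ratio(data)
lo, hi = bootstrap_percentile_interval(data, 500, random.Random(1))
```

Resampling Pearson residuals, the second method, would instead keep the covariates fixed and rebuild counts from fitted values plus rescaled residuals; it is not shown here.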
We discuss a general application of categorical data analysis to mutations along the HIV genome. We consider a multidimensional table for several positions at the same time. Due to the complexity of the multidimensional table, we may collapse it by pooling some categories. However, the association between the remaining variables may not be the same as before collapsing. We discuss the collapsibility of tables and the change in the meaning of parameters after collapsing categories. We also address this problem with a log-linear model. We present a parameterization with the consensus output as the reference cell, as is appropriate to explain genomic mutations in HIV. We also consider five null hypotheses and some classical methods to address them. We illustrate methods for six positions along the HIV genome, through consideration of all triples of positions. (C) 2007 Elsevier B.V. All rights reserved.
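The point that association can change after collapsing can be seen in a small numerical sketch. The counts below are hypothetical, not HIV data: two 2x2 tables, one per level of a third variable, each show no association (odds ratio 1), yet pooling the two levels yields a table with a clear association.

```python
def odds_ratio(t):
    """Odds ratio of a 2x2 table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = t
    return (a * d) / (b * c)


def collapse(strata):
    """Pool 2x2 tables cell by cell over the levels of a third variable."""
    return [
        [sum(s[r][c] for s in strata) for c in range(2)]
        for r in range(2)
    ]


# Hypothetical counts: one 2x2 table per level of the third variable.
strata = [
    [[10, 20], [30, 60]],
    [[60, 30], [20, 10]],
]
conditional = [odds_ratio(s) for s in strata]  # association within each level
marginal = odds_ratio(collapse(strata))        # association after pooling
```

Here both conditional odds ratios equal 1 while the pooled table [[70, 50], [50, 70]] has odds ratio 1.96, so this table is not collapsible over the third variable.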
ISBN: 9780769545202 (print)
Bernoulli-based models, such as Bernoulli mixtures or Bernoulli HMMs (BHMMs), have been successfully applied to several handwritten text recognition (HTR) tasks, which range from character recognition to continuous and isolated handwritten words. All these models belong to the generative model family and, hence, are usually trained by (joint) maximum likelihood estimation (MLE). Despite the good properties of the MLE criterion, there are better training criteria such as maximum mutual information (MMI). MMI is a widespread criterion that is mainly employed to train discriminative models such as log-linear (or maximum entropy) models. Inspired by the Bernoulli mixture classifier, in this work a log-linear model for binary data is proposed, the so-called mixture of multi-class logistic regression. The proposed model is proved to be equivalent to the Bernoulli mixture classifier. In this way, we give a discriminative training framework for Bernoulli mixture models. The proposed discriminative training framework is applied to a well-known Indian digit recognition task.
In this paper, we develop a model that allows us to combine annual (incomplete) registration data with (auxiliary) census data. The result is a synthetic database that can be used to analyse the evolution of specific migrant groups over time. For illustration, we model the evolution of ethnic interregional migration in England by age and sex from 1991 to 2007 by combining National Health Service registration data with 1991 and 2001 census data. This annual time series of detailed migration flows is useful both for planning and for understanding ethnic population redistribution. Furthermore, changes over time can be related to regions exhibiting, for example, high unemployment, high costs of living, or high immigrant concentrations. Copyright (C) 2009 John Wiley & Sons, Ltd.