Psychiatric studies of suicide provide fundamental insights into the evolution of severe psychopathologies and contribute to the development of early treatment interventions. Our focus is on modelling different traits of psychosis and their interconnections, with a case study on suicide attempt survivors. Such aspects are recorded via multivariate categorical data, involving a large number of items for multiple subjects. Current methods for multivariate categorical data, such as penalized log-linear models and latent structure analysis, are either limited to low-dimensional settings or include parameters with difficult interpretation. Motivated by this application, this article proposes a new class of approaches, which we refer to as mixtures of log-linear models (MILLS). Combining latent class analysis and log-linear models, MILLS defines a novel Bayesian approach to model complex multivariate categorical data with flexibility and interpretability, providing interesting insights into the relationship between psychotic diseases and psychological aspects in suicide attempt survivors.
In this work we investigate multipartition models, the subset of log-linear models for which one can perform the classical iterative proportional scaling (IPS) algorithm to numerically compute the maximum likelihood estimate (MLE). Multipartition models include families of models such as hierarchical models and balanced, stratified staged trees. We define a sufficient condition, called the Generalized Running Intersection Property (GRIP), on the matrix representation of a multipartition model under which the classical IPS algorithm produces the exact MLE in one cycle. In this case, the MLE is a rational function of the data. Additionally, we connect the GRIP to the toric fiber product and to previous results for hierarchical models and balanced, stratified staged trees. This leads to a characterization of balanced, stratified staged trees in terms of the GRIP.
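As a quick illustration of the IPS algorithm discussed in this abstract, the sketch below fits the independence model on a 2x2 table, the simplest decomposable case, where one cycle already yields the exact MLE; the table and function names are illustrative, not from the paper.

```python
import numpy as np

def ips_independence(table, n_cycles=10):
    """Fit expected counts under row/column independence by IPS:
    alternately rescale the fit to match the observed row and column margins."""
    fitted = np.ones_like(table, dtype=float)
    for _ in range(n_cycles):
        # Match row margins, then column margins.
        fitted *= table.sum(axis=1, keepdims=True) / fitted.sum(axis=1, keepdims=True)
        fitted *= table.sum(axis=0, keepdims=True) / fitted.sum(axis=0, keepdims=True)
    return fitted

obs = np.array([[10.0, 20.0], [30.0, 40.0]])
mle = ips_independence(obs, n_cycles=1)  # decomposable model: one cycle is exact
```

For this model the closed-form MLE is `row_margin * col_margin / n`, which the single IPS cycle reproduces exactly, mirroring the one-cycle behaviour the GRIP characterizes for a much broader class of models.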
Log-linear models for contingency tables are a key tool for the study of categorical inequalities in sociology. However, the conventional approach to model selection and specification suffers from at least two limitations: reliance on oftentimes equivocal diagnostics yielded by fit statistics, and the inability to identify patterns of association not covered by model candidates. In this article, we propose an application of Lasso regularization that addresses the aforementioned limitations. We evaluate our method through a Monte Carlo experiment and an empirical study of educational assortative mating in Chile, 1990-2015. Results demonstrate that our approach has the virtue, relative to ad hoc specification searches, of offering a principled statistical criterion to inductively select a model. Importantly, we show that in situations where conventional fit statistics provide conflicting diagnostics, our Lasso-based approach is consistent in its model choice, yielding solutions that are both predictive and parsimonious.
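To make the idea concrete, here is a hypothetical sketch (not the article's code) of a Lasso-penalized Poisson log-linear model for a 2x2 table, fit by proximal gradient descent; the design matrix, penalty level, and step size are all illustrative choices.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_loglinear(X, y, lam, lr=0.005, n_iter=20000):
    """Minimize -Poisson loglik(beta) + lam * ||beta[1:]||_1 (intercept unpenalized)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        beta = beta - lr * (X.T @ (mu - y))            # gradient step on -loglik
        beta[1:] = soft_threshold(beta[1:], lr * lam)  # proximal L1 step
    return beta

# Saturated design for a 2x2 table: intercept, row, column, interaction
X = np.array([[1., 0., 0., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 0.],
              [1., 1., 1., 1.]])
y = np.array([3., 5., 4., 8.])
beta = lasso_loglinear(X, y, lam=5.0)  # a strong penalty zeroes the association terms
```

With a sufficiently strong penalty the association parameters are shrunk exactly to zero, which is the mechanism that lets the Lasso select among log-linear specifications inductively rather than by comparing a fixed list of candidate models.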
We establish connections between invariant theory and maximum likelihood estimation for discrete statistical models. We show that norm minimization over a torus orbit is equivalent to maximum likelihood estimation in log-linear models. We use notions of stability under a torus action to characterize the existence of the maximum likelihood estimate, and discuss connections to scaling algorithms.
Log-linear models offer a detailed characterization of the association between categorical variables, but the breadth of their outputs is difficult to grasp because of the large number of parameters these models entail. Revisiting seminal findings and data from sociological work on social mobility, the author illustrates the use of heatmaps as a visualization technique to convey the complex patterns of association captured by log-linear models. In particular, turning log odds ratios derived from a model's predicted counts into heatmaps makes it possible to summarize large amounts of information and facilitates comparison across models' outcomes.
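A minimal sketch of the underlying computation, with made-up counts standing in for a model's predicted counts: each entry of the local log-odds-ratio matrix summarizes the association in one 2x2 subtable of adjacent cells, and the resulting matrix is what a heatmap would display.

```python
import numpy as np

# Hypothetical 3x3 table of predicted counts (e.g. origin x destination class)
counts = np.array([[50., 20.,  5.],
                   [20., 40., 15.],
                   [ 5., 15., 30.]])

# Local log odds ratios: lor[i, j] = log( m[i,j] * m[i+1,j+1] / (m[i,j+1] * m[i+1,j]) )
lor = (np.log(counts[:-1, :-1]) + np.log(counts[1:, 1:])
       - np.log(counts[:-1, 1:]) - np.log(counts[1:, :-1]))

# Render as a heatmap, e.g.: plt.imshow(lor, cmap="RdBu_r"); plt.colorbar()
```

An I x J table yields an (I-1) x (J-1) matrix of local log odds ratios, so the heatmap compresses the model's full association structure into one picture, which is what makes cross-model comparison tractable.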
In square contingency tables, analysis of agreement between row and column classifications is of interest. For nominal categories, the kappa coefficient is used to summarize the degree of agreement between two raters. Numerous extensions and generalizations of kappa statistics have been proposed in the literature. In addition to the kappa coefficient, several authors model agreement in terms of log-linear models. This paper focuses on approaches to the study of inter-rater agreement for contingency tables with nominal or ordinal categories and multiple raters. In this article, we present a detailed overview of agreement studies and illustrate the use of these approaches for evaluating agreement in three numerical examples.
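For reference, the basic two-rater kappa coefficient mentioned above can be computed directly from a square agreement table; the table below is a made-up example.

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a square agreement table:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n                            # observed agreement
    p_exp = (table.sum(axis=1) @ table.sum(axis=0)) / n**2  # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)

# Rows: rater A's categories, columns: rater B's categories (hypothetical counts)
kappa = cohens_kappa([[20, 5],
                      [10, 15]])
```

Kappa equals 1 for perfect agreement and 0 for agreement no better than chance; the log-linear approaches the paper surveys go further by modelling the pattern of disagreement cell by cell.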
Statistical agencies and other institutions collect data under the promise to protect the confidentiality of respondents. When releasing microdata samples, the risk that records can be identified must be assessed. To this aim, a widely adopted approach is to isolate categorical variables key to the identification and analyze multi-way contingency tables of such variables. Common disclosure risk measures focus on sample unique cells in these tables and adopt parametric log-linear models as the standard statistical tools for the problem. Such models often have to deal with large and extremely sparse tables that pose a number of challenges to risk estimation. This paper proposes to overcome these problems by studying nonparametric alternatives based on Dirichlet process random effects. The main finding is that the inclusion of such random effects allows us to reduce considerably the number of fixed effects required to achieve reliable risk estimates. This is studied on applications to real data, suggesting, in particular, that our mixed models with main effects only produce roughly equivalent estimates compared to the all two-way interactions models, and are effective in defusing potential shortcomings of traditional log-linear models. This paper adopts a fully Bayesian approach that accounts for all sources of uncertainty, including that about the population frequencies, and supplies unconditional (posterior) variances and credible intervals.
The temporal characteristics of crashes have a major impact on road crash occurrence, and a large proportion of research has considered different time periods to determine the causes and features of crash occurrence or frequency. Compared with safety studies based on a single time interval, considerably less research has relied on the use of multiple time units, especially for time intervals of less than one year. This research aims to fill that gap by investigating the temporal distribution of crash counts using multiple time spans, including hour, weekday, and month. To obtain the most accurate results possible, both the Chi-square test and the Cochran-Mantel-Haenszel test were employed to explore the independence of the various time units based on two-way and three-way contingency tables. Eight contingency table models were developed, which can be classified into four groups: complete independence, joint independence, conditional independence, and homogeneous association. Finally, a set of evaluation criteria was utilized to assess model performance. The results revealed significant associations among all time variables (hour, weekday, month), and the model with both main and all interaction effects of the time variables provided the best prediction performance. The findings also showed that hour 18, weekdays 1, 6, and 7 (Friday and weekends), and month 8 (August) have the largest numbers of crash occurrences. It is suggested that both main and interaction effects of time variables should be included in model development, which otherwise might yield misleading information. It is anticipated that the results will give safety professionals a better understanding of the temporal patterns of crashes across different time periods and allow safety administrators to allocate safety resources accordingly.
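The Chi-square independence test on a two-way table can be sketched as follows, with hypothetical crash counts; the Cochran-Mantel-Haenszel test plays the analogous role once a third stratifying variable (e.g. month) is introduced.

```python
import numpy as np

# Hypothetical two-way table: hour-of-day bands (rows) x weekday/weekend (columns)
crashes = np.array([[120.,  80.],
                    [200., 150.],
                    [ 90.,  60.]])

n = crashes.sum()
# Expected counts under independence: outer product of the margins over n
expected = np.outer(crashes.sum(axis=1), crashes.sum(axis=0)) / n
# Pearson chi-square statistic
chi2 = ((crashes - expected) ** 2 / expected).sum()
# Compare with the 5% critical value for (3-1)*(2-1) = 2 degrees of freedom, 5.991
```

A statistic above the critical value rejects independence of the two time units, which is the evidence the paper uses to justify including interaction effects of the time variables in the contingency table models.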
Markov chain Monte Carlo (MCMC) allows one to generate dependent replicates from a posterior distribution for effectively any Bayesian hierarchical model. However, MCMC can produce a significant computational burden. This motivates us to consider finding expressions of the posterior distribution that are computationally straightforward to obtain independent replicates from directly. We focus on a broad class of Bayesian hierarchical models for spatially dependent data, which are often modeled via a latent Gaussian process (LGP). First, we derive a new class of distributions referred to as the generalized conjugate multivariate (GCM) distribution. The GCM distribution's theoretical development follows that of the conjugate multivariate (CM) distribution with two main differences: the GCM allows for latent Gaussian process assumptions, and the GCM explicitly accounts for hyperparameters through marginalization. The development of GCM is needed to obtain independent replicates directly from the exact posterior distribution, which has an efficient regression form. Hence, we refer to our method as Exact Posterior Regression (EPR). Simulation studies with weakly stationary spatial processes and spatial basis function expansions are provided. We provide an analysis of poverty incidence from the U.S. Census Bureau, and an analysis of high-dimensional remote sensing data. Supplementary materials for this article are available online.
Poisson log-linear models are ubiquitous in many applications and are among the most popular approaches for parametric count regression. In the Bayesian context, however, there are few specialized computational tools for efficient sampling from the posterior distribution of the parameters, and standard algorithms, such as random walk Metropolis-Hastings or Hamiltonian Monte Carlo, are typically used. Herein, we develop an efficient Metropolis-Hastings algorithm and importance sampler to simulate from the posterior distribution of the parameters of Poisson log-linear models under conditional Gaussian priors, with superior performance with respect to the state-of-the-art alternatives. The key for both algorithms is the introduction of a proposal density based on a Gaussian approximation of the posterior distribution of the parameters. Specifically, our result leverages the negative binomial approximation of the Poisson likelihood and the successful Polya-gamma data augmentation scheme. Via simulation, we show that the time per independent sample of the proposed samplers is competitive with that obtained using Hamiltonian Monte Carlo sampling, with the Metropolis-Hastings algorithm showing superior performance in all scenarios considered. Supplementary materials for this article are available online.
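The general idea of a Gaussian-approximation proposal can be sketched in one dimension. The toy below is an independence Metropolis-Hastings sampler for the log-rate of a Poisson model with a Gaussian prior, using a Laplace-style approximation as the proposal; it is a generic illustration, not the authors' Polya-gamma construction, and all names and constants are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(5.0, size=50)  # simulated counts with true rate 5

def log_post(beta):
    """Log posterior (up to a constant) for the log-rate beta, N(0, 10^2) prior."""
    return beta * y.sum() - len(y) * np.exp(beta) - beta**2 / (2 * 10.0**2)

# Gaussian approximation of the posterior: center near the mode (the prior is
# weak, so log(ybar) is close), spread from the observed information.
mode = np.log(y.mean())
sd = 1.0 / np.sqrt(len(y) * np.exp(mode))

# Independence Metropolis-Hastings with the Gaussian approximation as proposal
beta, draws = mode, []
for _ in range(2000):
    prop = rng.normal(mode, sd)
    # log acceptance ratio: log pi(prop)/pi(beta) + log q(beta)/q(prop)
    log_accept = (log_post(prop) - log_post(beta)
                  + ((prop - mode)**2 - (beta - mode)**2) / (2 * sd**2))
    if np.log(rng.uniform()) < log_accept:
        beta = prop
    draws.append(beta)
```

Because the proposal closely matches the posterior, the acceptance rate is high and the draws are nearly independent; the paper's contribution is constructing such a proposal for the full multivariate parameter via the negative binomial and Polya-gamma machinery.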