Basu and Basu (Statistica Sinica 8:841-860, 1998) have proposed an empty cell penalty for the minimum power-divergence estimators which can lead to improvements in the small sample properties of these estimators. In this paper, we study the small and moderate sample performances of the ordinary and penalized minimum power-divergence estimators in terms of efficiency and robustness for the log-linear models in two-way contingency tables under the assumptions of multinomial sampling. Calculations made by enumerating all possible sample combinations show that, for both efficiency and robustness under the considered models, the penalized estimators are competitive with the ordinary estimators for the moderate samples and definitely better for the smallest sample considered. The results also reveal that the larger the main effects, the greater the need for penalization.
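The empty cell penalty can be illustrated with a minimal sketch. It assumes the commonly used form of the power divergence in which each cell's contribution is non-negative, so that an empty cell contributes its model probability times 1/(1 + lambda); the penalized version replaces that weight with a user-chosen weight. The function name and the argument names `lam` and `h` are hypothetical and do not come from the paper.

```python
import numpy as np

def penalized_power_divergence(d, f, lam, h=None):
    """Power divergence between observed proportions d and model probabilities f.

    Assumes lam > -1 and lam != 0. Empty cells (d == 0) are weighted by h;
    h = 1 / (1 + lam) reproduces the ordinary (unpenalized) divergence.
    """
    d = np.asarray(d, dtype=float)
    f = np.asarray(f, dtype=float)
    if lam in (0.0, -1.0):
        raise ValueError("this sketch does not handle the limiting cases lam = 0 or -1")
    if h is None:
        h = 1.0 / (1.0 + lam)  # ordinary weight on empty cells
    nonempty = d > 0
    dn, fn = d[nonempty], f[nonempty]
    # Contribution of the non-empty cells, written so every term is non-negative.
    core = dn * ((dn / fn) ** lam - 1.0) / (lam * (lam + 1.0)) + (fn - dn) / (lam + 1.0)
    # Empty cells contribute their model probability times the penalty weight.
    return core.sum() + h * f[~nonempty].sum()

# Illustrative 2 x 2 table (flattened) with one empty cell; the data are made up.
observed = [0.5, 0.3, 0.2, 0.0]
model = [0.25, 0.25, 0.25, 0.25]
ordinary = penalized_power_divergence(observed, model, lam=1.0)
penalized = penalized_power_divergence(observed, model, lam=1.0, h=1.0)
```

With `lam=1.0` and the default weight, the value equals half the Pearson chi-square distance between the two probability vectors; a larger `h` makes fitted values that put mass on empty cells more costly.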
A program of routine hepatitis A + B vaccination in preadolescents was introduced in 1998 in Catalonia, a region situated in the northeast of Spain. The objective of this study was to quantify the reduction in the incidence of hepatitis A in order to differentiate the natural reduction of the incidence of hepatitis A from that produced by the vaccination programme, and to predict the evolution of the disease in forthcoming years. A generalized linear model (GLM) using negative binomial regression was used to estimate the incidence rates of hepatitis A in Catalonia by year, age group and vaccination. Introduction of the vaccine reduced cases by 5.5 per year (p-value < 0.001), but there was a significant interaction between the year of report and vaccination that smoothed this reduction (p-value < 0.001). The reduction was not equal in all age groups, being greatest in the 12-18 years age group, which fell from a mean rate of 8.15 per 100,000 person-years in the pre-vaccination period (1992-1998) to 1.4 in the vaccination period (1999-2005). The model predicts the evolution accurately for the group of vaccinated subjects. Negative binomial regression is more appropriate than Poisson regression when the observed variance exceeds the observed mean (overdispersed count data), because overdispersion can make a variable appear to contribute more to the model than it really does. (C) 2008 Elsevier Ltd. All rights reserved.
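The overdispersion criterion mentioned above can be checked with a quick diagnostic. The sketch below compares the sample variance with the sample mean and computes a method-of-moments dispersion estimate under the assumption that the variance has the form mean + alpha * mean^2. The counts are invented for illustration and are not the Catalonia data; the function name is hypothetical.

```python
import numpy as np

def overdispersion_summary(counts):
    """Compare the sample variance with the sample mean of count data.

    A variance-to-mean ratio well above 1 suggests overdispersion relative
    to a Poisson model. alpha is a method-of-moments estimate assuming
    variance = mean + alpha * mean**2, truncated at zero.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)  # unbiased sample variance
    ratio = var / mean
    alpha = max((var - mean) / mean**2, 0.0)
    return {"mean": mean, "variance": var, "ratio": ratio, "alpha": alpha}

# Made-up yearly case counts, for illustration only.
example_counts = [0, 1, 2, 3, 14]
summary = overdispersion_summary(example_counts)
```

This is only a marginal check: in a regression setting the relevant comparison is between the variance and the mean conditional on the covariates, which is what a fitted negative binomial model estimates.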
We propose using latent class analysis as an alternative to log-linear analysis for the multiple imputation of incomplete categorical data. Similar to log-linear models, latent class models can be used to describe complex association structures between the variables used in the imputation model. However, unlike log-linear models, latent class models can be used to build large imputation models containing more than a few categorical variables. To obtain imputations reflecting uncertainty about the unknown model parameters, we use a nonparametric bootstrap procedure as an alternative to the more common full Bayesian approach. The proposed multiple imputation method, which is implemented in Latent GOLD software for latent class analysis, is illustrated with two examples. In a simulated data example, we compare the new method to well-established methods such as maximum likelihood estimation with incomplete data and multiple imputation using a saturated log-linear model. This example shows that the proposed method yields unbiased parameter estimates and standard errors. The second example concerns an application using a typical social sciences data set. It contains 79 variables that are all included in the imputation model. The proposed method is especially useful for such large data sets because standard methods for dealing with missing data in categorical variables break down when the number of variables is so large.
Homophily, the tendency for similar individuals to associate, is one of the most robust findings in social science. Despite this robustness, we have less information about how personal characteristics relate to differences in the strength of homophily. Nor do we know much about the impact of personal characteristics on judgments of relative dissimilarity. The present study compares the strength of age, religious, and educational homophily for male and female non-kin ties using network data from the 1985 General Social Survey. It also compares the patterning of ties among dissimilar alters for both sexes. The results of this exploratory effort indicate that males and females are almost equally homophilous, although religious homophily exerts a stronger influence on females than males. Males and females do, however, differ in their tendency to associate with certain types of dissimilar alters. Education is essentially uniform for both sexes, religious difference is more important for females than males, and those over sixty or under thirty are less different from the middle categories of age for females than for males. The results suggest that males are able to bridge larger areas of social space in their non-kin interpersonal networks and likely accumulate greater social capital as a consequence. (c) 2007 Elsevier Inc. All rights reserved.
This article hammers out the estimation of a fixed effects dynamic panel data model extended to include either spatial error autocorrelation or a spatially lagged dependent variable. To overcome the inconsistencies associated with the traditional least-squares dummy estimator, the models are first-differenced to eliminate the fixed effects and then the unconditional likelihood function is derived taking into account the density function of the first-differenced observations on each spatial unit. When exogenous variables are omitted, the exact likelihood function is found to exist. When exogenous variables are included, the pre-sample values of these variables and thus the likelihood function must be approximated. Two leading cases are considered: the Bhargava and Sargan approximation and the Nerlove and Balestra approximation. As an application, a dynamic demand model for cigarettes is estimated based on panel data from 46 U.S. states over the period from 1963 to 1992.
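The first-differencing step described above can be written out for a simplified version of the model. The notation here is an assumption made for illustration (the abstract gives none), and the spatial error or spatial lag terms are left out to keep the step visible: unit $i$, period $t$, fixed effect $\mu_i$, autoregressive coefficient $\tau$.

```latex
\begin{align}
y_{it} &= \tau\, y_{i,t-1} + x_{it}'\beta + \mu_i + \varepsilon_{it}, \\
\Delta y_{it} = y_{it} - y_{i,t-1}
  &= \tau\, \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}.
\end{align}
```

The fixed effect $\mu_i$ cancels in the difference. The first differenced observation $\Delta y_{i1}$ depends on values before the sample starts, which is why the likelihood must account for its density and, when exogenous variables are present, approximate their pre-sample values.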
Toric models have been recently introduced in the analysis of statistical models for categorical data. The main improvement with respect to classical log-linear models is shown to be a simple representation of structural zeros. In this paper we analyze the geometry of toric models, showing that a toric model is the disjoint union of a number of log-linear models. Moreover, we discuss the connections between the parametric and algebraic representations. The notion of Hilbert basis of a lattice is proved to allow a special representation among all possible parametrizations.
Multiple imputation under the multivariate normality assumption has often been regarded as a viable model-based approach in dealing with incomplete continuous data in the last two decades. A situation where the measurements are taken on a continuous scale with an ultimate interest in dichotomized versions through discipline-specific thresholds is not uncommon in applied research, especially in medical and social sciences. In practice, researchers generally tend to impute missing values for continuous outcomes under a Gaussian imputation model, and then dichotomize them via commonly-accepted cut-off points. An alternative strategy is creating multiply imputed data sets after dichotomization under a log-linear imputation model that uses a saturated multinomial structure. In this work, the performances of the two imputation methods were examined on a fairly wide range of simulated incomplete data sets that exhibit varying distributional characteristics such as skewness and multimodality. Behavior of efficiency and accuracy measures was explored to determine the extent to which the procedures work properly. The conclusion drawn is that dichotomization before carrying out a log-linear imputation should be the preferred approach except for a few special cases. I recommend that researchers use the atypical second strategy whenever the interest centers on binary quantities that are obtained through underlying continuous measurements. A possible explanation is that erratic/idiosyncratic aspects that are not accommodated by a Gaussian model are probably transformed into better-behaving discrete trends in this particular missing-data setting. This premise outweighs the assertion that continuous variables inherently carry more information, leading to a counter-intuitive, but potentially useful result for practitioners.
Much research in environmental epidemiology relies on aggregate-level information on exposure to potentially toxic substances and on relevant covariates. We compare the use of additive (linear) and multiplicative (log-linear) regression models for the analysis of such data. We illustrate how both additive and multiplicative models can be fit to aggregate-level data sets in which disease incidence is the dependent variable, and contrast these results with similar models fitted to individual-level data. We find (1) that for aggregate-level data, multiplicative models are more likely than additive models to introduce bias into the estimation of rates, an effect not found with individual-level data; and (2) that under many circumstances multiplicative models reduce the precision of the estimates, an effect also not found in individual-level models. For both additive and multiplicative models of aggregate-level data, we find that, in the presence of covariates, narrow confidence intervals are obtained only when two or more antecedent factors are strongly related to the measured covariate and/or the exposure of primary substantive interest. We conclude that the equivalency of fitting additive versus multiplicative models in studies with individual-level binary data does not carry over to studies that analyze aggregate-level information. For aggregate data, we strongly recommend use of additive models.
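One way to see why the two model forms can behave differently under aggregation is sketched below. This is an illustration under assumed notation, not the authors' derivation: individual $i$ in group $g$ of size $n_g$ has exposure $x_{gi}$ and rate $\lambda_{gi}$, and $\bar{x}_g$ is the group mean exposure.

```latex
\begin{align}
\text{additive:} \quad
  \lambda_{gi} &= \beta_0 + \beta_1 x_{gi}
  &&\Rightarrow\quad
  \bar{\lambda}_g = \beta_0 + \beta_1 \bar{x}_g, \\
\text{multiplicative:} \quad
  \lambda_{gi} &= \exp(\beta_0 + \beta_1 x_{gi})
  &&\Rightarrow\quad
  \bar{\lambda}_g = \frac{1}{n_g}\sum_{i=1}^{n_g} \exp(\beta_0 + \beta_1 x_{gi})
  \;\ge\; \exp(\beta_0 + \beta_1 \bar{x}_g).
\end{align}
```

The additive form is preserved exactly when rates are averaged within a group. For the multiplicative form the inequality (Jensen's inequality) is strict whenever exposure varies within the group and $\beta_1 \neq 0$, so a log-linear model fitted to group means does not in general recover the individual-level relationship.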
This article presents the empirical Bayes method for estimation of the transition probabilities of a generalized finite stationary Markov chain whose ith state is a multi-way contingency table. We use a log-linear model to describe the relationship between factors in each state. The prior knowledge about the main effects and interactions is described by a conjugate prior. Following the Bayesian paradigm, the Bayes and empirical Bayes estimators relative to various loss functions are obtained. These procedures are illustrated by a real example. Finally, asymptotic normality of the empirical Bayes estimators is established.