Obtaining reliable estimates in small areas is a challenge because of the coverage and periodicity of data collection. Several techniques of small area estimation have been proposed to produce quality measures in small areas, but few of them focus on updating these estimates. By combining the attributes of the most recent versions of the structure-preserving estimation methods, this article proposes a new alternative for estimating and updating cross-classified counts for small domains when the variable of interest is not available in the census. The proposed methodology is used to obtain and update estimates of the incidence of poverty in 81 Costa Rican cantons for six postcensal years (2012-2017). As uncertainty measures, mean squared errors are estimated via parametric bootstrap, and the adequacy of the proposed method is assessed with a design-based simulation.
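The parametric bootstrap for mean squared errors mentioned above follows a standard recipe: resample from the fitted model, re-estimate, and average the squared deviations. A minimal sketch, assuming a simple binomial working model for an area-level proportion (the canton size and poverty rate below are illustrative, not the authors' data):

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_mse(p_hat, n_d, B=1000):
    """Parametric-bootstrap MSE of an area-level proportion estimator.

    Assumes y_d ~ Binomial(n_d, p_hat): draw B replicate counts under
    the fitted model, re-estimate the proportion from each, and average
    the squared deviations from p_hat.
    """
    boot = rng.binomial(n_d, p_hat, size=B) / n_d
    return np.mean((boot - p_hat) ** 2)

# Hypothetical canton: estimated poverty incidence 0.21 from 400 households.
mse = bootstrap_mse(0.21, 400)
```

The same loop generalizes to any fitted small-area model: only the resampling step and the re-estimation step change.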
Biometrical sciences, and disease diagnosis in particular, are often concerned with the analysis of associations for cross-classified data, for which distance association models give a graphical interpretation for non-sparse matrices with a low number of categories. In this framework, binary explanatory and response variables are usually present, with analysis based on individual profiles being of great interest. For saturated models, we show the usual linear relationship for log-linear models is preserved in full dimension for the distance association parameterization. This enables a two-step procedure to facilitate the analysis and the interpretation of associations in terms of unfolding after the overall and main effects are removed. The proposed procedure can deal with cross-classified data for profiles by binary variables, and it is easy to implement using traditional statistical software. For disease diagnosis, the problems of a degenerate solution in the unfolding representation and of determining significant differences between the profile locations are addressed. A hypothesis test of independence based on odds ratios is considered. Furthermore, a procedure is proposed to determine the causes of the significance of the test, avoiding the problem of error propagation. The equivalence between a test for equality of odds ratio pairs and the test for equality of location for two profiles in the unfolding representation in disease diagnosis is shown. The results have been applied to a real example on the diagnosis of coronary disease, relating the odds ratios with performance parameters of the diagnostic test.
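The odds-ratio-based independence test referred to above is, in its simplest form, a Wald test on the log odds ratio of a 2x2 table. A minimal sketch (the diagnostic counts are hypothetical, and this is the generic Wald test, not the authors' full unfolding-based procedure):

```python
import math

def odds_ratio_test(a, b, c, d):
    """Wald test for H0: OR = 1 in a 2x2 table [[a, b], [c, d]].

    Returns the sample odds ratio and the z statistic for log(OR),
    whose standard error is sqrt(1/a + 1/b + 1/c + 1/d).
    """
    or_hat = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = math.log(or_hat) / se
    return or_hat, z

# Hypothetical diagnostic table: 40 true positives, 10 false negatives,
# 15 false positives, 35 true negatives.
or_hat, z = odds_ratio_test(40, 10, 15, 35)
```

|z| > 1.96 rejects independence at the 5% level; in the paper's setting, such odds ratios are further linked to distances between profile locations in the unfolding map.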
Freed and Cann (2013) criticized our use of linear models to assess trends in the status of Hawaiian forest birds through time (Camp et al. 2009a, 2009b, 2010) by questioning our sampling scheme, whether we met model assumptions, and whether we ignored short-term changes in the population time series. In the present paper, we address these concerns and reiterate that our results do not support the position of Freed and Cann (2013) that the forest birds in the Hakalau Forest National Wildlife Refuge (NWR) are declining, or that the federally listed endangered birds are showing signs of imminent collapse. On the contrary, our data indicate that the 21-year long-term trends for native birds in Hakalau Forest NWR are stable to increasing, especially in areas that have received active management.
We discuss the identification of pediatric cancer clusters in Florida between 2000 and 2010 using a penalized generalized linear model. More specifically, we introduce a Poisson model for the observed number of cases in each of Florida's ZIP Code Tabulation Areas (ZCTAs) and regularize the associated disease rate estimates using a generalized Lasso penalty. Our analysis suggests the presence of a number of pediatric cancer clusters during the period under study, with the largest ones being located around the cities of Jacksonville, Miami, Cape Coral/Fort Myers, and Palm Beach.
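The objective being minimized in such a model combines the negative Poisson log-likelihood with an L1 penalty on linear combinations of the area-level log-rates. A minimal sketch of that objective, assuming a toy chain of three areas with a first-difference (fused-lasso) penalty matrix; the actual paper's penalty structure and optimizer are not reproduced here:

```python
import numpy as np

def penalized_poisson_loss(theta, y, D, lam):
    """Generalized-Lasso-penalized Poisson objective.

    theta: log-rates per area; y: observed counts; D: penalty matrix whose
    rows difference the areas to be smoothed towards each other (for
    spatial clustering, rows typically difference adjacent areas);
    lam: regularization strength.
    """
    # Negative Poisson log-likelihood, dropping the constant log(y!) term.
    nll = np.sum(np.exp(theta) - y * theta)
    return nll + lam * np.sum(np.abs(D @ theta))

# Toy chain of three areas, penalizing differences between neighbors.
y = np.array([5.0, 6.0, 20.0])
D = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
theta = np.log(y)  # unpenalized MLE: theta_d = log y_d
loss = penalized_poisson_loss(theta, y, D, lam=1.0)
```

Increasing lam shrinks neighboring log-rates together, which is what produces spatially contiguous clusters of elevated estimated rates.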
This study aims to evaluate the performance of Item Response Theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were made through both simulations and real data applications, under both equivalent groups (EG) and non-equivalent groups with anchor test (NEAT) sampling designs. To prevent bias towards IRT methods, data were simulated with and without the use of IRT models. The results suggest that the difference between IRT kernel equating and IRT observed score equating is minimal, both in terms of the equated scores and their standard errors. The application of IRT models for presmoothing yielded smaller standard errors of equating than the log-linear presmoothing approach. When test data were generated using IRT models, IRT-based methods proved less biased than log-linear kernel equating. However, when data were simulated without IRT models, log-linear kernel equating showed less bias. Overall, IRT kernel equating shows great promise when equating mixed-format tests.
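The kernel equating step shared by all of these methods continuizes each discrete score distribution with a Gaussian kernel and then maps scores through e(x) = F_Y^{-1}(F_X(x)). A minimal sketch under an equivalent-groups design, with a toy example in which form Y is form X shifted by two points (score distributions, bandwidth, and grid are illustrative assumptions):

```python
import math

import numpy as np

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_cdf(x, scores, probs, h=0.6):
    """Gaussian-kernel continuized CDF of a discrete score distribution."""
    return sum(p * norm_cdf((x - s) / h) for s, p in zip(scores, probs))

def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h=0.6):
    """Equate score x on form X to the form-Y scale: e(x) = F_Y^{-1}(F_X(x))."""
    p = kernel_cdf(x, scores_x, probs_x, h)
    # Invert F_Y numerically on a fine grid over the Y score range.
    grid = np.linspace(min(scores_y) - 3.0, max(scores_y) + 3.0, 4001)
    cdf_y = np.array([kernel_cdf(g, scores_y, probs_y, h) for g in grid])
    return float(grid[np.searchsorted(cdf_y, p)])

# Form Y is form X shifted by 2 points, so a score of 5 should map near 7.
scores_x = list(range(11))
probs_x = [1 / 11] * 11
scores_y = [s + 2 for s in scores_x]
eq = kernel_equate(5, scores_x, probs_x, scores_y, probs_x)
```

The methods compared in the study differ only in where the score probabilities come from: log-linear presmoothing of the observed frequencies, or model-implied distributions from a fitted IRT model.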
Recently, a subcopula-based asymmetric association measure was developed for the variables in two-way contingency tables. Here, we develop a fully Bayesian method to implement this measure, and examine its performance using simulation data and several real data sets of colorectal cancer. We use coverage probabilities and lengths of the interval estimators to compare the Bayesian approach and a large-sample method of analysis. In simulation studies, we find that the Bayesian method outperforms the large-sample method on average, and provides either similar or improved results for the real data analyses.
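The fully Bayesian machinery described above follows a conjugate pattern: a Dirichlet prior on the cell probabilities of the contingency table yields a Dirichlet posterior, from which the association measure is computed draw by draw. A minimal sketch using Goodman-Kruskal tau as a stand-in for the subcopula-based measure (the table, prior, and measure are illustrative assumptions, not the authors' choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def gk_tau(p):
    """Goodman-Kruskal tau: asymmetric association, predicting columns from rows."""
    col = p.sum(axis=0)
    num = np.sum(p**2 / p.sum(axis=1, keepdims=True)) - np.sum(col**2)
    return num / (1.0 - np.sum(col**2))

counts = np.array([[30.0, 5.0],
                   [8.0, 25.0]])  # hypothetical 2x2 table

# Dirichlet(1,...,1) prior on the cell probabilities => Dirichlet posterior.
draws = rng.dirichlet(counts.ravel() + 1.0, size=4000)
taus = np.array([gk_tau(d.reshape(2, 2)) for d in draws])
lo, hi = np.quantile(taus, [0.025, 0.975])  # 95% credible interval
```

The coverage comparison in the paper amounts to repeating this interval construction over simulated tables and checking how often the interval contains the true measure, against the large-sample interval's performance.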
We review the influential research carried out by Chris Skinner in the area of statistical disclosure control, and in particular quantifying the risk of re-identification in sample microdata from a random survey drawn from a finite population. We use the sample microdata to infer population parameters when the population is unknown, and estimate the risk of re-identification based on the notion of population uniqueness using probabilistic modelling. We also introduce a new approach to measure the risk of re-identification for a subpopulation in a register that is not representative of the general population, for example a register of cancer patients. In addition, we can use the additional information from the register to measure the risk of re-identification for the sample microdata. This new approach was developed by the two authors and is published here for the first time. We demonstrate this approach in an application study based on UK census data where we can compare the estimated risk measures to the known truth.
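The notion of population uniqueness at the heart of this line of work can be made concrete with a small simulation: the re-identification risk of a sample unique is the probability that the same key combination is also unique in the population. A minimal sketch (the key distribution, population size, and sampling fraction are illustrative assumptions; the probabilistic log-linear modelling used when the population is unknown is not reproduced here):

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population of cross-classified keys (e.g. age band x region),
# with a skewed distribution so that some key combinations are rare.
probs = rng.dirichlet(np.full(300, 0.2))
pop = rng.choice(300, size=20_000, p=probs)
sample = rng.choice(pop, size=2_000, replace=False)

pop_freq = Counter(pop)
samp_freq = Counter(sample)

sample_uniques = [k for k, f in samp_freq.items() if f == 1]
# Risk: proportion of sample uniques that are also unique in the population.
risk = (sum(pop_freq[k] == 1 for k in sample_uniques) / len(sample_uniques)
        if sample_uniques else 0.0)
```

In practice the population frequencies are unknown, which is exactly where the probabilistic modelling reviewed above comes in: it infers the distribution of population frequencies given only the sample frequencies.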
Genetic association studies of child health outcomes often employ family-based study designs. One of the most popular family-based designs is the case-parent trio design that considers the smallest possible nuclear family consisting of two parents and their affected child. This trio design is particularly advantageous for studying relatively rare disorders because it is less prone to type I error inflation due to population stratification compared to population-based study designs (e.g., case-control studies). However, obtaining genetic data from both parents is difficult from a practical perspective, and many large studies predominantly measure genetic variants in mother-child dyads. While some statistical methods for analyzing parent-child dyad data (most commonly involving mother-child pairs) exist, it is not clear if they provide the same advantage as trio methods in protecting against population stratification, or if a specific dyad design (e.g., case-mother dyads vs. case-mother/control-mother dyads) is more advantageous. In this article, we review existing statistical methods for analyzing genome-wide marker data on dyads and perform extensive simulation experiments to benchmark their type I errors and statistical power under different scenarios. We extend our evaluation to existing methods for analyzing a combination of case-parent trios and dyads together. We apply these methods to genotyped and imputed data from multiethnic mother-child pairs only, case-parent trios only, or combinations of both dyads and trios from the Gene, Environment Association Studies consortium (GENEVA), where each family was ascertained through a child affected by nonsyndromic cleft lip with or without cleft palate. Results from the GENEVA study corroborate the findings from our simulation experiments. Finally, we provide recommendations for using statistical genetic association methods for dyads.
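The robustness of the trio design to population stratification comes from conditioning on parental genotypes, as in the classic transmission disequilibrium test (TDT): only transmissions from heterozygous parents are informative, and under the null they are a fair coin flip regardless of subpopulation allele frequencies. A minimal sketch (the transmission counts are hypothetical; this is the basic TDT, not the full set of methods benchmarked in the article):

```python
def tdt(b, c):
    """Transmission disequilibrium test statistic for case-parent trios.

    b: transmissions of the risk allele from heterozygous parents;
    c: transmissions of the other allele from heterozygous parents.
    Under no linkage/association, b and c are equally likely, so
    (b - c)^2 / (b + c) is asymptotically chi-square with 1 df,
    irrespective of population structure.
    """
    return (b - c) ** 2 / (b + c)

chi2 = tdt(60, 35)  # hypothetical transmission counts
```

Dyad designs lose one parent's genotype, so this conditioning is only partial, which is precisely why their protection against stratification needs the empirical evaluation the article carries out.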
Mobility of individuals between a wide variety of geographic locations, social positions, or roles is frequently analysed in the social sciences. In recent research, mobility has increasingly been conceptualised as a network. For example, residential mobility, when individuals move between neighbourhoods of a city, can be understood as a network in which neighbourhoods are nodes that are tied by counts of mobile individuals who move from one neighbourhood to another. Understanding mobility as a network makes it possible to apply concepts and methods from the network analyst's toolbox. However, the statistical modelling of such weighted networks in which ties can have individual attributes remains difficult. In this article we propose a statistical model for the analysis of mobility tables conceptualised as networks, combining properties from log-linear models and exponential random graph models (ERGMs). When no endogenous patterns are modelled, it reduces to a classic log-linear model for mobility tables. When modelling endogenous patterns but ignoring individual attributes, the model can be understood as an ERGM for weighted networks in which tie weights denote counts. Making use of special constraints of mobility networks, the model offers a parsimonious way to deal with weighted ties. Going beyond classical ERG modelling, the proposed approach can additionally incorporate tie characteristics that represent individual attributes of mobile people. The model is applied to two cases of faculty hiring networks (linking faculty members' current departments with their PhD-granting institutions) in history and computer science in the US.
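The classic log-linear baseline that the proposed model reduces to can be fitted in closed form for the independence case: expected flows are products of origin and destination margins, and the likelihood-ratio statistic G^2 measures how far the observed table departs from them. A minimal sketch (the 3x3 flow table is hypothetical):

```python
import numpy as np

def independence_fit(table):
    """Fit the log-linear independence model to a mobility table.

    Expected counts are row_total * col_total / grand_total, and G^2 is
    the likelihood-ratio statistic comparing observed to fitted counts.
    """
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(table > 0, table * np.log(table / expected), 0.0)
    return expected, 2.0 * terms.sum()

# Hypothetical 3x3 origin-by-destination flow table (strong diagonal = immobility).
flows = [[50, 10, 5],
         [8, 40, 12],
         [6, 9, 60]]
expected, g2 = independence_fit(flows)
```

A large G^2, driven here by the heavy diagonal, is the kind of endogenous pattern (immobility, reciprocation of flows) that the ERGM side of the proposed model is designed to parameterise directly.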
Log-linear exponential random graph models are a specific class of statistical network models that have a log-linear representation. This class includes many stochastic blockmodel variants. We focus on β-stochastic blockmodels, which combine the β-model with a stochastic blockmodel. Here, using recent results by Almendra-Hernández, De Loera, and Petrović, which describe a Markov basis for the β-stochastic blockmodel, we give a closed-form formula for the maximum likelihood degree of a β-stochastic blockmodel. The maximum likelihood degree is the number of complex solutions to the likelihood equations. In the case of the β-stochastic blockmodel, the maximum likelihood degree factors into a product of Eulerian numbers.
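The Eulerian numbers appearing in the factorization count permutations of {1, ..., n} with exactly k descents and satisfy the standard recurrence A(n, k) = (k+1) A(n-1, k) + (n-k) A(n-1, k-1). A minimal sketch for computing them (the connection to the maximum likelihood degree is as stated in the abstract; the code only evaluates the recurrence):

```python
def eulerian(n, k):
    """Eulerian number A(n, k): permutations of 1..n with exactly k descents,
    via the recurrence A(n, k) = (k+1) A(n-1, k) + (n-k) A(n-1, k-1)."""
    if k < 0 or k >= max(n, 1):
        return 0
    if n == 0:
        return 1
    return (k + 1) * eulerian(n - 1, k) + (n - k) * eulerian(n - 1, k - 1)

# Row n = 4 of the Eulerian triangle: 1, 11, 11, 1 (summing to 4! = 24).
row4 = [eulerian(4, k) for k in range(4)]
```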