Although data on the prevalence of injection drug use are an essential prerequisite for estimating the number of individuals infected with the human immunodeficiency virus (HIV), there have been few attempts to utilize statistical methods of population estimation based on multiple data sources. Data on 3,670 cases (2,866 individuals) were obtained from the HIV test register, drug treatment agencies, police records, and needle and syringe exchanges in Glasgow, Scotland, in 1990. Log-linear analysis was used to model the number of individuals in each of the sources. The model incorporating dependency among the three health care agencies (HIV test, drug treatment, and needle exchange) and independence of the police sample fitted the data well, with a residual chi-square value of 2.9 (6 df). The expected value of the missing cell corresponding to absence from all four samples was 5,628, yielding an overall estimate of 8,494 injectors (95% confidence interval (CI) 7,491-9,721), for a prevalence rate of 1.35% for people aged 15-55 years in Glasgow during 1990. The high ratio of known to unknown injectors (1:2) resulted from the extensive coverage of known injectors and the relatively high level of overlap between the combined health care agency sample and the police sample. While further analysis demonstrated that the probability of appearing in the four samples varied by age and sex, heterogeneity in the population did not affect the choice of model or substantially alter the estimates for the total number of unknown injectors. A concurrent study of a community-wide sample of 503 injectors resulted in an HIV prevalence rate of 1.1% (95% CI 0.4-2.5%). The results of these studies were combined to produce a further estimate of 93 HIV-infected current injectors in Glasgow (95% CI 33-214).
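The Glasgow estimate rests on a four-source log-linear model, but the underlying missing-cell idea can be sketched in its simplest two-source special case, where the independence model reduces to the classical Chapman capture-recapture estimator. The figures below are hypothetical, not the Glasgow data:

```python
# Illustrative two-source capture-recapture estimate (Chapman estimator).
# The sample sizes below are hypothetical, not the Glasgow data.
def chapman_estimate(n1, n2, m):
    """Estimate total population size from two overlapping samples.

    n1, n2 -- sizes of the two samples
    m      -- number of individuals appearing in both samples
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical: 1200 injectors on list A, 900 on list B, 300 on both.
n_hat = chapman_estimate(1200, 900, 300)
print(round(n_hat))  # total population estimate under independence
```

With four overlapping sources, the same logic is expressed by fitting a log-linear model to the 2^4 - 1 observable cells and projecting the unobserved "absent from all samples" cell, as in the abstract.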
Analysis of discrete data, and especially contingency table data, plays a central role in biostatistics. Traditional methods rely on approximations based on asymptotic results which are very powerful but not always appropriate. In this article we show that efficient rerandomization methods may be developed for many commonly used models and tests: multinomial testing, specifically goodness-of-fit and max tests; and goodness-of-fit of log-linear models for contingency tables. The feasibility (complexity) of these algorithms is a function of the sufficient statistics for the models. By contrast, algorithms which require the explicit enumeration of all outcomes in the sample space are exponential in the degrees of freedom, and are usually not feasible except when sample sizes are unrealistically small. The algorithms we present are different from recently proposed methods since we show how to calculate permutation distributions of commonly used statistics rather than calculating p-values for exact tests, and we emphasize underlying probability formulas rather than implementation details.
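The rerandomization idea can be sketched for the simplest case mentioned, multinomial goodness-of-fit: simulate the null multinomial distribution and locate the observed chi-square statistic within it. This is a plain Monte Carlo version with hypothetical counts; the paper's algorithms exploit sufficient statistics rather than brute simulation:

```python
import random

def chi2_stat(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def monte_carlo_gof(observed, probs, n_sim=5000, seed=0):
    """Monte Carlo p-value for a multinomial goodness-of-fit test.

    Draws n_sim multinomial samples under the null probabilities and
    counts how often the simulated chi-square statistic is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    n = sum(observed)
    k = len(probs)
    expected = [n * p for p in probs]
    t_obs = chi2_stat(observed, expected)
    hits = 0
    for _ in range(n_sim):
        counts = [0] * k
        for cat in rng.choices(range(k), weights=probs, k=n):
            counts[cat] += 1
        if chi2_stat(counts, expected) >= t_obs:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# Hypothetical die-roll counts tested against a fair die.
p_val = monte_carlo_gof([18, 22, 19, 21, 20, 40], [1/6] * 6)
print(p_val)
```

The `(hits + 1) / (n_sim + 1)` correction keeps the simulated p-value strictly positive, a standard convention for Monte Carlo tests.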
This paper introduces and investigates the notion of a hyper Markov law, which is a probability distribution over the set of probability measures on a multivariate space that (i) is concentrated on the set of Markov probabilities over some decomposable graph, and (ii) satisfies certain conditional independence restrictions related to that graph. A stronger version of this hyper Markov property is also studied. Our analysis starts by reconsidering the properties of Markov probabilities, using an abstract approach which thereafter proves equally applicable to the hyper Markov case. Next, it is shown constructively that hyper Markov laws exist, that they appear as sampling distributions of maximum likelihood estimators in decomposable graphical models, and also that they form natural conjugate prior distributions for a Bayesian analysis of these models. As examples we construct a range of specific hyper Markov laws, including the hyper multinomial, hyper Dirichlet and the hyper Wishart and inverse Wishart laws. These laws occur naturally in connection with the analysis of decomposable log-linear and covariance selection models.
Log-linear modeling is a discrete multivariate statistical technique that is designed specifically for analyzing data when both the independent and dependent variables are categorical or nominal. The purpose of this paper is to demonstrate the utility of this technique in personnel research. The paper (a) discusses behavioral areas of application, (b) compares log-linear modeling with chi-square and regression analysis, (c) presents the basic principles and hypotheses of log-linear modeling, and (d) shows how the technique is used.
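As a minimal illustration of the technique, the independence log-linear model for a two-way table derives expected counts from the margins and yields a likelihood-ratio statistic G2; the table below is hypothetical:

```python
import math

def independence_fit(table):
    """Expected counts and G2 statistic for the independence
    log-linear model on a two-way contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    g2 = 2 * sum(
        o * math.log(o / e)
        for row_o, row_e in zip(table, expected)
        for o, e in zip(row_o, row_e)
        if o > 0
    )
    return expected, g2

# Hypothetical 2x2 table, e.g. hiring outcome by department.
expected, g2 = independence_fit([[30, 10], [20, 40]])
print(g2)  # compare to a chi-square with (r-1)(c-1) = 1 df
```

A large G2 relative to the chi-square reference distribution signals association between the two categorical variables, which is the kind of hypothesis the abstract contrasts with ordinary chi-square and regression analysis.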
This paper discusses models for prediction analysis. It describes prediction analysis as devised by Hildebrand, Laing, and Rosenthal (1977) and criticisms of the method. Paradoxical results are illustrated. The paradoxical characteristics are explained from differences between models underlying statistical testing and hypothesis formulation. The paper proposes nonstandard log-linear modeling as an alternative to Hildebrand et al.'s (1977) approach. This new approach parameterizes deviations from independence between predictors and criteria. Implications are laid out. The discussion focuses on the type of hypotheses analyzed.
Mobility tables are often used in order to study trends and changes in social structures. A class of log-linear models is proposed that uses indicator variables much in the way dummy variables are coded in multiple linear regression. A virtue of these models is the relative ease with which maximum likelihood parameter estimates can be computed. Several models for mobility tables are gathered from the literature. One such mobility table is analyzed at length.
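A well-known member of this model class is quasi-independence, in which each diagonal cell of the mobility table gets its own indicator (dummy) parameter, so the diagonal is fitted exactly and independence holds off the diagonal. A minimal iterative proportional fitting sketch, using an illustrative table rather than the paper's data:

```python
def quasi_independence(table, tol=1e-10):
    """Fit the quasi-independence model to a square mobility table by
    iterative proportional fitting: diagonal cells are fitted exactly
    (each has its own dummy parameter), off-diagonal cells satisfy
    independence subject to the off-diagonal row/column margins."""
    k = len(table)
    # Start from ones off the diagonal; the diagonal is fixed at the data.
    fitted = [[1.0 if i != j else float(table[i][i]) for j in range(k)]
              for i in range(k)]
    while True:
        change = 0.0
        for i in range(k):  # match off-diagonal row margins
            obs = sum(table[i][j] for j in range(k) if j != i)
            cur = sum(fitted[i][j] for j in range(k) if j != i)
            s = obs / cur
            for j in range(k):
                if j != i:
                    fitted[i][j] *= s
            change = max(change, abs(s - 1))
        for j in range(k):  # match off-diagonal column margins
            obs = sum(table[i][j] for i in range(k) if i != j)
            cur = sum(fitted[i][j] for i in range(k) if i != j)
            s = obs / cur
            for i in range(k):
                if i != j:
                    fitted[i][j] *= s
            change = max(change, abs(s - 1))
        if change < tol:
            return fitted

# Hypothetical 3x3 father-son occupation table (heavy diagonal).
fitted = quasi_independence([[50, 10, 5], [8, 60, 12], [4, 9, 70]])
```

Blanking the diagonal in this way is the standard device for separating "inheritance" (staying in one's class) from the pattern of movement between classes.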
This paper examines how selected physiological performance variables, such as maximal oxygen uptake, strength and power, might best be scaled for subject differences in body size. The apparent dilemma between using either ratio standards or a linear adjustment method to scale was investigated by considering how maximal oxygen uptake (l.min-1), peak and mean power output (W) might best be adjusted for differences in body mass (kg). A curvilinear power function model was shown to be theoretically, physiologically and empirically superior to the linear models. Based on the fitted power functions, the best method of scaling maximum oxygen uptake, peak and mean power output required these variables to be divided by body mass raised to the power 2/3, i.e. kg^(2/3). Hence, the power function ratio standards (ml.kg-2/3.min-1) and (W.kg-2/3) were best able to describe a wide range of subjects in terms of their physiological capacity, i.e. their ability to utilise oxygen or record power maximally, independent of body size. The simple ratio standards (ml.kg-1.min-1) and (W.kg-1) were found to best describe the same subjects according to their performance capacities or ability to run, which are highly dependent on body size. The appropriate model to explain the experimental design effects on such ratio standards was shown to be log-normal rather than normal. Simply by taking logarithms of the power function ratio standard, identical solutions for the design effects are obtained using either ANOVA or, by taking the unscaled physiological variable as the dependent variable and the body size variable as the covariate, ANCOVA methods.
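The logarithmic step invoked in the abstract can be sketched directly: taking logs turns the power function y = a * m^b into a straight line, so the exponent b is simply the least-squares slope on the log scale. Synthetic data, generated exactly on a 2/3-power curve, are used here for illustration:

```python
import math

def allometric_exponent(mass, y):
    """Least-squares slope of log(y) on log(mass): the exponent b in
    the power function y = a * mass**b."""
    lx = [math.log(m) for m in mass]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Synthetic data lying exactly on y = 3 * m**(2/3):
mass = [50, 60, 70, 80, 90]
vo2 = [3 * m ** (2 / 3) for m in mass]
b = allometric_exponent(mass, vo2)
print(round(b, 3))  # recovers the 2/3 exponent
```

With real data the fitted b is estimated with error, and dividing the raw variable by mass**b gives the power function ratio standard described above.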
Two types of model are discussed for paired comparisons of several treatments using ordinal scales such as (A <<< B, A << B, A < B, A = B, A > B, A >> B, A >>> B), where A <<< B denotes strong preference for treatment B over treatment A, A << B denotes moderate preference for B, A < B denotes weak preference for B, A = B denotes no preference, and so forth. For the binary scale (A < B, A > B), special cases of the models using logit transforms simplify to the Bradley-Terry model. When the same raters compare each pair of treatments, one can allow within-rater dependence by fitting the models with constrained maximum likelihood.
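For the binary scale, the Bradley-Terry model expresses each pairwise preference through a logit of treatment "worth" parameters; a minimal sketch with hypothetical worths:

```python
import math

def bt_prob(lam_a, lam_b):
    """Bradley-Terry probability that treatment A is preferred to B,
    given log-worth parameters lam_a and lam_b (a logit model)."""
    return math.exp(lam_a) / (math.exp(lam_a) + math.exp(lam_b))

# Equal worths give a 50/50 preference; a one-unit logit gap shifts it.
p_equal = bt_prob(0.0, 0.0)
p_gap = bt_prob(1.0, 0.0)
print(p_equal, round(p_gap, 3))
```

The ordinal-scale models in the paper generalize this by allowing several cumulative preference categories per comparison rather than a single binary outcome.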
It is common to observe a vector of discrete and/or continuous responses in scientific problems where the objective is to characterize the dependence of each response on explanatory variables and to account for the association between the outcomes. The response vector can comprise repeated observations on one variable, as in longitudinal studies or genetic studies of families, or can include observations for different variables. This paper discusses a class of models for the marginal expectations of each response and for pairwise associations. The marginal models are contrasted with log-linear models. Two generalized estimating equation approaches are compared for parameter estimation. The first focuses on the regression parameters; the second simultaneously estimates the regression and association parameters. The robustness and efficiency of each is discussed. The methods are illustrated with analyses of two data sets from public health research.
Luce's Biased Choice Model has never had a serious competitor as a model of identification data. Even when it has provided a poor model of such data, other models have done even less well. Two alternative models are presented and the three are fit to a published data set. One alternative model is very much like the Biased Choice Model, differing only in the way it treats response bias. It uses an ordinal assumption about the biases and might be called the Triangular Bias (TB) model. The Guessing Mixture Model (GMM) is quite different, although it too uses the concepts of bias and similarity. It posits that the observed confusion matrix is a probability mixture of two latent matrices, the one involving only similarity, not bias, while the other involves bias, not similarity. Illustrative data, a confusion matrix based on four stimuli constructed by crossing two binary features, can be naturally described in three hierarchical ways. The most general description ignores the feature structure of the stimuli. The next description, the feature pattern model, assumes that similarity depends only on the pattern of feature differences, and the simplest special case assumes that similarity depends only on the product of similarities from each of the features. For the general description the three models are not strikingly different, with the Biased Choice Model fitting least well, followed by GMM, with TB the winner. For the independent feature form, however, the GMM model fits much better than either of the others. Indeed, the independent feature model cannot be rejected at the 10% level using GMM, even though the sample of data is large.