For the analysis of caries experience in seven-year-old children, the association between the presence or absence of caries experience among deciduous molars within each child is explored. Some of the high associations have an etiological basis (e.g., between symmetrically opponent molars), while others (between diagonally opponent molars) are assumed to result from the transitivity of association and to disappear once conditioned on the caries experience status of the other deciduous molars, covariates and random effects. However, when discrete models for multivariate binary data are used, conditioning does not remove the diagonal association. When the association is explored on a latent scale, e.g., by a multivariate probit model, conditional independence can be concluded. This contrast is confirmed when using other models on the (observed) binary scale and on the latent scale. Depending on the point of view, the differences in conditional independence might be seen as a consequence of different types of measurements or as a consequence of different models. An example shows that the results and conclusions can be markedly different, with important consequences for model building. The explanation for this result is exemplified mathematically and illustrated using dental data from the Signal-Tandmobiel® study. © 2006 Elsevier B.V. All rights reserved.
To understand the relevance of microbial communities to crop productivity, the rhizosphere soil microbial community must be identified and characterized. Characteristic profiles of the microbial communities are obtained by denaturing gradient gel electrophoresis (DGGE) of polymerase chain reaction (PCR)-amplified 16S rDNA from soil-extracted DNA. These characteristic profiles, commonly called community DNA fingerprints, can be represented in the form of high-dimensional binary vectors. We address the problem of modeling and variable selection in high-dimensional multivariate binary data and present an application of our methodology in the context of a controlled agricultural experiment.
A model is proposed for multivariate binary data that incorporates positive dependence among components in a natural way. The model is derived from reliability-theoretic concepts, but is regarded as appropriate for analysis of multivariate binary data in any field when positive dependence is an appropriate assumption. Maximum likelihood estimation by iterative solution of likelihood equations is discussed for the general model, and asymptotically efficient estimates are obtained in closed form for the fully parameterized (saturated) model. The estimation procedures are illustrated on a data set from Martin and Bradley (1972).
Since therapeutic efficacy is often measured by multiple endpoints, it is useful to incorporate the information on the various response variables into procedures for testing noninferiority, so as to improve on the power of a univariate test procedure for each individual variable. On the basis of the proposed mixed effects logistic regression model for multivariate binary data under the matched-pairs design, we develop procedures for testing noninferiority with respect to the odds ratio. We discuss the use of Bonferroni's and Scheffé's methods to control the inflation in Type I error due to multiple tests. We further employ Monte Carlo simulation to evaluate and compare the performance of these test procedures. Finally, we use data taken from a crossover clinical trial that monitored several adverse events of an antidepressive drug to illustrate the use of the test procedures derived here.
Various ordination methods for mapping $n$ units characterized by $v$ binary variables are in common use in which the distance between points $P_i$ and $P_j$, representing units $i$ and $j$, approximates some function (a similarity coefficient) of $(a_{ij}, b_{ij}, c_{ij}, d_{ij})$, the usual cell-counts in a 2 × 2 table. Ordination generally requires $(n - 1)$ dimensions to represent the distances exactly, but the quantities $b_{ij} - c_{ij}$ can always be represented in one dimension. This leads to a simple graphical extension of ordination that helps with interpretation, reveals discrepancies, screens clustering possibilities and permits the recovery of approximations to all the $(a, b, c, d)$-values. Two examples illustrate the technique.
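The one-dimensional claim in the abstract above can be checked with a small sketch. It assumes the common convention that a counts variables present in both units, b those present only in unit i, c those present only in unit j, and d those absent from both; the abstract does not spell this out. Under that convention, b − c equals the difference between the two units' counts of ones, so a single score per unit reproduces every b − c. The function name `cell_counts` and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def cell_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 sequences.

    Assumed convention: a = both 1, b = only x is 1,
    c = only y is 1, d = both 0.
    """
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


# Toy data (illustrative only): 3 units, 5 binary variables each.
units = [
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
]

# One coordinate per unit: the number of variables scored 1.
scores = [sum(u) for u in units]

# For every pair, b - c is the difference of the two scores,
# so all these quantities fit on a single axis.
for i, j in combinations(range(len(units)), 2):
    a, b, c, d = cell_counts(units[i], units[j])
    assert b - c == scores[i] - scores[j]
```

The identity holds because a + b is the number of ones in unit i and a + c is the number of ones in unit j, so the shared count a cancels in the difference.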
In many applications, researchers collect multivariate binary response data under two or more naturally ordered experimental conditions. In such situations, one is often interested in using all binary outcomes simultaneously to detect an ordering among the experimental conditions. To make such comparisons, we develop a general methodology for testing for the multivariate stochastic order between $K \ge 2$ multivariate binary distributions. Our proposed test uses order-restricted estimators, which, according to our simulation study, are more efficient than the unrestricted estimators in terms of their mean squared error. We compared the power of the proposed test with that of several alternative tests, including procedures that combine individual univariate tests for order, such as union-intersection tests and a Bonferroni-based test. We also compared the proposed test with an unrestricted Hotelling $T^2$-type test. Our simulations suggest that the proposed method competes well with these alternatives. The gain in power is often substantial. The proposed methodology is illustrated by applying it to data from a two-year rodent cancer bioassay obtained from the U.S. National Toxicology Program. Supplemental materials are available online.
In the following thesis, we investigate the modeling of time series data with multivariate discrete and especially binary structure. A model for categorical time series data that is both nicely interpretable and parsimonious is the New Discrete AutoRegressive Moving Average (NDARMA) model of Jacobs and Lewis (1983). However, this model can capture only positive autocorrelation and admits only positive parameters. In the first part of the thesis, we propose an extension of the NDARMA model class for the special case of binary data that allows for negative model parameters and, hence, negative autocorrelations, leading to the considerably larger and more flexible model class of generalized binary ARMA (gbARMA) processes. For this class of processes, we infer statistical properties and compare it in a simulation study with the benchmark model, the Markov Processes and other time series models. In the second part, we adopt the approach of the first part and propose a vector-valued extension of gbAR processes that enables the joint modeling of serial and cross-sectional dependence of multivariate binary data. The resulting class of generalized binary vector Auto-Regressive (gbVAR) models is parsimonious, nicely interpretable and also allows negative dependence to be modeled. We further extend the gbVAR model to include a moving average part, resulting in turn in the gbVARMA model. In the third and final part we pursue a further extension to vector-valued categorical time series data. For the proposed gbVARMA and NDVARMA models, we provide stationarity conditions and state the stationary solution. Stochastic properties, e.g. Yule-Walker-type equations and classical Yule-Walker equations for the pure autoregressive case, are derived. We show φ- and ψ-mixing properties of the gbVAR and NDVARMA model by proving the strict positivity of the transition probabilities. For the NDVARMA model, we discuss the identification of the distribution of the vector-valued innovation p
We adopt boosting for classification and selection of high-dimensional binary variables for which classical methods based on normality and nonsingular sample dispersion are inapplicable. Boosting seems particularly well suited for binary variables. We present three methods, of which two combine boosting with the relatively classical variable selection methods developed in Wilbur et al. (2002). Our primary interest is variable selection in classification, with small misclassification error being used as validation of the proposed method for variable selection. Two of the new methods perform uniformly better than Wilbur et al. (2002) in one set of simulated and three real-life examples.
This article describes a generalization of the binomial distribution. The closed-form probability function for the probability of k successes out of n correlated, exchangeable Bernoulli trials depends on the number of trials and its two parameters: the common success probability and the common correlation. The distribution is derived under the assumption that the common correlation between all pairs of Bernoulli trials remains unchanged conditional on successes in all completed trials. The distribution was developed to model bond defaults but may be suited to biostatistical applications involving clusters of binary data encountered in repeated measurements or toxicity studies of families of organisms. Maximum likelihood estimates for the parameters of the distribution are found for a set of binary data from a developmental toxicity study on litters of mice.
In this article, we study methods for two-sample hypothesis testing of high-dimensional data coming from a multivariate binary distribution. We test the random projection method and apply an Edgeworth expansion for improvement. Additionally, we propose new statistics which are especially useful for sparse data. We compare the performance of these tests in various scenarios through simulations run in a parallel computing environment. We also apply these tests to the 20 Newsgroup data, showing that our proposed tests have considerably higher power than the others for differentiating groups of news articles with different topics.