In the following thesis, we investigate the modeling of time series data with multivariate discrete and especially binary structure. A parsimonious and nicely interpretable model for categorical time series data is the New Discrete AutoRegressive Moving Average (NDARMA) model of Jacobs and Lewis (1983). However, this model can capture only positive autocorrelation and admits only positive parameters. In the first part of the thesis, we propose an extension of the NDARMA model class for the special case of binary data that allows for negative model parameters and, hence, negative autocorrelations, leading to the considerably larger and more flexible model class of generalized binary ARMA (gbARMA) processes. For this class of processes, we infer statistical properties and compare it in a simulation study with the benchmark model, the Markov processes and other time series models. In the second part, we adopt the approach of the first part and propose a vector-valued extension of gbAR processes that enables the joint modeling of serial and cross-sectional dependence of multivariate binary data. The resulting class of generalized binary vector AutoRegressive (gbVAR) models is parsimonious, nicely interpretable and also allows negative dependence to be modeled. We further extend the gbVAR model to include a moving average part, resulting in turn in the gbVARMA model. In the third and final part we pursue a further extension to vector-valued categorical time series data. For the proposed gbVARMA and NDVARMA models, we provide stationarity conditions and state the stationary solution. Stochastic properties, e.g. Yule-Walker-type equations and classical Yule-Walker equations for the pure autoregressive case, are derived. We show φ- and ψ-mixing properties of the gbVAR and NDVARMA model by proving the strict positivity of the transition probabilities. For the NDVARMA model, we discuss the identification of the distribution of the vector-valued innovation p
Since therapeutic efficacy is often measured by multiple endpoints, it is useful to incorporate the information on several response variables into procedures for testing noninferiority, so as to improve on the power of a univariate test procedure applied to each individual variable. On the basis of the proposed mixed effects logistic regression model for multivariate binary data under the matched-pair design, we develop procedures for testing noninferiority with respect to the odds ratio. We discuss the use of Bonferroni's and Scheffe's methods to control the inflation in Type I error due to multiple tests. We further employ Monte Carlo simulation to evaluate and compare the performance of these test procedures. Finally, we use data taken from a crossover clinical trial that monitored several adverse events of an antidepressive drug to illustrate the use of the test procedures derived here.
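As a rough illustration of the Bonferroni adjustment mentioned in the abstract above, the sketch below compares each per-endpoint p-value with the familywise level divided by the number of tests. It is a generic textbook version, not the authors' noninferiority procedure; the function name `bonferroni_reject` and the example p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one reject decision per test, holding the familywise
    Type I error at or below alpha via the Bonferroni bound."""
    m = len(p_values)
    if m == 0:
        return []
    # Each individual test is carried out at level alpha / m.
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values for three binary endpoints.
decisions = bonferroni_reject([0.01, 0.02, 0.04], alpha=0.05)
print(decisions)
```

With three endpoints and a familywise level of 0.05, each endpoint is tested at about 0.0167, so only the first hypothetical p-value leads to rejection.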
We introduce finite mixtures of Ising models as a novel approach to study multivariate patterns of associations of binary variables. Our proposed models combine the strengths of Ising models and multivariate Bernoulli mixture models. We examine conditions required for the local identifiability of Ising mixture models, and develop a Bayesian framework for fitting them. Through simulation experiments and real data examples, we show that Ising mixture models lead to meaningful results for sparse binary contingency tables with imbalanced cell counts. The code necessary to replicate our empirical examples is available on GitHub: https://***/Epic19mz/BayesianIsingMixtures.
We propose a Multivariate Logistic Distance (MLD) model for the analysis of multiple binary responses in the presence of predictors. The MLD model can be used to simultaneously assess the dimensional/factorial structure of the data and to study the effect of the predictor variables on each of the response variables. To enhance interpretation, the results of the proposed model can be graphically represented in a biplot, showing predictor variable axes, the categories of the response variables and the subjects' positions. The interpretation of the biplot uses a distance rule. The MLD model belongs to the family of marginal models for multivariate responses, as opposed to latent variable models and conditionally specified models. By setting the distance between the two categories of every response variable to be equal, the MLD model becomes equivalent to a marginal model for multivariate binary data estimated using a GEE method. In that case the MLD model can be fitted using existing statistical packages with a GEE procedure, e.g., the genmod procedure from SAS or the geepack package from R. Without the equality constraint, the MLD model is a general model which can be fitted in its own right. We applied the proposed model to empirical data to illustrate its advantages.
In this article, we study methods for two-sample hypothesis testing of high-dimensional data coming from a multivariate binary distribution. We test the random projection method and apply an Edgeworth expansion for improvement. Additionally, we propose new statistics which are especially useful for sparse data. We compare the performance of these tests in various scenarios through simulations run in a parallel computing environment. Additionally, we apply these tests to the 20 Newsgroup data, showing that our proposed tests have considerably higher power than the others for differentiating groups of news articles with different topics.
In many applications, researchers collect multivariate binary response data under two or more naturally ordered experimental conditions. In such situations, one is often interested in using all binary outcomes simultaneously to detect an ordering among the experimental conditions. To make such comparisons, we develop a general methodology for testing for the multivariate stochastic order between K >= 2 multivariate binary distributions. Our proposed test uses order-restricted estimators, which, according to our simulation study, are more efficient than the unrestricted estimators in terms of their mean squared error. We compared the power of the proposed test with that of several alternative tests, including procedures that combine individual univariate tests for order, such as union-intersection tests and a Bonferroni-based test. We also compared the proposed test with an unrestricted Hotelling T^2-type test. Our simulations suggest that the proposed method competes well with these alternatives. The gain in power is often substantial. The proposed methodology is illustrated by applying it to data from a two-year rodent cancer bioassay obtained from the U.S. National Toxicology Program. Supplemental materials are available online.
This article describes a generalization of the binomial distribution. The closed form probability function for the probability of k successes out of n correlated, exchangeable Bernoulli trials depends on the number of trials and its two parameters: the common success probability and the common correlation. The distribution is derived under the assumption that the common correlation between all pairs of Bernoulli trials remains unchanged conditional on successes in all completed trials. The distribution was developed to model bond defaults but may be suited to biostatistical applications involving clusters of binary data encountered in repeated measurements or toxicity studies of families of organisms. Maximum likelihood estimates for the parameters of the distribution are found for a set of binary data from a developmental toxicity study on litters of mice.
The proportion ratio (PR) of responses between an experimental treatment and a control treatment is one of the most commonly used indices to measure the relative treatment effect in a randomized clinical trial. We develop asymptotic and permutation-based procedures for testing equality of treatment effects and derive confidence intervals for PRs for multivariate binary matched-pair data under a mixed-effects exponential risk model. To evaluate and compare the performance of these test procedures and interval estimators, we employ Monte Carlo simulation. When the number of matched pairs is large, we find that all test procedures presented here can perform well with respect to Type I error. When the number of matched pairs is small, the permutation-based test procedures developed in this paper are of use. Furthermore, using test procedures (or interval estimators) based on a weighted linear average estimator of treatment effects can improve power (or gain precision) when the treatment effects on all response variables of interest are known to fall in the same direction. Finally, we use data taken from a crossover clinical trial that monitored several adverse events of an antidepressive drug to illustrate the practical use of the test procedures and interval estimators considered here.
A Monte Carlo algorithm is said to be adaptive if it automatically calibrates its current proposal distribution using past simulations. The choice of the parametric family that defines the set of proposal distributions is critical for good performance. In this paper, we present such a parametric family for adaptive sampling on high-dimensional binary spaces. A practical motivation for this problem is variable selection in a linear regression context. We want to sample from a Bayesian posterior distribution on the model space using an appropriate version of Sequential Monte Carlo. Raw versions of Sequential Monte Carlo are easily implemented using binary vectors with independent components. For high-dimensional problems, however, these simple proposals do not yield satisfactory results. The key to an efficient adaptive algorithm is a binary parametric family that takes correlations into account, analogously to the multivariate normal distribution on continuous spaces. We provide a review of models for binary data and make one of them work in the context of Sequential Monte Carlo sampling. Computational studies on real-life data with about a hundred covariates suggest that, on difficult instances, our Sequential Monte Carlo approach clearly outperforms standard techniques based on Markov chain exploration.
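The "binary vectors with independent components" baseline named in the abstract above can be sketched as a product of Bernoulli distributions whose marginal probabilities are refitted from weighted past samples. This is only an assumed illustration of that simple proposal, not the correlated family the paper develops; the function names and the clipping constant `eps` are hypothetical.

```python
import math
import random


def fit_marginals(particles, weights, eps=1e-3):
    """Weighted marginal means of binary particles, clipped away from
    0 and 1 so that every binary vector keeps positive probability."""
    total = sum(weights)
    dim = len(particles[0])
    probs = []
    for j in range(dim):
        mean_j = sum(w * x[j] for x, w in zip(particles, weights)) / total
        probs.append(min(max(mean_j, eps), 1.0 - eps))
    return probs


def sample(probs, rng):
    """Draw one binary vector with independent Bernoulli components."""
    return [1 if rng.random() < p else 0 for p in probs]


def log_pmf(x, probs):
    """Log-probability of x under the independent-components proposal."""
    return sum(math.log(p if xi else 1.0 - p) for xi, p in zip(x, probs))


rng = random.Random(0)
probs = fit_marginals([[1, 0], [1, 1]], [1.0, 1.0])
draw = sample(probs, rng)
print(probs, draw, log_pmf(draw, probs))
```

Because the components are independent, this proposal cannot represent correlation between coordinates, which is the limitation the abstract attributes to such simple proposals in high dimensions.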
Motivated by a longitudinal oral health study, the Signal-Tandmobiel® study, we propose a multivariate binary inhomogeneous Markov model in which unobserved correlated response variables are subject to an unconstrained misclassification process and have a monotone behavior. The multivariate baseline distributions and Markov transition matrices of the unobserved processes are defined as a function of covariates through the specification of compatible full conditional distributions. Distinct misclassification models are discussed. In all cases, the possibility that different examiners were involved in the scoring of the responses of a given subject across time is taken into account. A full Bayesian implementation of the model is described and its performance is evaluated using simulated data. We provide theoretical and empirical evidence that the parameters can be estimated without any external information about the misclassification parameters. Finally, the analyses of the motivating study are presented. Appendices 1-7 are available in the online supplementary materials.