There has been increased interest in using prior information in statistical analyses. For example, in rare diseases, it can be difficult to establish treatment efficacy based solely on data from a prospective study because of small sample sizes. To overcome this issue, an informative prior on the treatment effect may be elicited. We develop a novel extension of the conjugate prior that enables practitioners to elicit a prior prediction for the mean response in generalized linear models, treating the prediction as random. We refer to this hierarchical prior as the hierarchical prediction prior (HPP). For independent and identically distributed settings and the normal linear model, we derive cases for which the hyperprior is a conjugate prior. We also develop an extension of the HPP for situations where summary statistics from a previous study are available. The HPP allows for discounting based on the quality of individual-level predictions, and simulation results suggest that, compared to the conjugate prior and the power prior, the HPP yields efficiency gains (e.g., lower mean squared error) where predictions are incompatible with the data. An efficient Markov chain Monte Carlo algorithm is developed. Applications illustrate that inferences under the HPP are more robust to prior-data conflict than those under selected nonhierarchical priors.
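The HPP itself is not reproduced here. As a reference point for the comparison above, the following is a minimal sketch of the power prior in its simplest setting, assuming a normal mean with known variance, a flat initial prior, and a fixed discounting parameter `a0`; the function name `power_prior_posterior` is hypothetical. Raising the historical likelihood to the power `a0` amounts to counting the `n0` historical observations as `a0 * n0` observations.

```python
def power_prior_posterior(ybar0, n0, ybar, n, sigma2, a0):
    """Posterior mean and variance of a normal mean theta under a power prior.

    Assumes known variance sigma2, a flat initial prior on theta, and a fixed
    discounting parameter a0 in [0, 1] applied to the historical likelihood.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    # Historical data contribute a0 * n0 "effective" observations.
    eff_n0 = a0 * n0
    post_prec = (eff_n0 + n) / sigma2
    post_mean = (eff_n0 * ybar0 + n * ybar) / (eff_n0 + n)
    return post_mean, 1.0 / post_prec
```

With `a0 = 0` the historical data are ignored, and with `a0 = 1` they are pooled with the current data at full weight. This fixed, study-level discounting contrasts with the HPP's discounting based on the quality of individual-level predictions.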
Recent studies have shown that the gut microbiome is associated with colorectal cancer (CRC) progression and anti-cancer therapy efficacy. This study aims to optimize the ridge, elastic net, and lasso regularized generali...
Empirical likelihood in generalized linear models with multivariate responses and a working covariance matrix is studied. Under the weakest assumption on the eigenvalues of Fisher's information matrix and some other regularity conditions, we prove that the nonparametric Wilks' property still holds; that is, the empirical log-likelihood ratio at the true parameter values converges to the standard chi-square distribution. Simulations are given to verify our theoretical result.
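The Wilks-type result can be illustrated in the simplest case. The following is a minimal sketch of Owen's empirical likelihood ratio for a scalar mean, not the multivariate-response GLM setting of the paper (there, the centred values `z_i` would be replaced by estimating-equation terms built from the working covariance matrix). The name `el_log_ratio` is hypothetical, and the Lagrange multiplier is found by bisection, which is one convenient choice.

```python
import math

def el_log_ratio(x, mu, tol=1e-12, max_iter=200):
    """Return -2 log R(mu), Owen's empirical likelihood ratio statistic for a scalar mean."""
    z = [xi - mu for xi in x]
    zmin, zmax = min(z), max(z)
    if zmin >= 0 or zmax <= 0:
        # mu is not strictly inside the convex hull of the data: R(mu) = 0
        return math.inf
    # The Lagrange multiplier must keep every weight positive: 1 + lam * z_i > 0
    lo, hi = -1.0 / zmax, -1.0 / zmin

    def score(lam):
        # Strictly decreasing in lam; its root gives the EL weights
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    # Weights are p_i = 1 / (n (1 + lam z_i)), so -2 log prod(n p_i) = 2 sum log(1 + lam z_i)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)
```

By the Wilks property, `el_log_ratio(data, mu0) > 3.841` rejects the hypothesis that the mean equals `mu0` at roughly the 5% level in large samples, 3.841 being the 0.95 quantile of the chi-square distribution with one degree of freedom.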
Spike-and-slab priors model predictors as arising from a mixture of distributions: those that should (slab) or should not (spike) remain in the model. The spike-and-slab lasso (SSL) is a mixture of double-exponential distributions, extending the single lasso penalty by imposing different penalties on parameters based on their inclusion probabilities. The SSL was extended to generalized linear models (GLMs) for applications in genetics/genomics; it can handle many highly correlated predictors of a scalar outcome but does not incorporate these relationships into variable selection. When images or spatial data are used to model a scalar outcome, relevant parameters tend to cluster spatially, and model performance may benefit from incorporating spatial structure into variable selection. We propose to incorporate spatial information by assigning intrinsic autoregressive priors to the logit prior probabilities of inclusion, which results in more similar shrinkage penalties among spatially adjacent parameters. Because fitting Bayesian models with MCMC can be computationally prohibitive for large-scale data, we fit the model by adapting a computationally efficient coordinate-descent-based EM algorithm. A simulation study and an application to Alzheimer's disease imaging data show that incorporating spatial information can improve model fit. © 2021 Elsevier B.V. All rights reserved.
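The adaptive penalty can be made concrete. Under the SSL prior, each coefficient's penalty mixes the slab rate `lam1` and the spike rate `lam0`, weighted by the conditional probability that the coefficient comes from the slab. A minimal sketch, assuming illustrative values `lam0 = 20`, `lam1 = 1` and a given prior inclusion probability `theta`. In the spatial model described above, `theta` would be the inverse logit of a quantity with an intrinsic autoregressive prior; that prior and the EM algorithm are not implemented here, and the function names are hypothetical.

```python
import math

def inclusion_prob(eta):
    """Numerically stable inverse logit: maps a (hypothetical) log-odds eta to theta."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def ssl_weights(beta, theta, lam0=20.0, lam1=1.0):
    """Return (p_star, lam_star) for one coefficient under a spike-and-slab lasso prior.

    p_star   : conditional probability that beta comes from the slab Laplace(lam1)
    lam_star : adaptive penalty, lam1 * p_star + lam0 * (1 - p_star)
    """
    # log of [(1 - theta) * spike density] / [theta * slab density], computed in log space
    log_ratio = (math.log1p(-theta) - math.log(theta)
                 + math.log(lam0 / lam1) - (lam0 - lam1) * abs(beta))
    p_star = 1.0 / (1.0 + math.exp(min(log_ratio, 700.0)))
    lam_star = lam1 * p_star + lam0 * (1.0 - p_star)
    return p_star, lam_star
```

Large coefficients get `p_star` close to 1 and hence the light slab penalty, while a higher `theta` (for example, one raised by spatially adjacent signal) lowers the penalty on a small coefficient.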
Multivariate twin and family studies are among the most important tools for assessing disease inheritance and for studying the interrelationship between genetic and environmental factors. The multivariate analysis of twin and family data is in general based on structural equation modelling or linear mixed models, which essentially decompose sources of covariation as originally suggested by Fisher. In this paper, we propose a flexible and unified statistical modelling framework for analysing multivariate Gaussian and non-Gaussian twin and family data. Non-normality is taken into account by explicitly modelling the mean-variance relationship, while the covariance structure is modelled by means of a linear covariance model, including the option to model the dispersion components as functions of known covariates in a regression fashion. The marginal specification of our models allows us to extend classic models and biometric indices, such as bivariate heritability and genetic, environmental and phenotypic correlations, to non-Gaussian data. We illustrate the proposed models through simulation studies and six data analyses, and provide a computational implementation in R through the package mglm4twin.
This paper discusses the asymptotic properties of the SCAD (smoothly clipped absolute deviation) penalized quasi-likelihood estimator for generalized linear models with adaptive designs, extending the related results for independent observations to dependent observations. Under certain conditions, the authors prove that the SCAD penalized method correctly selects covariates with nonzero coefficients with probability converging to one, and that the penalized quasi-likelihood estimators of the nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. That is, the SCAD estimator is consistent and has the oracle property. Finally, the results are illustrated by simulations.
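The oracle behaviour rests on the shape of the penalty: it matches the lasso near zero, then tapers so that large coefficients receive a constant penalty and are not shrunk further. A minimal sketch of the standard SCAD penalty of Fan and Li (2001) with the conventional `a = 3.7`; it is not the paper's quasi-likelihood estimator and ignores the adaptive design, and `scad_penalty` and `scad_derivative` are illustrative names.

```python
def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty p_lam(|beta|) of Fan and Li (2001); requires a > 2."""
    b = abs(beta)
    if b <= lam:
        return lam * b  # lasso-like near zero
    if b <= a * lam:
        # quadratic transition between the lasso part and the flat part
        return (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2  # constant: large effects are not penalized further

def scad_derivative(beta, lam, a=3.7):
    """Derivative of the penalty with respect to |beta| (e.g. for local quadratic approximation)."""
    b = abs(beta)
    if b <= lam:
        return lam
    return max(a * lam - b, 0.0) / (a - 1)
```

The three pieces join continuously at `|beta| = lam` and `|beta| = a * lam`, and the derivative vanishes beyond `a * lam`, which is why large nonzero coefficients are estimated essentially without bias.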
Penalized empirical likelihood for generalized linear models with longitudinal data is considered. It is shown that the penalized empirical likelihood estimators have the oracle property. We also show that the penalized empirical likelihood ratio test statistic has an asymptotic chi-square distribution. The finite-sample performance of the proposed method is evaluated through simulations and a real data example.
ISBN (print): 9781450392365
Being able to compare Information Retrieval (IR) systems correctly is pivotal to improving their quality. Among the most popular tools for statistical significance testing are the t-test and ANOVA, which belong to the linear models family. Given the relevance of linear models for IR evaluation, a great effort has been devoted to studying how to improve them to better compare IR systems. Linear models rely on assumptions that IR experimental observations rarely meet, e.g., the normality of the data or the linearity itself. Even though linear models are, in general, resilient to violations of their assumptions, departing from them might reduce the effectiveness of the tests. Hence, we investigate the use of the Generalized Linear Model (GLM) framework, a generalization of traditional linear modelling that relaxes assumptions about the distribution and the shape of the models. To the best of our knowledge, there has been little or no investigation of the use of GLMs for comparing IR system performance. We discuss how GLMs work and how they can be applied in the context of IR evaluation. In particular, we focus on the link function used to build GLMs, which allows the model to have non-linear shapes. We conduct thorough experiments using two TREC collections and several evaluation measures. Overall, we show that the log and logit links identify more, and more consistent, significant differences (up to 25% more with 50 topics) than the identity link used today, with a comparable, or slightly better, risk of publication bias.
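To make the role of the link function concrete, here is a minimal sketch, not the paper's models, of fitting a logit-link GLM by iteratively reweighted least squares (IRLS) with NumPy. It assumes a per-topic effectiveness score in [0, 1] as the response (a quasi-binomial fit) and a design matrix with an intercept and system indicators; `irls_logit` is a hypothetical name. Other links change the inverse link and the weights; the identity link with constant weights reduces to ordinary least squares.

```python
import numpy as np

def irls_logit(X, y, n_iter=50, tol=1e-10):
    """Fit a logit-link GLM (binomial / quasi-binomial) by IRLS.

    y may be 0/1 or a fraction in [0, 1], e.g. a per-topic effectiveness score.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                        # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))       # inverse logit link
        w = mu * (1.0 - mu)                   # IRLS weights for the canonical logit link
        z = eta + (y - mu) / w                # working response
        XtW = X.T * w                         # scales column i of X.T by w_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With an intercept and a single system indicator, the fitted indicator coefficient is the difference between the logits of the two systems' mean scores, so the comparison is made on the logit scale rather than on the raw score scale.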
Fragmentary data are becoming increasingly common in many areas, which brings big challenges to researchers and data analysts. Most existing methods for fragmentary data consider a continuous response, while in many applications the response variable is discrete. In this paper, we propose a model averaging method for generalized linear models in the fragmentary data setting. Candidate models are fitted based on different combinations of covariate availability and sample size. The optimal weights are selected by minimizing the Kullback-Leibler loss in the complete cases, and their asymptotic optimality is established. Empirical evidence from a simulation study and a real data analysis of Alzheimer's disease is presented.
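The weight-selection step can be illustrated, under simplifying assumptions, by choosing simplex weights that minimize the in-sample Bernoulli log-loss of the averaged prediction, an empirical analogue of the Kullback-Leibler loss for a binary response. This sketch assumes the candidate models' predicted probabilities on the complete cases are already available; it omits how candidates are built from covariate-availability patterns and any penalty term the paper's criterion may include. `kl_weights` is a hypothetical name, and the exponentiated-gradient solver is one convenient choice, not the authors' method.

```python
import numpy as np

def kl_weights(P, y, n_iter=2000, step=0.5):
    """Simplex weights minimizing the mean Bernoulli log-loss of P @ w.

    P : (n, M) predicted probabilities from M candidate models on the complete cases
    y : (n,) binary responses
    """
    n, M = P.shape
    w = np.full(M, 1.0 / M)          # start from equal weights
    eps = 1e-12
    for _ in range(n_iter):
        p = np.clip(P @ w, eps, 1 - eps)
        # gradient of -(1/n) * sum[y log p + (1 - y) log(1 - p)] with respect to w
        grad = -(P.T @ (y / p - (1 - y) / (1 - p))) / n
        w = w * np.exp(-step * grad)  # exponentiated-gradient step keeps w positive
        w /= w.sum()                  # renormalize onto the simplex
    return w
```

Because the loss is convex in `w` and the update stays on the simplex, the weights move toward the candidate whose predictions fit the complete cases best, while still allowing genuine mixtures when no single candidate dominates.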
This paper explores estimating generalized linear models (GLMs) when agents are strategic and privacy-conscious. We aim to design mechanisms that encourage truthful reporting, protect privacy, and ensure that outputs are close to the true parameters. First, we address models with sub-Gaussian covariates and heavy-tailed responses with finite fourth moments, proposing a novel private, closed-form estimator. Our mechanism features: (1) o(1)-joint differential privacy with high probability; (2) an o(1/n)-approximate Bayes Nash equilibrium for a (1 - o(1)) fraction of agents; (3) o(1) error in parameter estimation; (4) individual rationality for a (1 - o(1)) fraction of agents; and (5) an o(1) payment budget. We then extend our approach to linear regression with heavy-tailed data, using an ℓ4-norm shrinkage operator to propose a similar estimator and payment scheme. © 2024 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
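A minimal sketch of an ℓ4-norm shrinkage step followed by a closed-form least-squares fit. It assumes the operator rescales each covariate vector so that its ℓ4 norm does not exceed a threshold `tau_x` and that responses are clipped at `tau_y`; the thresholds, the names `l4_shrink` and `shrunk_least_squares`, and the omission of privacy noise and payments are all assumptions, not the paper's mechanism.

```python
import numpy as np

def l4_shrink(x, tau):
    """Rescale vector x so its l4 norm is at most tau (unchanged if already within tau)."""
    norm4 = np.sum(np.abs(x) ** 4) ** 0.25
    return x if norm4 <= tau else x * (tau / norm4)

def shrunk_least_squares(X, y, tau_x, tau_y):
    """Closed-form least squares on l4-shrunk covariates and clipped responses.

    No privacy noise or payment rule is included; this only shows the shrinkage step.
    """
    Xs = np.vstack([l4_shrink(row, tau_x) for row in X])
    ys = np.clip(y, -tau_y, tau_y)  # bound heavy-tailed responses
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
```

Bounding each record in this way limits any single agent's influence on the closed-form estimate, which is the usual reason for such a step before adding calibrated privacy noise; with thresholds larger than all the data, the fit reduces to ordinary least squares.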