The power-expected-posterior (PEP) prior provides an objective, automatic, consistent and parsimonious model selection procedure. At the same time it resolves the conceptual and computational problems due to the use of imaginary data. Namely, (i) it dispenses with the need to select and average across all possible minimal imaginary samples, and (ii) it diminishes the effect that the imaginary data have upon the posterior distribution. These attributes allow for large-sample approximations, when needed, in order to reduce the computational burden under more complex models. In this work we generalize the applicability of the PEP methodology, focusing on the framework of generalized linear models (GLMs), by introducing two new PEP definitions which are in effect applicable to any general model setting. Hyper-prior extensions for the power parameter that regulates the contribution of the imaginary data are introduced. We further study the validity of the predictive matching and of the model selection consistency, providing analytical proofs for the former and empirical evidence supporting the latter. For estimation of posterior model and inclusion probabilities we introduce a tuning-free Gibbs-based variable selection sampler. Several simulation scenarios and one real-life example are considered in order to evaluate the performance of the proposed methods compared to other commonly used approaches based on mixtures of g-priors. Results indicate that the GLM-PEP priors are more effective in the identification of sparse and parsimonious model formulations.
Vector autoregressive models characterize a variety of time series in which linear combinations of current and past observations can be used to accurately predict future observations. For instance, each element of an observation vector could correspond to a different node in a network, and the parameters of an autoregressive model would correspond to the impact of the network structure on the time series evolution. Often, these models are used successfully in practice to learn the structure of social, epidemiological, financial, or biological neural networks. However, little is known about statistical guarantees on the estimates of such models in non-Gaussian settings. This paper addresses the inference of the autoregressive parameters and associated network structure within a generalized linear model framework that includes Poisson and Bernoulli autoregressive processes. At the heart of this analysis is a sparsity-regularized maximum likelihood estimator. While sparsity regularization is well studied in the statistics and machine learning communities, those analysis methods cannot be applied to autoregressive generalized linear models because of the correlations and potential heteroscedasticity inherent in the observations. Sample complexity bounds are derived using a combination of martingale concentration inequalities and modern empirical process techniques for dependent random variables. These bounds, which are supported by several simulation studies, characterize the impact of various network parameters on the estimator performance.
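To make the idea of a sparsity-regularized likelihood for a Poisson autoregressive process concrete, here is a minimal sketch. It assumes a log-linear model, X[t+1] ~ Poisson(exp(nu + A X[t])), with non-positive entries in A so the rates stay bounded. The function names, the parameter values, and the choice of a plain l1 penalty on A are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def simulate_poisson_ar(A, nu, T, rng):
    """Simulate X[t+1] ~ Poisson(exp(nu + A @ X[t])) for T steps, starting from zeros."""
    M = A.shape[0]
    X = np.zeros((T + 1, M))
    for t in range(T):
        rate = np.exp(nu + A @ X[t])   # conditional Poisson rate of each node
        X[t + 1] = rng.poisson(rate)
    return X

def penalized_nll(A, nu, X, lam):
    """Average Poisson negative log-likelihood (up to a constant) plus an l1 penalty on A."""
    eta = nu + X[:-1] @ A.T            # linear predictor for every time step and node
    nll = np.sum(np.exp(eta) - X[1:] * eta)
    return nll / (len(X) - 1) + lam * np.sum(np.abs(A))

rng = np.random.default_rng(0)
# Hypothetical 2-node network with inhibitory (non-positive) couplings only.
A_true = np.array([[-0.5, 0.0], [-0.3, -0.5]])
X = simulate_poisson_ar(A_true, nu=1.0, T=200, rng=rng)
print(penalized_nll(A_true, 1.0, X, lam=0.1))
```

Minimizing `penalized_nll` over A (for example with a proximal gradient method) would give a sparse estimate of the network. The sketch only evaluates the objective.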
The R package glmm enables likelihood-based inference for generalized linear mixed models with a canonical link. No other publicly available software accurately conducts likelihood-based inference for generalized linear mixed models with crossed random effects. glmm is able to do so by approximating the likelihood function and two derivatives using importance sampling. The importance sampling distribution is an essential piece of Monte Carlo likelihood approximation, and developing a good one is the main challenge in implementing it. The package glmm uses the data to tailor the importance sampling distribution and is constructed to ensure finite Monte Carlo standard errors. In the context of the generalized linear mixed model, the salamander model with crossed random effects has become a benchmark example. We use this model to illustrate the complexities of the likelihood function and to demonstrate the use of the R package glmm.
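The general idea of approximating a mixed-model likelihood by importance sampling can be illustrated with a toy example. This is a sketch under simplifying assumptions (a Poisson model with one shared normal random effect and a fixed normal proposal), written in Python rather than R; it does not reproduce the glmm package's data-tailored proposal or its interface. All function names are hypothetical.

```python
import math
import numpy as np

def log_cond_lik(y, beta, u):
    """log f(y | u) for y_i ~ Poisson(exp(beta + u)), vectorized over draws u."""
    eta = beta + np.asarray(u, dtype=float)
    const = sum(math.lgamma(v + 1) for v in y)
    return np.sum(y) * eta - len(y) * np.exp(eta) - const

def log_normal_pdf(u, sd):
    """Log density of N(0, sd^2) evaluated at u."""
    return -0.5 * (u / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def is_log_lik(y, beta, sigma, m, proposal_sd, rng):
    """Importance-sampling estimate of the log marginal likelihood."""
    u = rng.normal(0.0, proposal_sd, size=m)            # draws from the proposal
    log_w = (log_cond_lik(y, beta, u) + log_normal_pdf(u, sigma)
             - log_normal_pdf(u, proposal_sd))          # log importance weights
    shift = log_w.max()                                  # stabilize the average in log space
    return shift + math.log(np.mean(np.exp(log_w - shift)))

def grid_log_lik(y, beta, sigma, half_width=8.0, n=20001):
    """Reference value from a fine Riemann sum over the random effect."""
    u = np.linspace(-half_width, half_width, n)
    vals = np.exp(log_cond_lik(y, beta, u) + log_normal_pdf(u, sigma))
    return math.log(np.sum(vals) * (u[1] - u[0]))

y = [2, 3, 1, 4]
rng = np.random.default_rng(1)
approx = is_log_lik(y, beta=1.0, sigma=1.0, m=50000, proposal_sd=1.5, rng=rng)
exact = grid_log_lik(y, beta=1.0, sigma=1.0)
print(approx, exact)
```

With crossed random effects the integral is high-dimensional and no grid is feasible, which is why the choice of proposal distribution matters so much in practice.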
Extensions of linear models are very commonly used in the analysis of biological data. Whereas goodness-of-fit measures such as the coefficient of determination ($R^2$) or the adjusted $R^2$ are well established for linear models, it is not obvious how such measures should be defined for generalized linear and mixed models. There are by now several proposals, but no consensus has yet emerged as to the best unified approach in these settings. In particular, it is an open question how to best account for heteroscedasticity and for covariance among observations present in residual error or induced by random effects. This paper proposes a new approach that addresses this issue and is universally applicable for arbitrary variance-covariance structures including spatial models and repeated measures. It is exemplified using three biological examples.
Gambusia affinis (G. affinis) is an invasive fish species found in the Sundays River Valley of the Eastern Cape, South Africa. The relative abundance and population dynamics of G. affinis were quantified in five interconnected impoundments within the Sundays River Valley. This study utilised a G. affinis data set to demonstrate various classical ANOVA models. Generalized linear models were used to standardize catch per unit effort (CPUE) estimates and to determine environmental variables which influenced the CPUE. Based on the generalized linear model results, dam age, mean temperature, Oreochromis mossambicus abundance and Glossogobius callidus abundance had a significant effect on the G. affinis CPUE. The Albany Angling Association collected data during fishing tag and release events. These data were utilized to demonstrate repeated measures designs. Mixed-effects models provide a powerful and flexible tool for analyzing clustered data such as repeated measures data and nested data; hence they have become tremendously popular as a framework for the analysis of bio-behavioral experiments. The results show that the mixed-effects methods proposed in this study are more efficient than those based on generalized linear models. These data were better modeled with mixed-effects models due to their flexibility in handling missing data.
Nearly all statistical inference methods were developed for the regime where the number N of data samples is much larger than the data dimension p. Inference protocols such as maximum likelihood (ML) or maximum a posteriori probability (MAP) are unreliable if p = O(N), due to overfitting. For many disciplines with increasingly high-dimensional data, this limitation has become a serious bottleneck. We recently showed that in Cox regression for time-to-event data the overfitting errors are not just noise but take mostly the form of a bias, and how with the replica method from statistical physics one can model and predict this bias and the noise statistics. Here we extend our approach to arbitrary generalized linear regression models (GLMs), with possibly correlated covariates. First, we analyse overfitting in ML/MAP inference without having to specify data types or regression models, relying only on the GLM form, and derive generic order parameter equations for the case of L2 priors. Second, we derive the probabilistic relationship between true and inferred regression coefficients in GLMs, and show that, for the relevant hyperparameter scaling and correlated covariates, the L2 regularization causes a predictable direction change of the coefficient vector. Our results, illustrated by application to linear, logistic, and Cox regression, enable one to correct ML and MAP inferences in GLMs systematically for overfitting bias, and thus extend their applicability into the hitherto forbidden regime p = O(N).
In this study, we develop and compare satellite rainfall retrievals based on generalized linear models and artificial neural networks. Both approaches are used in classification mode in a first step to identify the precipitating areas (precipitation detection) and in regression mode in a second step to estimate the rainfall intensity at the ground (rain rate). The input predictors are geostationary satellite infrared (IR) brightness temperatures and Satellite Application Facility (SAF) nowcasting products which consist of cloud properties, such as cloud top height and cloud type. Additionally, a set of auxiliary location-describing input variables is employed. The output predictand is the ground-based instantaneous rain rate provided by the European-scale radar composite OPERA, which was additionally quality-controlled. We compare our results to a precipitation product which uses a single IR channel for the rainfall retrieval. Specifically, we choose the operational PR-OBS-3 hydrology SAF product as a representative example for this type of approach. With generalized linear models, we show that we are able to substantially improve in terms of hits by considering more IR channels and cloud property predictors. Furthermore, we demonstrate the added value of using artificial neural networks to further improve prediction skill by additionally reducing false alarms. In the rain rate estimation, the indirect relationship between surface rain rates and the cloud properties measurable with geostationary satellites limits the skill of all models, which leads to smooth predictions close to the mean rainfall intensity. Probability matching is explored as a tool to recover higher-order statistics to obtain a more realistic rain rate distribution.
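Probability matching can be sketched as a quantile mapping: each smooth prediction is replaced by the value at the same quantile level of a reference (observed) distribution, so the rank order of the predictions is kept while their spread is restored. The abstract does not describe the exact procedure used in the study, so the function below is a generic illustration with made-up numbers.

```python
import numpy as np

def probability_match(pred, obs):
    """Map predictions onto the empirical distribution of obs, keeping the rank order of pred."""
    pred = np.asarray(pred, dtype=float)
    obs_sorted = np.sort(np.asarray(obs, dtype=float))
    ranks = np.argsort(np.argsort(pred))    # rank of each prediction, 0..n-1
    q = (ranks + 0.5) / len(pred)           # empirical quantile level of each prediction
    return np.quantile(obs_sorted, q)       # look up the same quantile in the observations

# Hypothetical rain rates (mm/h): predictions cluster near the mean, observations are skewed.
pred = np.array([1.0, 1.2, 0.9, 1.1])
obs = np.array([0.0, 0.5, 2.0, 10.0])
matched = probability_match(pred, obs)
print(matched)
```

The mapping changes the marginal distribution only; it cannot correct predictions whose ranking is wrong.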
In this article, we consider variable selection and estimation for high-dimensional generalized linear models when the number of parameters diverges with the sample size. We propose a penalized quasi-likelihood function with the bridge penalty. The consistency and the oracle property of the quasi-likelihood bridge estimators are obtained. Some simulations and a real data analysis are given to illustrate the performance of the proposed method.
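For orientation, a bridge-penalized quasi-likelihood estimator is commonly written in the following form. The symbols ($Q_n$ for the quasi-likelihood, $\lambda_n$ for the penalty level, $\gamma$ for the bridge exponent, $p_n$ for the number of parameters) are assumed notation, and the article's exact scaling may differ.

```latex
\hat{\beta}_n = \arg\max_{\beta \in \mathbb{R}^{p_n}}
  \Big\{ Q_n(\beta) - \lambda_n \sum_{j=1}^{p_n} |\beta_j|^{\gamma} \Big\},
  \qquad \gamma > 0 .
```

Here $\gamma = 1$ gives the Lasso penalty and $\gamma = 2$ the ridge penalty; oracle-type results for bridge estimators are usually stated for $0 < \gamma < 1$, where the penalty is non-convex.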
In this article, we focus on the problem of robust variable selection for high-dimensional generalized linear models. The proposed procedure is based on smooth-threshold estimating equations and a bounded exponential score function with a tuning parameter. The outstanding merit of this new procedure is that it is robust and efficient, because it automatically selects the tuning parameter based on the observed data, and its performance is superior to that of some recently developed methods, in particular when many outliers are included. Furthermore, under some regularity conditions, we show that the resulting estimator is $\sqrt{n/p_n}$-consistent and enjoys the oracle property when the dimension $p_n$ of the predictors satisfies the condition $p_n^2/n \to 0$, where $n$ is the sample size. Finally, Monte Carlo simulation studies and a real data example are carried out to examine the finite-sample performance of the proposed method.
In this paper, we study the asymptotic properties of the adaptive Lasso estimators in high-dimensional generalized linear models. The consistency of the adaptive Lasso estimator is obtained. We show that, if a reasonable initial estimator is available, under appropriate conditions, the adaptive Lasso correctly selects covariates with nonzero coefficients with probability converging to one, and that the estimators of nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. Thus, the adaptive Lasso has an oracle property. The results are examined by some simulations and a real example.
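The role of the initial estimator is easiest to see in the usual form of the adaptive Lasso criterion, given here as a sketch. The notation ($\ell_n$ for the GLM log-likelihood, $\tilde{\beta}$ for the initial estimator, $\lambda_n$ and $\gamma$ for the tuning constants) is assumed and may differ from the paper's.

```latex
\hat{\beta}_n = \arg\min_{\beta \in \mathbb{R}^{p_n}}
  \Big\{ -\ell_n(\beta) + \lambda_n \sum_{j=1}^{p_n} w_j \, |\beta_j| \Big\},
  \qquad w_j = \frac{1}{|\tilde{\beta}_j|^{\gamma}}, \quad \gamma > 0 .
```

Coefficients whose initial estimates are near zero receive large weights and are penalized heavily, while coefficients with large initial estimates are penalized lightly. This data-dependent weighting is what distinguishes the adaptive Lasso from the ordinary Lasso.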