Large-scale generalized linear array models (GLAMs) can be challenging to fit. Computing and storing their tensor product design matrix can be impossible due to time and memory constraints, and previously considered design-matrix-free algorithms do not scale well with the dimension of the parameter vector. A new design-matrix-free algorithm is proposed for computing the penalized maximum likelihood estimate for GLAMs, which, in particular, handles nondifferentiable penalty functions. The proposed algorithm is implemented and available via the R package glamlasso. It combines several ideas, previously considered separately, to obtain sparse estimates while at the same time efficiently exploiting the GLAM structure. In this article, the convergence of the algorithm is treated, and the performance of its implementation is investigated and compared to that of glmnet on simulated as well as real data. It is shown that the computation time for glamlasso scales favorably with the size of the problem when compared to glmnet. Supplementary materials, in the form of R code, data, and visualizations of results, are available online.
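The "design-matrix-free" idea rests on a standard identity for tensor product designs: with column-major vectorization, (X2 ⊗ X1) vec(Θ) = vec(X1 Θ X2ᵀ), so the linear predictor can be computed from the small marginal matrices alone. The following is a minimal two-dimensional sketch of that identity, not the glamlasso implementation; the names `X1`, `X2`, `Theta` and the sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical marginal design matrices and coefficient array (illustrative sizes).
n1, p1, n2, p2 = 6, 3, 5, 4
X1 = rng.normal(size=(n1, p1))
X2 = rng.normal(size=(n2, p2))
Theta = rng.normal(size=(p1, p2))


def linear_predictor_array(X1, X2, Theta):
    """Linear predictor on the n1 x n2 grid, using only the marginal matrices."""
    return X1 @ Theta @ X2.T


def linear_predictor_kron(X1, X2, Theta):
    """Same quantity via the explicit Kronecker design matrix (for checking only)."""
    X = np.kron(X2, X1)                   # shape (n1*n2, p1*p2)
    theta = Theta.reshape(-1, order="F")  # column-major vec(Theta)
    eta = X @ theta
    return eta.reshape(X1.shape[0], X2.shape[0], order="F")


eta_array = linear_predictor_array(X1, X2, Theta)
eta_kron = linear_predictor_kron(X1, X2, Theta)
```

The array form stores n1·p1 + n2·p2 design entries, whereas the Kronecker form stores n1·n2·p1·p2, which is what becomes infeasible at scale.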
We model monthly disease counts on an age-time grid using two-dimensional varying-coefficient Poisson regression. Since the marginal profile of counts shows a very strong and varying annual cycle, sine and cosine regressors model periodicity, but their coefficients are allowed to vary smoothly over the age and time plane. The coefficient surfaces are estimated using a relatively large tensor product B-spline basis. Smoothness is tuned using difference penalties on the rows and columns of the tensor product coefficients. Heavy over-dispersion occurs, making it impossible to use Akaike's information criterion or the Bayesian information criterion based on a Poisson likelihood. It is handled by selective weighting of part of the data and by the use of extended quasi-likelihood. Very efficient computation is achieved with fast array algorithms. The model is applied to monthly deaths due to respiratory diseases for U.S. females during 1959-1998 and for ages 51-100. Copyright (c) 2008 John Wiley & Sons, Ltd.
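A difference penalty on the rows and columns of a tensor product coefficient matrix Θ can be written as λ1‖D1 Θ‖² + λ2‖Θ D2ᵀ‖², where D1 and D2 are difference matrices. The sketch below assumes second-order differences and hypothetical tuning parameters `lam1` and `lam2` (the abstract specifies neither), and checks the array form against the equivalent Kronecker form.

```python
import numpy as np


def difference_matrix(p, order=2):
    """Difference matrix of the given order, shape (p - order, p)."""
    return np.diff(np.eye(p), n=order, axis=0)


def penalty_array(Theta, lam1, lam2, order=2):
    """Row and column roughness penalty, computed on the coefficient array."""
    D1 = difference_matrix(Theta.shape[0], order)
    D2 = difference_matrix(Theta.shape[1], order)
    rows = np.sum((D1 @ Theta) ** 2)    # differences down each column (row direction)
    cols = np.sum((Theta @ D2.T) ** 2)  # differences along each row (column direction)
    return lam1 * rows + lam2 * cols


def penalty_kron(Theta, lam1, lam2, order=2):
    """Same penalty as a quadratic form in column-major vec(Theta)."""
    p1, p2 = Theta.shape
    D1 = difference_matrix(p1, order)
    D2 = difference_matrix(p2, order)
    P = lam1 * np.kron(np.eye(p2), D1.T @ D1) + lam2 * np.kron(D2.T @ D2, np.eye(p1))
    theta = Theta.reshape(-1, order="F")
    return theta @ P @ theta


# A surface that is linear in both indices has zero second differences.
plane = np.add.outer(np.arange(5.0), 2.0 * np.arange(4.0))
rough = np.random.default_rng(1).normal(size=(5, 4))

pen_plane = penalty_array(plane, lam1=1.0, lam2=10.0)
pen_rough_array = penalty_array(rough, lam1=1.0, lam2=10.0)
pen_rough_kron = penalty_kron(rough, lam1=1.0, lam2=10.0)
```

Using separate tuning parameters for the two directions lets smoothness differ across age and time, which matches the description of tuning on rows and columns separately.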
Ecologists increasingly consider phylogenetic relatedness in both community composition and spatial arrangements in communities. Here we considered both the phylogenetic correlation between multiple species and the spatial correlation induced by unobserved spatial heterogeneity on multiple plots. For this analysis, we introduced phylogenetic spatial generalised linear mixed models (PSGLMMs), which are an extension of phylogenetic generalised linear mixed models (PGLMMs). We used the framework of generalised linear array models to simultaneously model the species and plot dimensions. Such models have the potential to explain the correlation of the phylogenetic relationship of the observed species and of the spatial proximity of the plots, or both. We proposed model selection strategies based on proper scores and empirically evaluated them in a case study using bird count data. In our analysis, we focused on two special cases: the community composition model and the environmental sensitivity model. Our simulation study indicated that it might be difficult to correctly identify phylogenetic signals when the phylogenetic correlation is rather low and when studying presence-absence or count data of rare or pervasive species.