In a context of component-based multivariate modeling, we propose to model the residual dependence of the responses. Each response of a response vector is assumed to depend, through a generalized linear model, on a set of explanatory variables. The vast majority of explanatory variables are partitioned into conceptually homogeneous variable groups, viewed as explanatory themes. Variables in themes are assumed to be numerous, and some of them are highly correlated or even collinear. Thus, generalized linear regression demands dimension reduction and regularization with respect to each theme. In addition to the themes, we consider a small set of "additional" covariates that are not conceptually linked to the themes and demand no regularization. Supervised Component Generalized Linear Regression was proposed to both regularize and reduce the dimension of the explanatory space by searching each theme for an appropriate number of orthogonal components, which both contribute to predicting the responses and capture relevant structural information in the themes. In this paper, we introduce random latent variables (a.k.a. factors) to model the covariance matrix of the linear predictors of the responses conditional on the components. To estimate the model, we present an algorithm combining supervised component-based model estimation with factor model estimation. This methodology is tested on simulated data and then applied to an agricultural ecology dataset.
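One way to picture the factor-model step is as follows. Conditional on the components, the vector of linear predictors for one statistical unit receives a covariance of the form ΛΛᵀ + Ψ, where Λ is a loading matrix and Ψ is diagonal. The minimal sketch below assumes this standard factor-analytic form, which the abstract does not spell out, and computes the implied covariance. The helper `implied_covariance`, the loadings and the unique variances are hypothetical illustrations, not the paper's estimator.

```python
# Assumption: conditional on the supervised components, the linear predictors are
#   eta_i = mean_i + Lambda f_i + e_i,  f_i ~ N(0, I_k),  e_i ~ N(0, Psi), Psi diagonal,
# so that Cov(eta_i | components) = Lambda Lambda^T + Psi.
from typing import List

Matrix = List[List[float]]


def implied_covariance(loadings: Matrix, psi: List[float]) -> Matrix:
    """Return Lambda Lambda^T + diag(psi) for a q x k loading matrix (hypothetical helper)."""
    q = len(loadings)
    if len(psi) != q:
        raise ValueError("psi must contain one unique variance per response")
    cov = [[0.0] * q for _ in range(q)]
    for a in range(q):
        for b in range(q):
            # Covariance induced by the shared latent factors.
            cov[a][b] = sum(la * lb for la, lb in zip(loadings[a], loadings[b]))
        # Response-specific (unique) variance enters the diagonal only.
        cov[a][a] += psi[a]
    return cov


# Three responses and one latent factor (hypothetical values).
Lambda = [[0.9], [0.5], [-0.4]]
Psi = [0.2, 0.3, 0.1]
Sigma = implied_covariance(Lambda, Psi)
```

With a single factor, each off-diagonal entry is a product of two loadings. In this example the third response, whose loading has the opposite sign, is therefore negatively correlated with the other two after conditioning on the components.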
Clustering mixed data presents numerous challenges inherent to the highly heterogeneous nature of the variables. Despite this heterogeneity, a clustering algorithm should be able to extract discriminant information from the variables in order to form groups. In this work we introduce a model-based clustering method with a multilayer architecture, called the Mixed Deep Gaussian Mixture Model. It can be viewed as an automatic way to merge the clusterings performed separately on continuous and non-continuous data. This architecture is flexible and can be adapted to mixed data as well as to purely continuous or purely non-continuous data. In this sense, we generalize generalized linear latent variable models and Deep Gaussian Mixture Models. We also design a new initialisation strategy and a data-driven method that selects the best specification of the model and the optimal number of clusters for a given dataset. In addition, our model provides continuous low-dimensional representations of the data, which can be a useful tool for visualizing mixed datasets. Finally, we validate the performance of our approach by comparing its results with those of state-of-the-art mixed data clustering models on several commonly used datasets.
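Deep Gaussian Mixture Models stack several layers of Gaussian mixtures, and the mixed variant adds machinery for non-continuous data. As a much simpler building block, and not the paper's algorithm, the sketch below runs EM for a single-layer, one-dimensional Gaussian mixture on hypothetical data. It shows the E-step (responsibilities) and M-step (parameter updates) that EM-type estimation of mixture models iterates. The function names, starting values and data are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def e_step(x: Sequence[float], weights: Sequence[float],
           means: Sequence[float], sds: Sequence[float]) -> List[List[float]]:
    """Posterior probability (responsibility) of each component for each point."""
    resp = []
    for xi in x:
        dens = [w * normal_pdf(xi, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        resp.append([d / total for d in dens])
    return resp


def m_step(x: Sequence[float],
           resp: List[List[float]]) -> Tuple[List[float], List[float], List[float]]:
    """Re-estimate mixing weights, means and standard deviations."""
    n, k = len(x), len(resp[0])
    weights, means, sds = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mj = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
        vj = sum(r[j] * (xi - mj) ** 2 for r, xi in zip(resp, x)) / nj
        weights.append(nj / n)
        means.append(mj)
        sds.append(math.sqrt(max(vj, 1e-6)))  # variance floor avoids a degenerate component
    return weights, means, sds


# Hypothetical one-dimensional data with two visible groups.
data = [-2.1, -1.9, -2.0, 3.0, 3.2, 2.8]
weights, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(50):
    resp = e_step(data, weights, means, sds)
    weights, means, sds = m_step(data, resp)

# Hard cluster labels: the component with the largest responsibility.
labels = [max(range(len(r)), key=r.__getitem__) for r in e_step(data, weights, means, sds)]
```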
Butterflies are considered important indicators of the state of biodiversity and of key ecosystem functions. However, their use as bioindicators requires a better understanding of how their observed responses are linked to environmental factors. Moreover, a better understanding of how butterfly faunas vary with climate and land cover may help estimate the potential impacts of various drivers, including climate change, botanical succession, grazing, and afforestation. It is particularly important to establish which butterfly species are sensitive to each environmental driver. The study took place in Israel, including the West Bank and Golan Heights. To develop a robust and systematic approach for identifying how butterfly faunas vary with the environment, we analyzed the occurrence of 73 species and the abundance of 24 species from Israeli Butterfly Monitoring Scheme (BMS-IL) data. We used regional generalized additive models to quantify butterfly abundance. We used generalized linear latent variable models and generalized linear models to quantify the impact of temperature, rainfall, soil type, and habitat on individual species and on the species community. Species richness was higher at cooler transects. It was also higher at hilly and mountainous transects in the Mediterranean region (rendzina and terra rossa soils) than in the coastal plain (Hamra soil) and the semiarid northern Jordan Vale (loessial sierozem soil). Species occurrence was better explained by temperature (negative correlation) than by precipitation, while the opposite pattern was found for abundance. Soil type and habitat were not significant drivers of occurrence or abundance. Butterfly faunas responded very strongly to temperature, even when other environmental factors were accounted for. We expect that some butterfly species will disappear from marginal sites with global warming, and that a large proportion will become rarer as the region becomes increasingly arid.
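The GLM part of this analysis relates species occurrence to drivers such as temperature. The minimal sketch below covers only that part and assumes a binomial response, a logit link and a single covariate. It fits occurrence against temperature by Newton-Raphson, which coincides with IRLS for the canonical logit link. The data are synthetic, not BMS-IL records, and `fit_logistic` is a hypothetical helper. A real analysis would use an established GLM routine and the full covariate set.

```python
import math
from typing import Sequence, Tuple


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)        # binomial variance, i.e. the IRLS weight
            g0 += yi - p             # score for the intercept
            g1 += (yi - p) * xi      # score for the slope
            h00 += w                 # entries of the Fisher information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information) * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Synthetic transects (NOT BMS-IL data): mean temperature in deg C and presence/absence.
temps = [14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
present = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
centre = sum(temps) / len(temps)
b0, b1 = fit_logistic([t - centre for t in temps], present)
```

Temperature is centred before fitting to improve the numerical conditioning of the Newton steps. A negative fitted slope `b1` corresponds to the kind of negative association between occurrence and temperature described above.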
This article presents a generalized linear latent variable model for analyzing multivariate longitudinal data within the hidden Markov model framework. The relationships among multiple items are captured by several common latent factors. The linear coregionalization method is adopted to model the temporal processes of the latent variables. The merit of this modeling strategy is that the processes of the latent variables are nonseparable and mutually dependent. To account for possible heterogeneity and interrelationships among the longitudinal data, a hidden Markov model is introduced to model the transition probabilities across different latent states over time. A Monte Carlo expectation conditional maximization (MCECM) algorithm is developed to estimate the unknown parameters of the proposed model. Wald- and score-type statistics are proposed to test the dependence among the processes. A simulation study is conducted to investigate the performance of the proposed methodology, and an example from a longitudinal study of cocaine use is used to illustrate it. (C) 2016 Elsevier Inc. All rights reserved.
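The hidden Markov layer governs transitions between latent states over time. The paper estimates its full model with MCECM. As a much smaller illustration of how an HMM turns transition probabilities into a likelihood, the sketch below implements the standard scaled forward algorithm for a hidden Markov model with discrete emissions. The two states, the binary item and all probabilities are hypothetical. The continuous latent factor processes of the paper are not represented.

```python
import math
from typing import List, Sequence, Tuple


def forward(obs: Sequence[int], init: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> Tuple[float, List[List[float]]]:
    """Scaled forward algorithm for a discrete-emission hidden Markov model.

    obs:   observed symbols o_1..o_T (integers)
    init:  initial state probabilities
    trans: trans[r][s] = P(S_t = s | S_{t-1} = r)
    emit:  emit[s][o] = P(O_t = o | S_t = s)
    Returns the log-likelihood and the filtered probabilities P(S_t | o_1..o_t).
    """
    k = len(init)
    loglik = 0.0
    filtered: List[List[float]] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [init[s] * emit[s][o] for s in range(k)]
        else:
            prev = filtered[-1]
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [sum(prev[r] * trans[r][s] for r in range(k)) * emit[s][o]
                     for s in range(k)]
        c = sum(alpha)  # P(o_t | o_1..o_{t-1}); rescaling avoids numerical underflow
        loglik += math.log(c)
        filtered.append([a / c for a in alpha])
    return loglik, filtered


# Hypothetical two-state example (0 = "low use", 1 = "high use") with a binary item.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.2, 0.8]]
loglik, filtered = forward([0, 1], init, trans, emit)
```

For the sequence `[0, 1]`, the four possible state paths have probabilities 0.0378, 0.1296, 0.0016 and 0.0512. Their sum is 0.2202, so `loglik` equals log(0.2202), which gives a quick brute-force check of the recursion.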
Latent variable models for ordinal data are a useful tool in many fields of research in which the constructs of interest are not directly observable, so that one or more latent variables are required to reduce the complexity of the data. In these cases, problems related to the integration of the model's likelihood function can arise. Indeed, analytical solutions do not exist. In the presence of several latent variables, the most widely used classical numerical approximation, Gauss-Hermite quadrature, cannot be applied. It requires several quadrature points per dimension to obtain reasonably accurate estimates, so the total number of points grows exponentially with the number of latent variables and the computational effort becomes infeasible. Alternative solutions have been proposed in the literature, such as the Laplace approximation and adaptive quadrature. Several studies have demonstrated the superiority of the latter method, particularly for categorical data. In this work we present a simulation study evaluating the performance of the adaptive quadrature approximation for a general class of latent variable models for ordinal data under different study conditions. A real data example is also illustrated.
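To make the computational argument concrete: a fixed Gauss-Hermite rule with q nodes per latent dimension needs q^d evaluations of the integrand for d latent variables. The sketch below assumes a cumulative-logit model for a single ordinal item and integrates out one latent variable with a non-adaptive 3-point rule. The thresholds `TAU` and the `LOADING` are hypothetical. Adaptive quadrature, which recentres and rescales the nodes around each observation's posterior mode, is not shown.

```python
import itertools
import math
from typing import Callable, Sequence

# 3-point Gauss-Hermite rule for the weight exp(-x^2); exact for polynomials of degree <= 5.
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6.0, 2.0 * math.sqrt(math.pi) / 3.0, math.sqrt(math.pi) / 6.0)


def normal_expectation(f: Callable[[Sequence[float]], float], dim: int) -> float:
    """Approximate E[f(Z)], Z ~ N(0, I_dim), with a tensor-product Gauss-Hermite rule.

    The rule evaluates f at len(NODES) ** dim points, which is why fixed
    quadrature becomes infeasible as the number of latent variables grows.
    """
    total = 0.0
    for idx in itertools.product(range(len(NODES)), repeat=dim):
        # z = sqrt(2) * x turns the exp(-x^2) weight into the standard normal density.
        z = [math.sqrt(2.0) * NODES[i] for i in idx]
        weight = math.prod(WEIGHTS[i] for i in idx)
        total += weight * f(z)
    return total / math.pi ** (dim / 2.0)


def ordinal_prob(k: int, z: float, thresholds: Sequence[float], loading: float) -> float:
    """P(y = k | z) under the cumulative-logit model P(y <= k | z) = F(tau_k - loading * z)."""
    def cdf(tau: float) -> float:
        return 1.0 / (1.0 + math.exp(-(tau - loading * z)))
    upper = 1.0 if k == len(thresholds) else cdf(thresholds[k])
    lower = 0.0 if k == 0 else cdf(thresholds[k - 1])
    return upper - lower


# Hypothetical 3-category item with one latent variable.
TAU, LOADING = [-1.0, 0.5], 1.2
marginal = [normal_expectation(lambda z, k=k: ordinal_prob(k, z[0], TAU, LOADING), dim=1)
            for k in range(3)]
grid_sizes = {d: len(NODES) ** d for d in (1, 2, 5, 10)}
```

Even with only 3 nodes per dimension, 10 latent variables require 59,049 integrand evaluations per observation. Rules accurate enough for estimation use more nodes, so the growth is steeper still.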
Background: Surface raw water used as a source for drinking water production is a critical resource that is sensitive to contamination. We conducted a study on Swedish raw water sources, aiming to identify mutually co-occurring metacommunities of bacteria and the environmental factors driving such patterns. Methods: The water sources differed in nutrient composition, water quality, and climate characteristics, and displayed various degrees of anthropogenic impact. Water inlet samples were collected at six drinking water treatment plants over 3 years, totaling 230 samples. The bacterial communities of the DNA-sequenced samples (n = 175), profiled by 16S metabarcoding, were analyzed using a joint model for taxa abundance. Results: Two major groups of well-defined metacommunities of microorganisms were identified, in addition to a third group that was less distinct and taxonomically more diverse. These three metacommunities showed various associations with the measured environmental data. Predictions for the well-defined metacommunities revealed differing sets of favored metabolic pathways and life strategies. In one community, taxa with methanogenic metabolism were common, while a second community was dominated by taxa with carbohydrate- and lipid-focused metabolism. Conclusion: The identification of ubiquitous, persistent, co-occurring bacterial metacommunities in freshwater habitats could facilitate microbial source tracking when analyzing contamination in freshwater sources.