In natural ecosystems, the linkages between inputs of carbon from plants, soil moisture (SM) and microbial activity are central to our understanding of nutrient cycling. Predictions of microbial activity in soil are important because they indicate the potential of the soil to support biochemical processes that are essential for maintaining soil fertility and productivity. Dehydrogenase activity (DHA) in soil provides information on the microbial activity of the soil. However, estimating DHA over complex terrain such as the soils of the central Himalaya is not always possible because of the very harsh environmental and climatic conditions. In this study, we attempted to estimate DHA in the soil of the mid-altitude central Himalaya using computational intelligence techniques. The linear and non-linear correlation results indicate that fluctuations in SM and organic carbon (OC) in the root zone affect DHA and can be used as predictors of DHA. Therefore, the performances of support vector machines (SVMs) and generalized linear models (GLMs) were evaluated for the prediction of DHA over the mid-altitude central Himalaya using SM and OC information. The results showed that the SVM performed much better than the GLM when using SM and OC, and that it could be a promising and cost-effective approach for soil DHA prediction over complex ecosystems. Our results are also of considerable scientific and practical value to the wider scientific community, given the number of practical applications and research studies in which SM and OC datasets are used.
We consider model selection in generalized linear models (GLM) for high-dimensional data and propose a wide class of model selection criteria based on penalized maximum likelihood with a complexity penalty on the model size. We derive a general nonasymptotic upper bound for the Kullback-Leibler risk of the resulting estimators and establish the corresponding minimax lower bounds for the sparse GLM. For the properly chosen (nonlinear) penalty, the resulting penalized maximum likelihood estimator is shown to be asymptotically minimax and adaptive to the unknown sparsity. We also discuss possible extensions of the proposed approach to model selection in the GLM under additional structural constraints and aggregation.
High-dimensional data arise frequently in modern applications such as biology, chemometrics, economics, neuroscience and other scientific fields. Common features of high-dimensional data are that many predictors may not be significant and that the predictors are highly correlated. Generalized linear models, as a generalization of linear models, also suffer from the collinearity problem. In this paper, combining a nonconvex penalty with ridge regression, we propose the weighted elastic-net for variable selection in high-dimensional generalized linear models and give the theoretical properties of the proposed method with a diverging number of parameters. The finite-sample behavior of the proposed method is illustrated with simulation studies and a real data example.
The biotic integrity of the Guayas River basin in Ecuador is at environmental risk due to extensive anthropogenic activities. We investigated the potential impacts of hydromorphological and chemical variables on biotic integrity using macroinvertebrate-based bioassessments. The bioassessment methods included the Biological Monitoring Working Party index adapted for Colombia (BMWP-Col) and the average score per taxon (ASPT), applied in an extensive sampling campaign covering 120 sampling sites throughout the river basin. The BMWP-Col classification ranged from very bad to good, and the ASPT scores ranged from probable severe pollution to clean water. Generalized linear models (GLMs) and sensitivity analysis were used to relate the bioassessment index to hydromorphological and chemical variables. Elevation, nitrate-N, sediment angularity, logs, presence of macrophytes, flow velocity, turbidity, bank shape, land use and chlorophyll were found to be the key environmental variables affecting the BMWP-Col. The analyses showed that rivers at the higher, upstream elevations of the basin were in better condition than lowland systems and that a higher flow velocity was linked to a better BMWP-Col score. Nitrate concentrations were very low throughout the river basin and were not related to a negative impact on the macroinvertebrate communities. Although the models provided insights into the ecosystem, cross-fold model development and validation also showed that there was a level of uncertainty in the outcomes. Nevertheless, the results of the models and the sensitivity analysis can support water management actions by identifying alterable variables to focus on, such as land use at different elevations, monitoring of nitrate and chlorophyll concentrations, macrophyte presence, sediment transport and bank stability.
We propose a new collinearity diagnostic tool for generalized linear models. The new diagnostic, termed the weighted variance inflation factor (WVIF), behaves exactly like the traditional variance inflation factor in the context of regression diagnostics, provided the data matrix is normalized. Compared with the condition number (CN), WVIF gives more reliable information on how severe the situation is when data collinearity exists. An alternative estimator, a by-product of the new diagnostic, outperforms the ridge estimator in the presence of data collinearity in terms of both WVIF and CN. Evidence is provided by analyzing various real-world numerical examples.
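For orientation, the two classical diagnostics that the abstract compares against can be sketched in a few lines. This is only an illustration of the traditional variance inflation factor and the condition number for an ordinary design matrix; it does not implement the WVIF, whose definition is not given in the abstract. The function names and the synthetic data are assumptions made for the example.

```python
import numpy as np

def variance_inflation_factors(X):
    """Traditional VIF of each column of X (n samples by p predictors)."""
    X = np.asarray(X, dtype=float)
    # Correlation matrix of the predictors (columns are variables).
    corr = np.corrcoef(X, rowvar=False)
    # The j-th diagonal entry of the inverse equals 1 / (1 - R_j^2),
    # where R_j^2 comes from regressing predictor j on the others.
    return np.diag(np.linalg.inv(corr))

def condition_number(X):
    """Ratio of largest to smallest singular value of the column-scaled X."""
    X = np.asarray(X, dtype=float)
    X_scaled = X / np.linalg.norm(X, axis=0)  # unit-length columns
    s = np.linalg.svd(X_scaled, compute_uv=False)
    return s[0] / s[-1]

# Synthetic (hypothetical) data: x3 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vif = variance_inflation_factors(X)
cn = condition_number(X)
print("VIF:", vif)
print("CN:", cn)
```

In this sketch the VIF flags which predictors are involved in the near-dependency (the first and third columns), whereas the condition number gives a single summary value for the whole matrix.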
ISBN (digital): 9783319415826
ISBN (print): 9783319415826; 9783319415819
Generalized linear models (GLM) can be considered a stochastic version of the classical Chain-Ladder (CL) method of claim reserving in nonlife insurance. In particular, the deterministic CL model is reproduced when a GLM is fitted assuming an over-dispersed Poisson error distribution and a logarithmic link. Our aim is to propose the use of distance-based generalized linear models (DB-GLM) in the claim reserving problem. DB-GLM can be considered a generalization of the classical GLM to distance-based analysis, because DB-GLM contains the ordinary GLM as a particular instance when the Euclidean, l(2), metric is applied. DB-GLM can therefore also be considered a stochastic version of the CL claim reserving method. In DB-GLM, the only information required is a predictor distance matrix. DB-GLM can be fitted using the dbstats package for R. To estimate reserve distributions and standard errors, we propose a nonparametric bootstrap technique suited to distance-based regression models. We illustrate the method with a well-known actuarial dataset.
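The link between Chain-Ladder and the Poisson GLM with logarithmic link can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' DB-GLM method and not the dbstats package: it uses a small made-up 4x4 triangle of incremental claims, a hand-written IRLS fit of a Poisson log-link model with accident-year and development-year factors, and hypothetical function names. The dispersion parameter of the over-dispersed Poisson model does not change the fitted means, so a plain Poisson fit is enough for the point estimates.

```python
import numpy as np

nan = np.nan
# Hypothetical incremental claims triangle (rows: accident years,
# columns: development years); NaN marks cells not yet observed.
inc = np.array([
    [100.0, 60.0, 30.0, 10.0],
    [110.0, 70.0, 35.0, nan],
    [120.0, 65.0, nan, nan],
    [130.0, nan, nan, nan],
])

def chain_ladder_reserves(inc):
    """Deterministic Chain-Ladder reserve for each accident year."""
    n = inc.shape[0]
    cum = np.cumsum(np.where(np.isnan(inc), 0.0, inc), axis=1)
    factors = []
    for j in range(n - 1):
        rows = n - 1 - j  # rows with both columns j and j+1 observed
        factors.append(cum[:rows, j + 1].sum() / cum[:rows, j].sum())
    reserves = np.zeros(n)
    for i in range(n):
        last = n - 1 - i  # last observed development column of row i
        ultimate = cum[i, last] * np.prod(factors[last:])
        reserves[i] = ultimate - cum[i, last]
    return reserves

def fit_poisson_log_glm(X, y, n_iter=50, tol=1e-10):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    mu = y + 0.5
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu            # working response
        w = mu                             # working weights
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def glm_reserves(inc):
    """Reserves from a Poisson log-link GLM with row and column factors."""
    n = inc.shape[0]
    rows, cols = np.indices(inc.shape)

    def design(r, c):
        X = np.zeros((r.size, 2 * n - 1))
        X[:, 0] = 1.0                      # intercept
        for k in range(1, n):
            X[:, k] = (r == k)             # accident-year dummies
            X[:, n - 1 + k] = (c == k)     # development-year dummies
        return X

    observed = ~np.isnan(inc)
    beta = fit_poisson_log_glm(design(rows[observed], cols[observed]), inc[observed])
    future = ~observed
    mu_future = np.exp(design(rows[future], cols[future]) @ beta)
    reserves = np.zeros(n)
    np.add.at(reserves, rows[future], mu_future)  # sum predictions per accident year
    return reserves

cl = chain_ladder_reserves(inc)
glm = glm_reserves(inc)
print("Chain-Ladder reserves:", cl)
print("Poisson GLM reserves: ", glm)
```

Under these assumptions the two sets of reserves agree up to numerical tolerance, which is the sense in which the GLM is a stochastic counterpart of the deterministic CL method.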
ISBN (print): 9783319212067; 9783319212050
We investigate the performance of a hybrid classifier for solving a classic problem in the area of image processing. We analyse the performance of this method on a specific classification task: detecting skin regions in a picture. Our approach consists of first partitioning the input dataset by clustering. Then, for each cluster, we apply the well-known generalized linear models in order to identify the skin and non-skin points. We evaluate the performance of our approach using several well-known metrics. In addition, we compare the performance achieved with that of feed-forward neural networks. The results show that the proposed approach is a good alternative for solving the skin-identification problem.
In this article, we present a compressive-sensing-based framework for generalized linear model regression that employs a two-component noise model and convex optimization techniques to simultaneously detect outliers and determine optimally sparse representations of noisy data from arbitrary sets of basis functions. We then extend our model to include model order reduction capabilities that can uncover inherent sparsity in regression coefficients and achieve simple, superior fits. We also use the mixed l(2)/l(1) norm to develop another model that can efficiently uncover block-sparsity in regression coefficients. By performing model order reduction over all independent variables and basis functions, our algorithms successfully deemphasize the effect of independent variables that become uncorrelated with dependent variables. This desirable property has various applications in real-time anomaly detection, such as faulty sensor detection and sensor jamming in wireless sensor networks. After developing our framework and inheriting a stable recovery theorem from compressive sensing theory, we present two simulation studies on sparse or block-sparse problems that demonstrate the superior performance of our algorithms with respect to (1) classic outlier-invariant regression techniques like least absolute value and iteratively reweighted least-squares and (2) classic sparse-regularized regression techniques like LASSO.
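To make the role of the mixed l(2)/l(1) norm concrete, the following sketch computes it for a coefficient matrix and applies block soft-thresholding, the standard proximal step associated with this norm. It is an illustration only and not the authors' algorithm: the choice of treating each row as one block, the function names, and the example coefficients are all assumptions.

```python
import numpy as np

def mixed_l2_l1_norm(B):
    """Sum over blocks (here: rows of B) of the Euclidean norm of each block."""
    return np.linalg.norm(B, axis=1).sum()

def block_soft_threshold(B, t):
    """Shrink each row toward zero; rows with norm <= t are set exactly to zero."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zero.
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * B

# Hypothetical coefficients: three blocks of two coefficients each.
B = np.array([
    [3.0, 4.0],    # strong block
    [0.1, -0.1],   # weak block
    [0.0, 2.0],    # moderate block
])

penalty = mixed_l2_l1_norm(B)
shrunk = block_soft_threshold(B, t=1.0)
print("mixed l2/l1 norm:", penalty)
print("after block soft-thresholding:")
print(shrunk)
```

Because the penalty acts on the norm of a whole block, a weak block is removed as a unit rather than coefficient by coefficient, which is what block-sparsity refers to.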
We study variable selection and estimation for high-dimensional generalized linear models when the number of covariates may increase to infinity with the sample size. Based on the quasi-likelihood function and the bridge penalty, the quasi-likelihood bridge estimators are proposed. Under reasonable conditions, the consistency of the quasi-likelihood bridge estimators can be achieved. Furthermore, we show that under appropriate conditions, the quasi-likelihood bridge estimators can correctly select the nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, the quasi-likelihood bridge estimators have an oracle property. Some simulations and a real data analysis are conducted to illustrate the proposed method.
In this article, we propose an approach for incorporating continuous and discrete original outcome distributions into the usual exponential family regression models. The new approach is an extension of the works of Suissa (1991) and Suissa and Blais (1995), which present methods to estimate the risk of an event defined in a sample subspace of an original continuous outcome variable. Simulation studies are presented in order to illustrate the performance of the developed methodology. Real data sets are analyzed by using the proposed models.