Supersaturated designs are designs in which the number of factors exceeds the run size; consequently, there are not enough degrees of freedom to estimate all the main effects. Under the assumption of effect sparsity, the goal is to identify the dominant factors, which constitute a small proportion of the overall set of factors. The analysis of such designs is a challenging task, and although many methods have been proposed in the literature for a normal response, only a few works have attempted to address non-normal responses. In this paper, we propose a method for screening the most important features in supersaturated designs with a Bernoulli-distributed response. The new approach combines an effective chart from Statistical Process Control, the cumulative sum (CUSUM) control chart, with an information-theoretic measure, and is referred to as the MIC algorithm. We assess MIC through comparisons with three existing approaches from the literature: the least absolute shrinkage and selection operator (LASSO) penalization method and two feature selection algorithms, Conditional Mutual Information Maximization and minimal-redundancy-maximal-relevance. The simulation study shows that the proposed method is advantageous because of its very good performance in terms of statistical power. Copyright (c) 2017 John Wiley & Sons, Ltd.
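The information-theoretic ingredient can be illustrated on its own. The following Python sketch is not the MIC algorithm: it omits the CUSUM chart entirely and simply ranks the two-level factor columns of a toy supersaturated design (6 runs, 8 factors) by their empirical mutual information with a binary response. The function names and the data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written in terms of counts
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi


def rank_factors(design, response):
    """Sort factor indices by decreasing mutual information with the response."""
    n_factors = len(design[0])
    scores = [mutual_information([row[j] for row in design], response)
              for j in range(n_factors)]
    order = sorted(range(n_factors), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical supersaturated design: 6 runs, 8 two-level factors (+1/-1).
DESIGN = [
    [1, 1, 1, 1, -1, 1, -1, 1],
    [-1, 1, 1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, 1],
    [-1, -1, -1, 1, 1, 1, 1, 1],
    [1, 1, -1, 1, -1, -1, 1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1],
]
# Hypothetical Bernoulli response driven entirely by factor 2.
RESPONSE = [1, 1, 1, 0, 0, 0]

ORDER, SCORES = rank_factors(DESIGN, RESPONSE)
print(ORDER[0], round(SCORES[ORDER[0]], 4))  # factor 2 ranks first
```

A screening rule would then retain only the top-ranked factors, in line with effect sparsity.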
We provide general conditions that ensure valid Laplace approximations to the marginal likelihoods under model misspecification, and we derive Bayesian information criteria that include all terms of order $O_p(1)$. Under the conditions of Theorem 1 of Lv and Liu [J. R. Statist. Soc. B, 76 (2014), 141-167] and a continuity condition on the prior densities, asymptotic expansions with error terms of order $o_p(1)$ are derived for the log-marginal likelihoods of possibly misspecified generalized linear models. We present numerical examples illustrating the finite-sample performance of the proposed information criteria in misspecified models.
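For orientation, the textbook Laplace approximation behind such criteria is sketched below for a correctly specified model, in notation assumed here rather than taken from the paper: $\ell_n$ is the log-likelihood, $\pi$ the prior density, $d$ the number of parameters, $\hat\theta$ the maximizer of $\ell_n$, and $H_n(\hat\theta) = -\nabla^2 \ell_n(\hat\theta)$. Keeping only the terms that grow with $n$ gives the familiar BIC; the bracketed $O_p(1)$ terms are the kind of terms that the criteria described above retain.

```latex
\log m(y) = \log \int e^{\ell_n(\theta)}\,\pi(\theta)\,d\theta
          = \ell_n(\hat\theta) + \log \pi(\hat\theta) + \frac{d}{2}\log(2\pi)
            - \frac{1}{2}\log\det H_n(\hat\theta) + o_p(1),

-2\log m(y) = -2\ell_n(\hat\theta) + d\log n
              \underbrace{-\,2\log \pi(\hat\theta) - d\log(2\pi)
              + \log\det\{n^{-1}H_n(\hat\theta)\}}_{O_p(1)} + o_p(1).
```

The second line follows from the first by multiplying by $-2$ and writing $\log\det H_n = d\log n + \log\det\{n^{-1}H_n\}$.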
When separate populations exhibit similar reliability as a function of multiple explanatory variables, it is tempting to combine them into a single population. Doing so can simplify future predictions and reduce the uncertainty associated with estimation. However, combining the populations may introduce bias if the underlying relationships actually differ. The probability of agreement formally and intuitively quantifies the similarity of estimated reliability surfaces across a two-factor input space. An example from the reliability literature demonstrates the utility of the approach when deciding whether to combine two populations or keep them distinct. New graphical summaries provide strategies for visualizing the results.
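A minimal Monte Carlo sketch of the idea, assuming that coefficient draws of each population's reliability model are available (for example, from a posterior) and that reliability is linear-logistic in the two factors. The threshold `delta` (the largest difference considered practically unimportant), the function names and the stand-in draws are all hypothetical.

```python
import math
import random


def logistic(t):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-t))


def probability_of_agreement(draws_1, draws_2, delta):
    """Fraction of paired draws whose reliabilities differ by at most delta."""
    if len(draws_1) != len(draws_2) or not draws_1:
        raise ValueError("need two non-empty sequences of equal length")
    agree = sum(1 for r1, r2 in zip(draws_1, draws_2) if abs(r1 - r2) <= delta)
    return agree / len(draws_1)


def agreement_surface(coefs_1, coefs_2, grid, delta):
    """Probability of agreement at each (x1, x2) point of a two-factor grid."""
    surface = {}
    for x1, x2 in grid:
        r1 = [logistic(b0 + b1 * x1 + b2 * x2) for b0, b1, b2 in coefs_1]
        r2 = [logistic(b0 + b1 * x1 + b2 * x2) for b0, b1, b2 in coefs_2]
        surface[(x1, x2)] = probability_of_agreement(r1, r2, delta)
    return surface


# Stand-in coefficient draws for two populations (not real estimates).
rng = random.Random(1)
COEFS_A = [(rng.gauss(2.0, 0.2), rng.gauss(-1.0, 0.1), rng.gauss(0.5, 0.1)) for _ in range(2000)]
COEFS_B = [(rng.gauss(2.1, 0.2), rng.gauss(-1.2, 0.1), rng.gauss(0.5, 0.1)) for _ in range(2000)]
GRID = [(x1, x2) for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
SURFACE = agreement_surface(COEFS_A, COEFS_B, GRID, delta=0.05)
for point, value in sorted(SURFACE.items()):
    print(point, round(value, 3))
```

Values near 1 across the whole grid would support pooling; low values in some region flag where the two reliability surfaces differ by more than `delta`.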
Combining information from different populations to improve precision, simplify future predictions, or deepen understanding of the underlying relationships can be advantageous when considering the reliability of several related sets of systems. Using the probability of agreement to quantify the similarity of populations gives a realistic assessment of whether the systems' reliabilities are similar enough, for practical purposes, to be treated as a homogeneous population. The new method is described and illustrated with an example involving two generations of a complex system, where reliability is modeled using either a logistic or a probit regression model. Supplementary materials, including code, datasets, and additional discussion, are available online.
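To see why logistic and probit fits of the same data usually give similar reliability estimates, note that the two inverse links nearly coincide once the logit scale is stretched by a constant of about 1.7. The sketch below checks the well-known approximation $\Phi(z) \approx 1/(1 + e^{-1.702 z})$ on a grid; it illustrates the two model forms only and is not part of the authors' method.

```python
import math

SCALE = 1.702  # classical logit-to-probit scaling constant


def logit_reliability(eta):
    """Reliability under a logistic regression model."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_reliability(eta):
    """Reliability under a probit model: the standard normal CDF of eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def max_link_gap(lo=-6.0, hi=6.0, steps=1201):
    """Largest |Phi(z) - logistic(SCALE * z)| over an evenly spaced grid of z."""
    gap = 0.0
    for k in range(steps):
        z = lo + (hi - lo) * k / (steps - 1)
        gap = max(gap, abs(probit_reliability(z) - logit_reliability(SCALE * z)))
    return gap


print(round(max_link_gap(), 4))
```

Because of this near-equivalence, probit coefficients are roughly logistic coefficients divided by about 1.7, while the implied reliabilities differ by less than 0.01 at every value of the linear predictor.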
Rivers have frequently been assessed based on the presence of Ephemeroptera-Plecoptera-Trichoptera (EPT) taxa in order to determine water quality status and develop conservation programs. This research evaluates the abiotic preferences of three EPT families (Baetidae, Leptoceridae and Perlidae) in the Machangara River Basin, located in the southern Andes of Ecuador. To this end, we used generalized linear models (GLMs) to analyze the relation between the probability of occurrence of these pollution-sensitive macroinvertebrate families and physicochemical water quality conditions. The explanatory variables of the constructed GLMs differed substantially among the taxa, as did the preferred ranges of the common predictors. In total, eight variables had a substantial influence on the outcomes of the three models. The Akaike information criterion (AIC) was used to select the best predictors for each taxon and to evaluate the accuracy of its model. The results indicate that GLMs can be applied to predict the presence or absence of these invertebrate taxa and, moreover, to clarify their relation to the environmental conditions of the stream. These modeling tools can therefore help identify key variables for river restoration and protection management.
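The GLM-plus-AIC workflow can be sketched for one taxon and a single predictor. The Python below fits a logistic regression by Newton-Raphson and compares its AIC with that of an intercept-only model; the predictor, the presence/absence data and the function names are hypothetical, and the study's actual models involved several physicochemical variables.

```python
import math


def fit_logistic_1d(x, y, iters=50, tol=1e-10):
    """Newton-Raphson MLE for P(y = 1) = logistic(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score for the intercept
            g1 += (yi - p) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) @ step = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params


# Hypothetical water-quality predictor and presence (1) / absence (0) of one taxon.
X = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
Y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

B0, B1 = fit_logistic_1d(X, Y)
P_FULL = [1.0 / (1.0 + math.exp(-(B0 + B1 * xi))) for xi in X]
AIC_FULL = aic(bernoulli_loglik(Y, P_FULL), 2)

p_null = sum(Y) / len(Y)  # MLE of the intercept-only model
AIC_NULL = aic(bernoulli_loglik(Y, [p_null] * len(Y)), 1)
print(round(AIC_NULL, 2), round(AIC_FULL, 2))  # lower AIC is preferred
```

With several candidate predictors, the same comparison would be repeated over candidate subsets and the subset with the lowest AIC retained.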
Peatlands are ecosystems of great relevance because they perform a large number of ecological functions that provide many services to mankind. However, studies of their plant diversity from a remote sensing perspective are still scarce. In the present study, vascular plant richness and diversity were predicted in three anthropogenic peatlands on Chiloe Island, Chile, using free satellite data from the OLI, ASTER, and MSI sensors. We also compared the suitability of these sensors using two modeling methods: random forest (RF) and the generalized linear model (GLM). Spectral bands, vegetation indices and textural metrics were used as predictors for the empirical models. Variable importance was estimated using recursive feature elimination (RFE). Fourteen of the 17 predictors chosen by RFE were textural metrics, demonstrating the importance of spatial context for predicting species richness and diversity. No significant differences were found between the algorithms; however, the GLM models often performed slightly better than RF. Predictions from the different satellite sensors did not differ significantly; nevertheless, the best models were obtained with ASTER (richness: $R^2 = 0.62$ and %RMSE = 17.2, obtained with RF; diversity: $R^2 = 0.71$ and %RMSE = 20.2, obtained with GLM), followed by OLI and MSI. Diversity was predicted with higher accuracy than richness; nonetheless, accurate predictions were achieved for both, demonstrating the potential of free satellite data for predicting relevant community characteristics in anthropogenic peatland ecosystems.
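The accuracy figures reported above can be computed for any set of predictions with a few lines of Python. This sketch assumes that $R^2$ is the coefficient of determination and that %RMSE is the root mean square error expressed as a percentage of the mean observed value; the abstract defines neither, so both definitions are assumptions, and the values in the usage lines are made up.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def percent_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Made-up plot-level richness values and model predictions.
OBSERVED = [10, 12, 14, 16]
PREDICTED = [11, 12, 13, 16]
print(round(r_squared(OBSERVED, PREDICTED), 3), round(percent_rmse(OBSERVED, PREDICTED), 2))
```

Some studies instead report $R^2$ as the squared correlation between observed and predicted values, which can differ from the definition used here.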
Several thousand chemical substances are registered every year for different purposes, and many of them are sometimes claimed to play the same role. Establishing and comparing their toxicities usually requires determining their lethal concentrations, and this determination should account for natural mortality. However, many of the statistical software packages used for this purpose do not readily incorporate control mortality or select the best-fitting link function during the process. This manuscript proposes an "lc" function in the open-source R environment for the effective determination of lethal concentrations; the function also performs the procedure with the appropriate link function. Applying "lc" to the example provided showed that the complementary log-log link function is adequate.
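The two ingredients mentioned above, a correction for control mortality and a link-dependent inversion of the dose-response model, can be sketched as follows. This is not the interface of the "lc" function: it assumes Abbott's formula for control mortality and a model $\eta = b_0 + b_1 \log_{10}(\text{concentration})$ whose coefficients have already been fitted; the function names and coefficient values are hypothetical.

```python
import math
from statistics import NormalDist


def abbott_correction(p_treated, p_control):
    """Abbott's formula: proportion killed in excess of natural (control) mortality."""
    return max(0.0, (p_treated - p_control) / (1.0 - p_control))


# Linear-predictor value at which each link gives response probability p.
LINK_QUANTILE = {
    "logit": lambda p: math.log(p / (1.0 - p)),
    "probit": lambda p: NormalDist().inv_cdf(p),
    "cloglog": lambda p: math.log(-math.log(1.0 - p)),
}


def lethal_concentration(p, b0, b1, link="cloglog"):
    """Concentration killing a proportion p under eta = b0 + b1 * log10(conc)."""
    eta_p = LINK_QUANTILE[link](p)
    return 10.0 ** ((eta_p - b0) / b1)


print(round(abbott_correction(0.55, 0.10), 3))  # 0.5
for name in LINK_QUANTILE:
    # Hypothetical coefficients, reused across links only for illustration.
    print(name, round(lethal_concentration(0.5, b0=-2.0, b1=1.5, link=name), 3))
```

In practice each link would be fitted to the data separately and compared before the chosen model is inverted to obtain LC50 or other lethal concentrations.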
Conflict analysis has recently become more important in light of current world events. Despite numerous quantitative analyses of international conflict, the statistical results are often inconsistent with one another. The causes of conflict, however, are often stable and replicable when the prior probability of conflict is large. Since it has been widely conjectured that neural networks can cope with the complexity of such interconnected and interdependent data, we formulate a statistical version of a neural network model and compare its results with those of conventional statistical models. We then show how to apply Bayesian methods to the preferred model, with the aim of obtaining posterior probabilities of conflict outbreak and hence being able to plan for conflict prevention. Journal of the Operational Research Society (2010) 61, 332-341. doi: 10.1057/jors.2008.183. Published online 4 March 2009.
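A statistical neural network for a binary outcome can be read as a logistic regression on learned nonlinear features, and a Bayesian treatment averages its output over posterior draws of the weights. The sketch below shows only those two computations, for one hidden layer of tanh units; the architecture, the covariates and the random stand-in weight draws are hypothetical, and no model fitting or posterior sampling is performed.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def network_probability(x, weights):
    """One hidden layer of tanh units feeding a logistic output unit."""
    hidden_w, hidden_b, out_w, out_b = weights
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return sigmoid(sum(v * hj for v, hj in zip(out_w, hidden)) + out_b)


def predictive_probability(x, weight_draws):
    """Average the network output over draws of the weights (Monte Carlo)."""
    return sum(network_probability(x, w) for w in weight_draws) / len(weight_draws)


def random_weights(n_inputs, n_hidden, rng, scale=0.5):
    """Stand-in weight draw; a real analysis would sample from the posterior."""
    hidden_w = [[rng.gauss(0.0, scale) for _ in range(n_inputs)] for _ in range(n_hidden)]
    hidden_b = [rng.gauss(0.0, scale) for _ in range(n_hidden)]
    out_w = [rng.gauss(0.0, scale) for _ in range(n_hidden)]
    out_b = rng.gauss(0.0, scale)
    return hidden_w, hidden_b, out_w, out_b


rng = random.Random(0)
DRAWS = [random_weights(3, 4, rng) for _ in range(200)]
X_DYAD = [0.2, -1.0, 0.5]  # hypothetical standardised covariates for one country pair
P_CONFLICT = predictive_probability(X_DYAD, DRAWS)
print(round(P_CONFLICT, 3))
```

With a single hidden unit and fixed hidden weights, the output reduces to an ordinary logistic regression on one transformed covariate, which is one way to relate such networks to conventional models.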
Simplified methods based on the cone penetration test (CPT), standard penetration test (SPT), and shear wave velocity ($V_s$) test are widely used in liquefaction potential evaluation. In this study, new case histories with shear wave velocity measurements and observations of liquefaction are compiled from the 22 February 2011 Canterbury earthquake in New Zealand. The new case histories are combined with the existing $V_s$ database to assess and update probabilistic models. The widely used logistic regression models, as well as other probabilistic models, are examined in the framework of generalized linear models (GLMs), with the model parameters determined by the maximum likelihood estimation (MLE) principle. The developed GLMs are then ranked using three model assessment criteria. Based on these criteria, the log-log and logistic models are recommended for both the existing and the combined databases, and the updated log-log and logistic models are recommended for shear-wave-velocity-based liquefaction potential evaluation.
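The GLM comparison can be sketched by fitting one binary model per link function by Fisher scoring (iteratively reweighted least squares, with step-halving) and ranking the fits by AIC. The data, the single predictor and the use of AIC as the only criterion are hypothetical, and "log-log" is assumed here to mean $\mu = \exp\{-\exp(-\eta)\}$.

```python
import math

EPS = 1e-12


def _clamp(eta):
    """Keep the linear predictor in a range where exp() cannot overflow."""
    return max(-30.0, min(30.0, eta))


def _clip(mu):
    """Keep probabilities strictly inside (0, 1) so logs stay finite."""
    return min(max(mu, EPS), 1.0 - EPS)


# link name -> (inverse link mu(eta), derivative d mu / d eta)
LINKS = {
    "logit": (lambda e: 1.0 / (1.0 + math.exp(-e)),
              lambda e: math.exp(-e) / (1.0 + math.exp(-e)) ** 2),
    "probit": (lambda e: 0.5 * (1.0 + math.erf(e / math.sqrt(2.0))),
               lambda e: math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)),
    "loglog": (lambda e: math.exp(-math.exp(-e)),
               lambda e: math.exp(-e - math.exp(-e))),
    "cloglog": (lambda e: 1.0 - math.exp(-math.exp(e)),
                lambda e: math.exp(e - math.exp(e))),
}


def loglik(b, x, y, inv_link):
    """Bernoulli log-likelihood of coefficients b = (b0, b1)."""
    total = 0.0
    for xi, yi in zip(x, y):
        mu = _clip(inv_link(_clamp(b[0] + b[1] * xi)))
        total += math.log(mu) if yi == 1 else math.log(1.0 - mu)
    return total


def fit_binary_glm(x, y, link, iters=100, tol=1e-10):
    """Fisher scoring for P(y = 1) = mu(b0 + b1 * x); returns ((b0, b1), loglik)."""
    inv_link, deriv = LINKS[link]
    b = (0.0, 0.0)
    ll = loglik(b, x, y, inv_link)
    for _ in range(iters):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi in zip(x, y):
            eta = _clamp(b[0] + b[1] * xi)
            mu = _clip(inv_link(eta))
            d = max(deriv(eta), EPS)
            w = d * d / (mu * (1.0 - mu))   # IRLS weight
            z = eta + (yi - mu) / d         # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        det = sw * swxx - swx * swx
        new = ((swxx * swz - swx * swxz) / det, (sw * swxz - swx * swz) / det)
        new_ll = loglik(new, x, y, inv_link)
        halvings = 0
        while new_ll < ll and halvings < 30:   # step-halving safeguard
            new = ((new[0] + b[0]) / 2.0, (new[1] + b[1]) / 2.0)
            new_ll = loglik(new, x, y, inv_link)
            halvings += 1
        if new_ll < ll:
            break  # no improving step found; keep the current estimate
        step = abs(new[0] - b[0]) + abs(new[1] - b[1])
        b, ll = new, new_ll
        if step < tol:
            break
    return b, ll


# Hypothetical data: normalised demand index and liquefaction observed (1) or not (0).
X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
Y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

RESULTS = {}
for name in LINKS:
    coef, ll = fit_binary_glm(X, Y, name)
    RESULTS[name] = {"coef": coef, "loglik": ll, "aic": -2.0 * ll + 2 * len(coef)}

RANKING = sorted(RESULTS, key=lambda k: RESULTS[k]["aic"])
for name in RANKING:
    print(name, round(RESULTS[name]["aic"], 3))
```

Because every model here has the same number of parameters, ranking by AIC is equivalent to ranking by maximized log-likelihood; other criteria would be needed to separate models more finely.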
In this paper, we consider the generalized linear models (GLMs) $y_i = h(X_i^{T}\beta) + e_i$, $1 \le i \le n$, where $h(\cdot)$ is a continuously differentiable function and the $e_i$ are dependent errors. We obtain the M-estimator $\hat\beta_n$ of $\beta$ as $\hat\beta_n = \arg\min_{\beta} \sum_{i=1}^{n} \rho\bigl(y_i - h(X_i^{T}\beta)\bigr)$, where $\rho$ is assumed to be a convex function. We also establish the linear representation and asymptotic normality of the estimator, which extend the corresponding results of Wu et al. (M-estimation of linear models with dependent errors, Ann. Statist. 2007) to GLMs.
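A numerical sketch of the estimator, assuming a scalar $\beta$, $h = \exp$ and Huber's function as the convex $\rho$ (all illustrative choices, not the paper's): the objective is minimized by a crude grid search, and a squared-error $\rho$ is fitted alongside for contrast. The toy data contain one gross outlier; the error dependence that the paper's theory handles plays no role in this computation.

```python
import math


def huber(r, c=1.0):
    """Huber's rho: quadratic for |r| <= c, linear beyond (convex)."""
    a = abs(r)
    return 0.5 * r * r if a <= c else c * (a - 0.5 * c)


def squared(r):
    """Least-squares rho, for comparison."""
    return 0.5 * r * r


def m_objective(beta, x, y, rho, h=math.exp):
    """Sum of rho(y_i - h(x_i * beta)) for a scalar coefficient beta."""
    return sum(rho(yi - h(xi * beta)) for xi, yi in zip(x, y))


def m_estimate(x, y, rho, lo=0.0, hi=3.0, steps=3001):
    """Minimise the M-objective over an evenly spaced grid of beta values."""
    grid = (lo + (hi - lo) * k / (steps - 1) for k in range(steps))
    return min(grid, key=lambda b: m_objective(b, x, y, rho))


BETA_TRUE = 1.5
X = [0.1 * k for k in range(1, 11)]
NOISE = [0.05, -0.05] * 5              # small hypothetical errors
Y = [math.exp(xi * BETA_TRUE) + e for xi, e in zip(X, NOISE)]
Y[4] += 20.0                           # one gross outlier at x = 0.5

BETA_HUBER = m_estimate(X, Y, huber)
BETA_LS = m_estimate(X, Y, squared)
print(round(BETA_HUBER, 3), round(BETA_LS, 3))
```

The bounded influence of Huber's $\rho$ keeps the estimate close to the true value, whereas the squared-error fit is pulled upward by the outlier.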