The concentration of nitrogen dioxide (NO2) is worsening across the globe alongside growth in industrial and general anthropogenic activities. Due to its serious health implications with long-term exposure, studies on NO2 concentration have gained space in the academic literature. In this study, awareness is created on the levels of NO2 across four (4) locations within the Tema Metropolitan area, with specific interest in selecting locations and periods significantly saturated with NO2 within the study area. NO2 was measured using RKI Eagle, an instrument with a built-in sensor for a specific gas measurement. Measurements were taken day and night at sampling points around 100 meters apart in each location. Data collection was performed over a nine (9)-month period. The generalized linear model is explored for selecting locations and periods significantly affected by NO2. From the results, the fourth week (26th-31st) of July 2020, the fourth week (27th-31st) of December 2020, the first week (1st-7th) of January 2021, and the fourth week (24th-31st) of January 2021 recorded severe concentrations of NO2. Additionally, the lives of residents in the Oil Jetty and the VALVO hospital areas were found to be the most endangered, as they recorded significantly high concentrations of NO2. In a developing country such as Ghana, this study is useful for monitoring NO2 concentrations in similar areas to inform decision making and environmental policy formulation.
The Integrated Clinical and Environmental Exposures Service (ICEES) provides open regulatory-compliant access to clinical data, including electronic health record data, that have been integrated with environmental exposures data. While ICEES has been validated in the context of an asthma use case and several other use cases, the regulatory constraints on the ICEES open application programming interface (OpenAPI) result in data loss when using the service for multivariate analysis. In this study, we investigated the robustness of the ICEES OpenAPI through a comparative analysis, in which we applied a generalized linear model (GLM) to the OpenAPI data and the constraint-free source data to examine factors predictive of asthma exacerbations. Consistent with previous studies, we found that the main predictors identified by both analyses were sex, prednisone, race, obesity, and airborne particulate exposure. Comparison of GLM model fit revealed that data loss impacts model quality, but only with select interaction terms. We conclude that the ICEES OpenAPI supports multivariate analysis, albeit with potential data loss that users should be aware of.
Bayesian statistical inference for generalized linear models (GLMs) with parameters lying on a constrained space is of general interest (e.g., in monotonic or convex regression), but constructing valid prior distributions supported on a subspace spanned by a set of linear inequality constraints can often be challenging, especially when some of the constraints might be binding, leading to a lower dimensional subspace. For the general case with canonical link, it is shown that a generalized truncated multivariate normal supported on a desired subspace can be used. Moreover, it is shown that such a prior distribution facilitates the construction of a general purpose product slice sampling method to obtain (approximate) samples from the corresponding posterior distribution, making the inferential method computationally efficient for a wide class of GLMs with an arbitrary set of linear inequality constraints. The proposed product slice sampler is shown to be uniformly ergodic, having a geometric convergence rate under a set of mild regularity conditions satisfied by many popular GLMs (e.g., logistic and Poisson regressions with constrained coefficients). One of the primary advantages of the proposed Bayesian estimation method over classical methods is that uncertainty of parameter estimates is easily quantified by using the samples simulated from the path of the Markov chain of the slice sampler. Numerical illustrations using simulated data sets are presented to illustrate the superiority of the proposed methods compared to some existing methods in terms of sampling bias and variances. In addition, real case studies are presented using data sets for fertilizer-crop production and estimating the SCRAM rate in nuclear power plants. (C) 2021 Elsevier B.V. All rights reserved.
This paper studies transfer learning of a high-dimensional generalized linear model with the target model as well as source data from different but possibly related models. Both known and unknown transferable domain settings are considered. On the one hand, an improved two-step transfer learning algorithm is proposed and the optimal rate of convergence for estimation is proved when the set of transferable domains is known. On the other hand, when the set of transferable domains is unknown, we propose a data-driven procedure for transfer learning, called the Stepwise Selection algorithm, and investigate its finite-sample performance through simulation studies. Experimental results on six datasets demonstrate that the proposed method can perform better. © 2023 Elsevier B.V. All rights reserved.
Changes in climate factors such as temperature, rainfall, humidity, and wind speed are natural processes that could significantly impact the incidence of infectious diseases. Dengue is a widespread disease that has often been documented when it comes to the impact of climate change. It has become a significant concern, especially for the Malaysian health authorities, due to its rapid spread and serious effects, leading to loss of life. Several statistical models have been used to identify climatic factors associated with infectious diseases. However, because of the complex and nonlinear interactions between climate variables and disease components, modelling their relationships has become the main challenge in climate-health studies. Hence, this study proposed a generalized linear model (GLM) via Poisson and Negative Binomial to examine the effects of the climate factors on dengue incidence by considering the collinearity between variables. This study focuses on the dengue hot spots in Malaysia for the year 2014. Since there exists collinearity between climate factors, the analysis was done separately using three different models. The study revealed that rainfall, temperature, humidity, and wind speed were significantly associated with dengue incidence, and most of them showed a negative effect. Of all variables, wind speed has the most significant impact on dengue incidence. Given these relationships, policymakers should formulate better plans such that precautionary steps can be taken to reduce the spread of dengue.
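As a rough illustration of the kind of model this abstract describes, the sketch below fits a Poisson GLM with a log link to a single covariate by Newton-Raphson and then computes the Pearson dispersion statistic, a common heuristic for deciding whether a Negative Binomial model might be preferable. The data, variable names (`wind_speed`, `cases`), and function names are hypothetical and are not taken from the study; the authors' actual covariates, software, and model specification are not given in the abstract.

```python
import math


def fit_poisson_glm(x, y, n_iter=50, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson.

    For the canonical log link this is equivalent to Fisher scoring (IRLS).
    """
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X^T (y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information I = X^T diag(mu) X, a 2x2 matrix
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve I * delta = U with the explicit 2x2 inverse
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)


# Hypothetical weekly data: wind speed (m/s) and dengue case counts.
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cases = [30, 26, 22, 20, 15, 14, 11, 10]

b0, b1 = fit_poisson_glm(wind_speed, cases)
dispersion = pearson_dispersion(wind_speed, cases, b0, b1)
print(f"intercept={b0:.3f}, slope={b1:.3f}, dispersion={dispersion:.3f}")
```

A negative slope here corresponds to a negative effect of the covariate on the expected count, with `exp(b1)` read as the multiplicative change in expected cases per unit increase. A dispersion value well above 1 would suggest overdispersion relative to the Poisson assumption.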
Semi-supervised learning is devoted to using unlabeled data to improve the performance of machine learning algorithms. In this paper, we study the semi-supervised generalized linear model (GLM) in the distributed setup. In the cases of single or multiple machines containing unlabeled data, we propose two distributed semi-supervised algorithms based on the distributed approximate Newton method. When the labeled local sample size is small, our algorithms still give a consistent estimation, while fully supervised methods fail to converge. Moreover, we theoretically prove that the convergence rate is greatly improved when sufficient unlabeled data exists. Therefore, the proposed method requires far fewer rounds of communication to achieve the optimal rate than its fully supervised counterpart. In the case of the linear model, we prove the rate lower bound after one round of communication, which shows that rate improvement is essential. Finally, several simulation analyses and real data studies are provided to demonstrate the effectiveness of our method.
The generalized linear model is considered in the multidimensional case; the consistency and the asymptotic normality of the M.L. estimator are proved; the problem of the estimation of the unknown parameter under linear restraint is investigated; then it is possible to justify the test of a linear hypothesis by the Wald test, the L.R. test and the Lagrange multiplier test, the statistics of which are asymptotically distributed according to the χ² distribution. Finally the properties of the separability of hypotheses are extended to this model. An example in multivariate probit analysis is given.
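For orientation, the three test statistics named in this abstract can be written in their standard textbook form. The notation below is an assumption, not taken from the paper: $\hat\beta$ is the unrestricted M.L. estimator, $\tilde\beta$ the estimator under the linear restriction, $\ell$ the log-likelihood, $U$ the score vector, and $I$ the Fisher information.

```latex
% Linear hypothesis with q independent restrictions on p parameters
H_0 : R\beta = r, \qquad R \in \mathbb{R}^{q \times p}, \quad \operatorname{rank}(R) = q

% Wald statistic (uses the unrestricted estimate only)
W = (R\hat\beta - r)^{\top} \left[ R \, I(\hat\beta)^{-1} R^{\top} \right]^{-1} (R\hat\beta - r)

% Likelihood ratio statistic (uses both estimates)
LR = 2 \left[ \ell(\hat\beta) - \ell(\tilde\beta) \right]

% Lagrange multiplier (score) statistic (uses the restricted estimate only)
LM = U(\tilde\beta)^{\top} \, I(\tilde\beta)^{-1} \, U(\tilde\beta)

% Common limiting distribution under H_0
W, \; LR, \; LM \xrightarrow{\; d \;} \chi^2_q
```

The degrees of freedom $q$ equal the number of independent linear restrictions being tested.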
In a linear regression model, testing for uniformity of the variance of the residuals is an integral part of statistical analysis. This is a crucial assumption that requires confirmation via statistical tests, mostly before carrying out the Analysis of Variance (ANOVA) technique. Many academic researchers have published papers on tests for detecting violations of the variance homogeneity assumption in multiple linear regression models. Many comparisons of these tests have been made using statistical criteria such as bias, error rates, and power. Aside from comparisons, modifications of some of these statistical tests for detecting variance heterogeneity have been reported in the literature in recent years. In a multiple linear regression situation, little work has been done on comparing selected statistical tests for the homoscedasticity assumption when linear, quadratic, square root, and exponential forms of heteroscedasticity are injected into the residuals. The present study therefore intends to work extensively on all these areas of interest with a view to filling the gap. The paper aims at providing a comprehensive comparative analysis of the asymptotic behaviour of some selected statistical tests for the homoscedasticity assumption in order to identify the best statistical test for detecting heteroscedasticity in a multiple linear regression scenario with varying variances and levels of significance. In the literature, several tests for homoscedasticity are available, but only nine were considered for this study: the Breusch-Godfrey test, studentized Breusch-Pagan test, White’s test, Nonconstant Variance Score test, Park test, Spearman Rank, Glejser test, Goldfeld-Quandt test, and Harrison-McCabe test; this is with a view to examining, by Monte Carlo simulations, their asymptotic behaviours. However, four different forms of heteroscedastic structures: exponential and linear (ge
The semi-parametric generalized linear model (SPGLM) proposed by Rathouz and Gao assumes that the response is from a general exponential family with unspecified reference distribution and can be applied to model the distribution of binomial event-count data with a constant cluster size. We extend SPGLM to model response distributions of binomial data with varying cluster sizes by assuming marginal compatibility. The proposed model combines a non-parametric reference describing the within-cluster dependence structure with a parametric density ratio characterizing the between-group effect. It avoids making parametric assumptions about higher order dependence and is more parsimonious than non-parametric models. We fit the SPGLM with an expectation-maximization Newton-Raphson algorithm to the boron acid mouse data set and compare estimates with existing methods.
A model-based scheme is proposed for monitoring multiple gamma-distributed variables. The procedure is based on the deviance residual, which is a likelihood ratio statistic for detecting a mean shift when the shape parameter is assumed to be unchanged and the input and output variables are related in a certain manner. We discuss the distribution of this statistic and the proposed monitoring scheme. An example involving the advance rate of a drill is used to illustrate the implementation of the deviance residual monitoring scheme. Finally, a simulation study is performed to compare the average run length (ARL) performance of the proposed method to the standard Shewhart control chart for individuals. Copyright (C) 2003 John Wiley & Sons, Ltd.
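To make the deviance residual concrete, the sketch below computes the standard gamma deviance residual for an observation given a model-predicted mean and a known shape parameter, then flags observations whose residual exceeds a fixed limit. The data, the shape value, and the limit of 3 are hypothetical choices for illustration only; the abstract does not state the control limits or the input-output model the authors used.

```python
import math


def gamma_deviance_residual(y, mu, shape=1.0):
    """Signed square root of the scaled gamma unit deviance.

    Unit deviance: d = 2 * ((y - mu) / mu - log(y / mu)).
    Scaling by the shape parameter (1 / dispersion) gives the scaled deviance.
    """
    if y <= 0 or mu <= 0:
        raise ValueError("gamma observations and means must be positive")
    unit_deviance = 2.0 * ((y - mu) / mu - math.log(y / mu))
    unit_deviance = max(unit_deviance, 0.0)  # guard against rounding below zero
    return math.copysign(math.sqrt(shape * unit_deviance), y - mu)


# Hypothetical drill advance rates with a model-predicted mean of 10.0 each.
observed = [10.2, 9.8, 11.1, 10.5, 19.0]
predicted = [10.0, 10.0, 10.0, 10.0, 10.0]
SHAPE = 50.0  # assumed known and unchanged, as in the abstract's setting
LIMIT = 3.0   # hypothetical control limit, not taken from the paper

residuals = [gamma_deviance_residual(y, mu, SHAPE) for y, mu in zip(observed, predicted)]
flags = [abs(r) > LIMIT for r in residuals]
for y, r, out in zip(observed, residuals, flags):
    print(f"y={y:5.1f}  residual={r:6.2f}  signal={out}")
```

The residual is zero when the observation equals its predicted mean and grows in magnitude as the two diverge, with its sign indicating the direction of the shift.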