Recently, the selection consistency of penalized least squares estimators has received a great deal of attention. For penalized likelihood estimation with certain non-convex penalties, a search space can be constructed within which there exists a unique local minimizer that exhibits selection consistency in high-dimensional generalized linear models under certain conditions. In particular, we prove that the SCAD penalty of Fan and Li (2001) and a new modified version of the unbounded penalty of Lee and Oh (2014) can be employed to achieve such a property. These results hold even for non-sparse cases where the number of relevant covariates increases with the sample size. Simulation studies are provided to compare the performance of the SCAD penalty and the newly proposed penalty. (C) 2016 Elsevier B.V. All rights reserved.
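For readers unfamiliar with the SCAD penalty mentioned above, the following is a minimal sketch of its standard piecewise form from Fan and Li (2001), evaluated for a single coefficient. The function name `scad_penalty` and its argument names are illustrative assumptions, not taken from the paper; the default a = 3.7 is the value suggested by Fan and Li. The modified unbounded penalty of Lee and Oh is not specified in the abstract and is not sketched here.

```python
def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty value for one coefficient (hypothetical helper name).

    lam is the tuning parameter (lam > 0); a controls where the penalty
    flattens out (a > 2).
    """
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(beta)
    if t <= lam:
        # Small coefficients: same linear growth as the lasso penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic piece that bends the
        # penalty smoothly toward a constant.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2
```

The penalty is continuous at both breakpoints (|beta| = lam and |beta| = a * lam), and it is constant beyond a * lam, which is what makes it non-convex.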
In the health and social sciences, longitudinal data have often been analyzed without taking into account the dependence between observations of the same subject. Furthermore, consideration is rarely given to the fact that longitudinal data may come from a non-normal distribution. In addition to describing the aims and types of longitudinal designs, this paper presents three approaches based on generalized estimating equations that do take into account the lack of independence in the data, as well as the type of distribution. These approaches are the marginal model (population-average model), the random effects model (subject-specific model), and the transition model (Markov model or auto-correlation model). Finally, these models are applied to empirical data by means of specific procedures included in SAS, namely GENMOD, MIXED, and GLIMMIX.
Crash data can often be characterized by over-dispersion, a heavy (long) tail, and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel multi-parameter models to analyze such data. These multi-parameter models have been proposed to overcome the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work related to multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) to model crash data. The objective of the study was accomplished using two datasets. The new model was compared to the NB model and the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the study shows that the NB-DP model performs better than the NB model when the data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset has a heavy tail but a smaller percentage of zeros. However, both models performed similarly when the dataset contained a large number of zeros. In addition to greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion. (C) 2016 Elsevier Ltd. All rights reserved.
We give two prediction intervals (PIs) for generalized linear models that take model selection uncertainty into account. The first is a straightforward extension of asymptotic normality results and the second includes an extra optimization that improves nominal coverage for small-to-moderate samples. Both PIs are wider than would be obtained without incorporating model selection uncertainty. We compare these two PIs with three other PIs. Two are based on bootstrapping procedures and the third is based on a PI from Bayes model averaging. We argue that for general usage the optimized asymptotic normality PIs work best unless sample sizes are large, in which case the PIs based only on asymptotic arguments that include model selection will be easier and equivalent. In an Appendix we extend our results to generalized linear mixed models.
This article considers the problem of post-averaging inference for optimal model averaging estimators in a generalized linear model (GLM). We establish the asymptotic distributions of optimal model averaging estimators for GLMs. The asymptotic distributions of the model averaging estimators are nonstandard, depending on the configuration of the penalty term in the weight choice criterion. We also propose a feasible simulation-based confidence interval estimator and investigate its asymptotic properties rigorously. Monte Carlo simulations verify the usefulness of our theoretical results, and the proposed methods are employed to analyze a stock car racing dataset.
This article documents the application of the Poisson inverse Gaussian (PIG) regression model for modeling motor vehicle crash data. The PIG distribution, which mixes the Poisson distribution and the inverse Gaussian distribution, has the potential for modeling highly dispersed count data due to the flexibility of the inverse Gaussian distribution. The objectives of this article were to evaluate the application of the PIG regression model for analyzing motor vehicle crash data and to compare the results with the negative binomial (NB) model, especially when a varying dispersion parameter is introduced. To accomplish these objectives, NB and PIG models were developed with fixed and varying dispersion parameters and compared using two data sets. The results of this study show that PIG models perform better than the NB models in terms of goodness-of-fit statistics. Moreover, the PIG model can perform as well as the NB model in capturing the variance of crash data. Lastly, PIG models demonstrate almost the same prediction performance as NB models. Considering the simple form of the PIG model and its ease of application, the PIG model could be used as a potential alternative to the NB model for analyzing crash data.
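To illustrate what "mixing" the Poisson and inverse Gaussian distributions means, here is a rough simulation sketch using only the Python standard library. It assumes one common parameterization, in which the Poisson rate is mu times an inverse Gaussian multiplier with mean 1 and shape parameter `shape`, giving variance mu + mu^2 / shape; the article's own parameterization may differ. All function names are hypothetical. The inverse Gaussian draw uses the Michael-Schucany-Haas transformation method and the Poisson draw uses Knuth's multiplication method, which is only suitable for moderate rates.

```python
import math
import random


def sample_inverse_gaussian(mean, shape, rng):
    """Draw one inverse Gaussian variate (Michael-Schucany-Haas method)."""
    z = rng.gauss(0.0, 1.0)
    y = z * z
    root = math.sqrt(4.0 * mean * shape * y + mean * mean * y * y)
    x = mean + (mean * mean * y) / (2.0 * shape) - (mean / (2.0 * shape)) * root
    # Choose between the two roots with the appropriate probability.
    if rng.random() <= mean / (mean + x):
        return x
    return mean * mean / x


def sample_poisson(rate, rng):
    """Draw one Poisson variate (Knuth's method; fine for moderate rates)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_pig(mu, shape, rng):
    """Poisson count whose rate is mu times a mean-one inverse Gaussian multiplier."""
    multiplier = sample_inverse_gaussian(1.0, shape, rng)
    return sample_poisson(mu * multiplier, rng)


def pig_variance(mu, shape):
    """Variance under the assumed parameterization: mu + mu^2 / shape."""
    return mu + mu * mu / shape


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_pig(2.0, 1.0, rng) for _ in range(20000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    print("sample mean:", mean, "sample variance:", var)
    print("theoretical variance:", pig_variance(2.0, 1.0))
```

Under this assumed setup the sample variance exceeds the sample mean, which is the over-dispersion that a plain Poisson model cannot represent.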
ISBN: (print) 9789881563897
At the single neuron level, neural information processing involves the transformation of input stimulation into an output spike train. Here a generalized linear model (GLM) is used to reconstruct the mapping from stimulation to the firing trains of a single neuron for the Hodgkin-Huxley (H-H) model. First, the H-H model is stimulated by white noise to generate the input-output data samples used to construct the GLM. Then, the parameters of the GLM are estimated by maximum likelihood from the spike time series of the spike trains extracted from the action potentials of the H-H model. After that, the input-output mapping of spike trains evoked by white noise for the H-H model is successfully reconstructed. Comparison of the inter-spike interval (ISI) and Pearson's correlation coefficient also shows that the established GLM provides a good reproduction and prediction of the firing information for the H-H model. These studies provide new insight into the coding processes and information transfer of single neurons.
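As a rough illustration of the maximum-likelihood step described above, the sketch below evaluates the log-likelihood of a discrete-time Poisson GLM for a binned spike train. It is a simplified assumption, not the paper's model: the firing rate is the exponential of a bias plus a linearly filtered stimulus, with no spike-history term, and the names `conditional_intensity`, `poisson_log_likelihood`, `stim_filter`, `bias`, and `dt` are hypothetical.

```python
import math


def conditional_intensity(stimulus, stim_filter, bias):
    """Firing rate per time bin: exp(bias + causally filtered stimulus)."""
    rates = []
    for t in range(len(stimulus)):
        drive = bias
        for lag, weight in enumerate(stim_filter):
            if t - lag >= 0:
                # stim_filter[lag] weights the stimulus `lag` bins in the past.
                drive += weight * stimulus[t - lag]
        rates.append(math.exp(drive))
    return rates


def poisson_log_likelihood(spike_counts, rates, dt):
    """Poisson log-likelihood of binned spike counts.

    The log(count!) term is dropped because it does not depend on the
    model parameters.
    """
    total = 0.0
    for count, rate in zip(spike_counts, rates):
        expected = rate * dt
        total += count * math.log(expected) - expected
    return total
```

Fitting would then maximize `poisson_log_likelihood` over `stim_filter` and `bias`, for example with a gradient-based optimizer; with the exponential link this objective is concave in those parameters.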
Modelling crash data has been an integral part of the research done in highway safety. Different tools have been suggested by researchers to analyze crash data. One such tool, which was recently proposed, is the Negative Binomial Generalized Exponential (NB-GE) distribution. As the name suggests, it is a combination of the Negative Binomial and Generalized Exponential distributions. This distribution has three parameters and can handle over-dispersed crash data that are characterized by a large number of zeros and/or a long tail. This research seeks to develop a generalized linear model (GLM) for the NB-GE distribution and discuss its applications in crash data analysis. The NB-GE GLM was applied to two over-dispersed crash datasets and its performance was compared to the Negative Binomial-Lindley (NB-L) and Negative Binomial (NB) models using various statistical measures. It was found that the NB-GE model performs almost as well as the NB-L model and much better than the NB model. This research also tried to determine the percentage of zeros and the level of dispersion in the dataset at which the NB-GE model is recommended over the NB model for ranking sites. Datasets were simulated for different scenarios. It was found that for high dispersion the NB-GE model performs better than the NB model when the percentage of zero counts in the dataset is greater than 80%. When the dataset has fewer than 80% zeros, the NB and NB-GE models perform similarly. Hence, for lower percentages the NB model would be preferred, as it is simpler and easier to use.
Identification of a subset of patients who may be sensitive to a specific treatment is an important step towards personalized medicine. We consider the case where the effect of a treatment is assessed by longitudinal measurements, which may be continuous or categorical, such as quality of life scores assessed over the duration of a clinical trial. We assume that multiple baseline covariates, such as age and expression levels of genes, are available, and propose a generalized single-index linear threshold model to identify the treatment-sensitive subset and assess the treatment-by-subset interaction after combining these covariates. Because the model involves an indicator function with unknown parameters, conventional procedures are difficult to apply for inferences of the parameters in the model. We define smoothed generalized estimating equations and propose an inference procedure based on these equations with an efficient spectral algorithm to find their solutions. The proposed procedure is evaluated through simulation studies and an application to the analysis of data from a randomized clinical trial in advanced pancreatic cancer.
Categorical scale data are only ordinal and defined on a finite set. Continuous scale data are only ordinal and defined on a bounded interval. Because of this, statistical methods for scale data ought to be based only on orders between outcomes and not on any metric involving a distance measure. For simple two-sample scale data, variants of classical rank methods are suitable. For regression-type problems, good generalized linear models for separate categories have long been known. The present article suggests a new generalized linear type of model based on nonparametric statistics for the whole scale. Asymptotic normality for those statistics is also shown and illustrated. Both fixed and random effects are considered.