In this paper, a zero-and-one-inflated Poisson (ZOIP) regression model is proposed. Maximum likelihood estimation (MLE) and Bayesian estimation for this model are investigated. Three estimation methods for the ZOIP regression model are derived from the data augmentation method: the expectation-maximization (EM) algorithm, the generalized expectation-maximization (GEM) algorithm, and Gibbs sampling. A simulation study is conducted to assess the performance of the proposed estimators for various sample sizes. Finally, an accidental deaths data set is analyzed to illustrate the practicability of the proposed method.
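To make the data augmentation idea concrete, the following is a minimal sketch of an EM iteration for a ZOIP distribution without covariates. It is not the paper's regression algorithm: the function name `zoip_em`, the starting values, and the parameterization (structural-zero probability `p0`, structural-one probability `p1`, Poisson weight `p2`, Poisson rate `lam`) are assumptions made for illustration. The latent variable is the component (structural zero, structural one, or Poisson) that generated each observation.

```python
import math


def zoip_em(y, n_iter=1000, tol=1e-12):
    """EM for a zero-and-one-inflated Poisson distribution (no covariates).

    Assumed model: P(0) = p0 + p2*exp(-lam), P(1) = p1 + p2*lam*exp(-lam),
    P(k) = p2 * Poisson(k; lam) for k >= 2, with p0 + p1 + p2 = 1.
    """
    n = len(y)
    n0 = sum(1 for v in y if v == 0)
    n1 = sum(1 for v in y if v == 1)
    sum_rest = sum(v for v in y if v >= 2)
    if sum_rest == 0:
        raise ValueError("need at least one count >= 2 to identify lam")

    # Arbitrary interior starting values (an assumption of this sketch).
    p0, p1, lam = 1.0 / 3.0, 1.0 / 3.0, sum(y) / n
    for _ in range(n_iter):
        p2 = 1.0 - p0 - p1
        # E-step: probability that an observed 0 (or 1) is structural.
        z0 = p0 / (p0 + p2 * math.exp(-lam))
        z1 = p1 / (p1 + p2 * lam * math.exp(-lam))
        exp_zeros = n0 * z0          # expected number of structural zeros
        exp_ones = n1 * z1           # expected number of structural ones
        n_pois = n - exp_zeros - exp_ones  # expected Poisson observations

        # M-step: closed-form updates given the expected memberships.
        new_p0 = exp_zeros / n
        new_p1 = exp_ones / n
        new_lam = (n1 * (1.0 - z1) + sum_rest) / n_pois

        change = max(abs(new_p0 - p0), abs(new_p1 - p1), abs(new_lam - lam))
        p0, p1, lam = new_p0, new_p1, new_lam
        if change < tol:
            break
    return p0, p1, 1.0 - p0 - p1, lam
```

At an interior fixed point of these updates, the fitted probabilities of 0 and 1 equal the observed proportions of zeros and ones, which gives a simple check on convergence. A regression version would replace the closed-form M-step with weighted fits of the covariate models.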
Classification of an observation into one of several univariate normal populations is considered when the population means are unknown but equal. Plug-in Bayes classification rules based on different estimators of the common mean are proposed for k populations. When the variances are ordered, the rule based on the Graybill-Deal estimator is compared with another rule. We prove the consistency of the classification rules. Confidence intervals for the conditional error rate are derived for two and three populations. Under the assumption of ordered variances, a Bayes estimator of the ratio of variances is derived for use as a plug-in estimator for classification. We derive estimators of the parameters of mixture densities associated with two normal populations with a common mean and propose classification rules for the mixture distribution. An extensive simulation is performed to compare the different rules and interval estimators of the conditional error rates.
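As an illustration of a plug-in rule of this kind, the sketch below computes the Graybill-Deal estimator of a common mean (the sample means weighted by n_i / s_i^2) and plugs it, together with the sample variances, into the normal densities to classify a new observation. The function names `graybill_deal` and `plug_in_classify`, and the default of equal prior probabilities, are assumptions of this sketch and are not taken from the paper.

```python
import math
from statistics import mean, variance


def graybill_deal(samples):
    """Graybill-Deal estimate of a common mean from several samples.

    Each sample mean is weighted by n_i / s_i^2, where s_i^2 is the
    unbiased sample variance.
    """
    weights = [len(s) / variance(s) for s in samples]
    means = [mean(s) for s in samples]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


def plug_in_classify(x, samples, priors=None):
    """Return the index of the population with the largest plug-in
    posterior score for observation x (equal priors by default)."""
    k = len(samples)
    if priors is None:
        priors = [1.0 / k] * k
    mu = graybill_deal(samples)
    scores = []
    for s, prior in zip(samples, priors):
        var = variance(s)
        log_density = -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mu) ** 2 / var
        scores.append(math.log(prior) + log_density)
    return max(range(k), key=scores.__getitem__)
```

Because the means coincide, the rule separates populations only through their variances: observations close to the common mean are assigned to the low-variance population, and observations far from it to the high-variance population.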
In modern observational studies using electronic health records or other routinely collected data, both the outcome and covariates of interest can be error-prone, and their errors are often correlated. A cost-effective solution is the two-phase design, under which the error-prone outcome and covariates are observed for all subjects during the first phase and that information is used to select a validation subsample for accurate measurements of these variables in the second phase. Previous research on two-phase measurement error problems largely focused on scenarios where there are errors in covariates only or the validation sample is a simple random sample of study subjects. Herein, we propose a semiparametric approach to general two-phase measurement error problems with a quantitative outcome, allowing for correlated errors in the outcome and covariates and arbitrary second-phase selection. We devise a computationally efficient and numerically stable expectation-maximization algorithm to maximize the nonparametric likelihood function. The resulting estimators possess desired statistical properties. We demonstrate the superiority of the proposed methods over existing approaches through extensive simulation studies, and we illustrate their use in an observational HIV study.
This paper is concerned with introducing a family of multivariate mixed Negative Binomial regression models in the context of a posteriori ratemaking. The multivariate mixed Negative Binomial regression model can be considered as a candidate model for capturing overdispersion and positive dependencies in multi-dimensional claim count data settings, which all recent studies suggest are the norm when the ratemaking consists of pricing different types of claim counts arising from the same policy. For expository purposes, we consider the bivariate Negative Binomial-Gamma and Negative Binomial-Inverse Gaussian regression models. An Expectation-Maximization type algorithm is developed for maximum likelihood estimation of the parameters of the models for which the definition of a joint probability mass function in closed form is not feasible when the marginal means are modelled in terms of covariates. In order to illustrate the versatility of the proposed estimation procedure a numerical illustration is performed on motor insurance data on the number of claims from third party liability bodily injury and property damage. Finally, the a posteriori, or Bonus-Malus, premium rates resulting from the bivariate Negative Binomial-Gamma and Negative Binomial-Inverse Gaussian regression model are compared to those determined by the bivariate Negative Binomial and Poisson-Inverse Gaussian regression models. (C) 2021 Elsevier B.V. All rights reserved.
Gaussian mixture models (GMM) combined with a modulating dynamical system (DS) form an unsupervised learning method that can estimate the distribution of given data or encode trajectories in the input space. In this paper, a series of trajectories is considered for simulation, and the role of the tuning parameters of the algorithm, for both the Gaussian function encoding and the behavior of the dynamical system, is examined and compared. The algorithm divides the input space of the data into presupposed local regions and then, in each local region, employs a dynamical system approach to track the major trajectories of the data. The influence of the number of Gaussian functions in the GMM approach is investigated in depth through simulation. Furthermore, the influence of local statistical characteristics of the data, such as the mean or covariance, on the training process is discussed, and the effect of the number of Gaussian functions as a tuning parameter under these conditions is explained. All characteristics of the DS depend on these tuning parameters; when the data have higher variance or noise, this adjustment should be checked more carefully. The simulation results show that the behavior and location of the attractor points of the DS on the data distributions, and accordingly the stability of the DS, improve drastically when the number of Gaussian functions is tuned accurately.
The familywise error rate has been widely used in genome-wide association studies. With the increasing availability of functional genomics data, it is possible to increase detection power by leveraging these genomic functional annotations. Previous efforts to accommodate covariates in multiple testing focused on false discovery rate control, while covariate-adaptive procedures controlling the familywise error rate remain underdeveloped. Here, we propose a novel covariate-adaptive procedure to control the familywise error rate that incorporates external covariates which are potentially informative of either the statistical power or the prior null probability. An efficient algorithm is developed to implement the proposed method. We prove its asymptotic validity and obtain the rate of convergence through a perturbation-type argument. Our numerical studies show that the new procedure is more powerful than competing methods and maintains robustness across different settings. We apply the proposed approach to the UK Biobank data and analyse 27 traits with 9 million single-nucleotide polymorphisms tested for associations. Seventy-five genomic annotations are used as covariates. Our approach detects more genome-wide significant loci than other methods in 21 out of the 27 traits.
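The abstract does not spell out the proposed procedure, so the sketch below shows only the classical weighted Bonferroni rule, a simpler and well-known way to let covariates influence familywise error rate control. It is a stand-in for illustration, not the paper's method. The function name `weighted_bonferroni` is hypothetical, and the weights are assumed to be fixed in advance from the covariates (for example, from functional annotations), not estimated from the p-values themselves.

```python
def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Weighted Bonferroni: reject H_i when p_i <= alpha * w_i / m.

    The non-negative weights are rescaled to average 1, so the thresholds
    sum to alpha and the familywise error rate stays at most alpha when
    the weights do not depend on the p-values.
    """
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    scaled = [w * m / total for w in weights]  # rescale to mean 1
    return [p <= alpha * w / m for p, w in zip(pvalues, scaled)]
```

With equal weights this reduces to the ordinary Bonferroni correction. Up-weighting hypotheses that the covariates mark as promising raises their thresholds at the cost of lowering the others.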
Skew-normal/independent distributions provide an attractive class of asymmetric heavy-tailed alternatives to the usual symmetric normal distribution. We use this class of distributions here to derive a robust generalization of sinh-normal distributions (Rieck in Statistical analysis for the Birnbaum-Saunders fatigue life distribution, 1989). We then propose robust nonlinear regression models, generalizing the Birnbaum-Saunders regression models proposed by Rieck and Nedelman (Technometrics 33:51-60, 1991) that have been studied extensively. The proposed regression models have a convenient hierarchical representation that facilitates easy implementation of an EM algorithm for maximum likelihood estimation of the model parameters and provides a robust alternative for parameter estimation. Simulation studies as well as applications to a real dataset are presented to illustrate the usefulness of the proposed model and of the inferential methods developed here.
A cured subgroup occurs quite often in survival studies, and many authors have considered it under various situations (Farewell in Biometrics 38:1041-1046, 1982; Kuk and Chen in Biometrika 79:531-541, 1992; Lam and Xue in Biometrika 92:573-586, 2005; Zhou et al. in J Comput Graph Stat 27:48-58, 2018). In this paper, we discuss the situation where only interval-censored data are available and, furthermore, the censoring may be informative, for which there does not seem to exist an established estimation procedure. For the analysis, we present a three-component model consisting of a logistic model for describing the cure rate, an additive hazards model for the failure time of interest, and a nonhomogeneous Poisson model for the observation process. For estimation, we propose a sieve maximum likelihood estimation procedure, and the asymptotic properties of the resulting estimators are established. Furthermore, an EM algorithm is developed for the implementation of the proposed estimation approach, and extensive simulation studies suggest that the proposed method works well in practical situations. The approach is also applied to a cardiac allograft vasculopathy study that motivated this investigation.
The expectation-maximization (EM) algorithm is a familiar tool for computing the maximum likelihood estimate of the parameters in hidden Markov and semi-Markov models. This paper carries out a detailed study of the influence that the initial values of the parameters have on the results produced by the algorithm. We compare random starts with partitional and model-based strategies for choosing the initial values for the EM algorithm in the case of multivariate Gaussian emission distributions (EDs) and assess the performance of each strategy with different assessment criteria. Several data generation settings are considered, varying the number of latent states, the number of variables, and the level of fuzziness in the data, and we discuss how each factor influences the results. Simulation results show that different initialization strategies may lead to different log-likelihood values and, accordingly, to different estimated partitions. A clear indication of which strategies should be preferred is given. We further include two real-data examples, widely analysed in the hidden semi-Markov model literature.
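The dependence of EM on its starting values can be seen even in a much simpler setting than a hidden semi-Markov model. The sketch below swaps in a two-component, one-dimensional Gaussian mixture, runs EM from several candidate starting means, and keeps the fit with the highest log-likelihood. The function names `em_gmm_1d` and `best_of_starts`, the variance floor, and the choice to start both variances at the overall sample variance are assumptions of this sketch, not strategies evaluated in the paper.

```python
import math


def em_gmm_1d(x, init_means, n_iter=200, tol=1e-9, var_floor=1e-6):
    """EM for a two-component 1-D Gaussian mixture from given starting means.

    Returns (loglik, means, variances, weights); loglik is the value
    computed in the last E-step.
    """
    n = len(x)
    center = sum(x) / n
    var_all = max(sum((v - center) ** 2 for v in x) / n, var_floor)
    means, variances, weights = list(init_means), [var_all, var_all], [0.5, 0.5]
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step in log space (log-sum-exp) to avoid underflow.
        resp, new_loglik = [], 0.0
        for v in x:
            logs = [
                math.log(weights[k])
                - 0.5 * math.log(2 * math.pi * variances[k])
                - 0.5 * (v - means[k]) ** 2 / variances[k]
                for k in range(2)
            ]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(l - top) for l in logs))
            resp.append([math.exp(l - log_total) for l in logs])
            new_loglik += log_total
        if new_loglik - loglik < tol:
            loglik = new_loglik
            break
        loglik = new_loglik
        # M-step: responsibility-weighted means, variances and weights.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            variances[k] = max(
                sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, x)) / nk,
                var_floor,
            )
            weights[k] = max(nk / n, 1e-12)
    return loglik, means, variances, weights


def best_of_starts(x, starts):
    """Run EM from every starting pair of means; keep the best log-likelihood."""
    return max((em_gmm_1d(x, s) for s in starts), key=lambda fit: fit[0])
```

Starting both means inside the same cluster places EM near a point where the two components are almost identical, and the run can stall there with a low log-likelihood. Comparing the log-likelihoods of several starts is a simple safeguard against that.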
Cancers are routinely classified into subtypes according to various features, including histopathological characteristics and molecular markers. Previous genome-wide association studies have reported heterogeneous associations between loci and cancer subtypes. However, it is not evident what the optimal modeling strategy is for handling correlated tumor features, missing data, and increased degrees of freedom in the underlying tests of association. We propose to test for genetic associations using a mixed-effect two-stage polytomous model score test (MTOP). In the first stage, a standard polytomous model is used to specify all possible subtypes defined by the cross-classification of the tumor characteristics. In the second stage, the subtype-specific case-control odds ratios are specified using a more parsimonious model based on the case-control odds ratio for a baseline subtype and the case-case parameters associated with tumor markers. Further, to reduce the degrees of freedom, we specify case-case parameters for additional exploratory markers using a random-effect model. We use the Expectation-Maximization algorithm to account for missing data on tumor markers. Through simulations across a range of realistic scenarios and data from the Polish Breast Cancer Study (PBCS), we show that MTOP outperforms alternative methods for identifying heterogeneous associations between risk loci and tumor subtypes. The proposed methods have been implemented in a user-friendly and high-speed R statistical package called TOP (https://***/andrewhaoyu/TOP).