Jointly modelling skewness and heterogeneity is challenging in data analysis, particularly in regression analysis that allows a random probability distribution to change flexibly with covariates. Based on a skew Laplace normal (SLN) mixture of location, scale, and skewness, this paper introduces a new regression model that models the location, scale, and skewness parameters flexibly and simultaneously. The maximum likelihood (ML) estimators of all parameters of the proposed model are derived via the expectation-maximization (EM) algorithm, together with their asymptotic properties. Numerical analyses via a simulation study and a real data example illustrate the performance of the proposed model.
The additive hazards model is one of the most commonly used models in regression analysis of failure time data, and many estimation procedures have been developed for its inference under various situations (Kalbfleisch and Prentice (2002); Lin and Ying (1994); Sun (2006)). In this paper, we consider case K interval-censored data with informative interval censoring, a situation that often occurs in practice, for example in medical follow-up studies, but has not been discussed much in the literature because of the difficulties involved. For this problem, a joint model is proposed to describe the correlation between the failure time of interest and the underlying censoring or observation process, and a sieve maximum likelihood approach is developed. In particular, an EM algorithm is presented for implementing the proposed estimation procedure, and the asymptotic properties of the resulting estimators are established. A simulation study is conducted to assess the finite sample performance of the proposed method and suggests that it works well in practical situations. The method is also applied to the AIDS study that motivated this work. (C) 2019 Elsevier B.V. All rights reserved.
This paper proposes a new method for estimating the true number of clusters and the initial cluster centers in a dataset with many clusters. Observation points are placed in the data space so that the clusters can be observed through the distributions of the distances between the observation points and the objects in the dataset. A Gamma Mixture Model (GMM) is built from a distance distribution to partition the dataset into subsets, and a GMM tree is obtained by recursively partitioning the dataset. From the leaves of the GMM tree, a set of initial cluster centers is identified and the true number of clusters is estimated. This method is implemented in the new GMM-Tree algorithm. Two GMM forest algorithms are further proposed that combine multiple GMM trees into an ensemble to handle high dimensional data with many clusters. The GMM-P-Forest algorithm builds GMM trees in parallel, whereas the GMM-S-Forest algorithm builds a GMM forest sequentially. Experiments were conducted on 32 synthetic datasets and 15 real datasets to evaluate the performance of the new algorithms. The results show that the proposed algorithms outperform the popular existing methods (Silhouette, Elbow, and Gap Statistic) and the recent I-nice method in estimating the true number of clusters from high dimensional complex data. (C) 2020 Elsevier B.V. All rights reserved.
In this paper, we present a regression model in which the response variable is a count that follows a Waring distribution. The Waring regression model allows for the analysis of phenomena where the Geometric regression model is inadequate because the probability of success on each trial, p, differs between individuals and has an associated distribution. Estimation is performed by maximum likelihood, through maximization of the Q-function using the EM algorithm. Diagnostic measures are calculated for this model. To illustrate the results, an application to real data is presented. Some specific details are given in the Appendix of the paper.
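The idea of a geometric count whose success probability p varies between individuals can be sketched with a small simulation. This is only an illustration, not the paper's regression model: it assumes p follows a Beta(a, b) distribution (the beta-geometric construction usually associated with the Waring distribution), it has no covariates, and the function name `sample_beta_geometric` and the values of `a` and `b` are hypothetical.

```python
import math
import random


def sample_beta_geometric(a, b, rng):
    """Draw one count: p ~ Beta(a, b), then the number of failures
    before the first success in Bernoulli(p) trials."""
    p = rng.betavariate(a, b)          # individual-specific success probability (assumed Beta)
    if p >= 1.0:
        return 0                       # success is certain on the first trial
    u = 1.0 - rng.random()             # uniform on (0, 1]
    # Inverse-transform sampling for the geometric distribution on {0, 1, 2, ...}
    return int(math.floor(math.log(u) / math.log1p(-p)))


rng = random.Random(0)
a, b = 5.0, 2.0                        # illustrative values, not taken from the paper
draws = [sample_beta_geometric(a, b, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
theoretical_mean = b / (a - 1.0)       # E[(1 - p) / p] for p ~ Beta(a, b), valid when a > 1
```

Comparing `sample_mean` with `theoretical_mean` gives a quick sanity check on the mixing construction; a plain geometric model corresponds to fixing p at a single value for every individual.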
Finite mixture models are a very important tool for exploring complex data structures in many scientific areas, such as economics, epidemiology, and finance. Semiparametric mixture models, which were introduced as extensions of traditional finite mixture models in the past decade, have brought forth exciting developments in methodology, theory, and applications. In this article, we provide a selective overview of the newly developed semiparametric mixture models and discuss their estimation methodologies, their theoretical properties where applicable, and some open questions. Recent developments are also discussed.
In this paper, we introduce a new stationary first-order integer-valued autoregressive process with geometric marginals, based on mixing the Pegram and generalized binomial thinning operators. The count series of the process consists of dependent Bernoulli count variables. Various properties of the process are obtained, including the distribution of its innovation process. Maximum likelihood estimation by the EM algorithm is applied to estimate the parameters of the process, and the performance of the estimates is checked by Monte Carlo simulation. We investigate the applicability of the process using a real count data set and compare the process with many competing INAR(1) models via some goodness-of-fit statistics. Forecasting of the data under the proposed process is also discussed.
When several types of recurrent events may arise, interest often lies in marginal modeling and studying the nature of the dependence structure. In this paper, we propose a multivariate mixed-Poisson model with the dependence between events accommodated by type-specific random effects which are associated through use of a Gaussian copula. Such models retain marginal features with a simple interpretation, reflect the heterogeneity in risk for each type of event, and provide insight into the dependence between the different types of events. Semiparametric inference is proposed based on composite likelihood to avoid high dimensional integration. An application to a study of nutritional supplements in malnourished children is given in which the goal is to evaluate the reduction in the rate of several different kinds of infection.
The contaminated Gaussian distribution represents a simple heavy-tailed elliptical generalization of the Gaussian distribution; unlike the often-considered t-distribution, it also allows for automatic detection of mildly outlying or "bad" points in the same way that observations are typically assigned to groups in the finite mixture model context. Starting from this distribution, we propose the contaminated factor analysis model as a method for dimensionality reduction and detection of bad points in higher dimensions. A mixture of contaminated Gaussian factor analyzers (MCGFA) model follows therefrom and extends the recently proposed mixture of contaminated Gaussian distributions to high-dimensional data. We introduce a family of 32 parsimonious models formed by imposing constraints on the covariance and contamination structures of the general MCGFA model. We outline a variant of the expectation-maximization algorithm for parameter estimation. Various implementation issues are discussed, and the novel family of models is compared with well-established approaches on both simulated and real data. (C) 2019 Elsevier Ltd. All rights reserved.
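How a contaminated Gaussian flags "bad" points can be sketched in one dimension. The following is a minimal illustration, not the MCGFA model itself: it assumes the usual two-part form in which a proportion `alpha` of points comes from a Gaussian with variance `var` and the rest from a Gaussian with the same mean and inflated variance `eta * var` (with `eta > 1`). The parameter names and numeric values are assumptions made for the example.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def contaminated_pdf(x, mu, var, alpha, eta):
    """Contaminated Gaussian: 'good' part with variance var,
    'bad' part with the same mean and inflated variance eta * var."""
    good = alpha * normal_pdf(x, mu, var)
    bad = (1.0 - alpha) * normal_pdf(x, mu, eta * var)
    return good + bad


def prob_good(x, mu, var, alpha, eta):
    """Posterior probability that x comes from the 'good' part."""
    return alpha * normal_pdf(x, mu, var) / contaminated_pdf(x, mu, var, alpha, eta)


# Illustrative parameters (hypothetical, not from the paper)
mu, var, alpha, eta = 0.0, 1.0, 0.9, 10.0

central_point = prob_good(0.0, mu, var, alpha, eta)   # close to the mean
outlying_point = prob_good(5.0, mu, var, alpha, eta)  # far in the tail

# A point is flagged as "bad" when its posterior probability of being good is below 0.5
is_bad = outlying_point < 0.5
```

The same posterior rule is what assigns observations to components in an ordinary finite mixture, which is why outlier detection comes for free once the parameters are estimated.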
This article deals with the statistical analysis of landmark data observed at different temporal instants. Statistical analysis of dynamic shapes poses significant challenges because it is difficult to describe shape changes over time, across subjects, and over groups of subjects. Several modeling strategies can be used for dynamic shape analysis. Here, we use the exact distribution theory for the shape of planar correlated Gaussian configurations and derive the induced offset-normal shape distribution. Various properties of this distribution are investigated, and some special cases are discussed. This work is a natural progression of what has been proposed in Mardia and Dryden, Dryden and Mardia, Mardia and Walder, and Kume and Welling. Supplemental materials for this article are available online.
Percentile estimators are widely used for estimating distribution parameters because of their simplicity and ease of computation. In this study, we investigate the percentile method for two-component mixture distribution models, which are commonly used to model heterogeneous univariate data sets. We propose percentile estimators for the two-component mixture Weibull and two-component mixture Rayleigh distributions according to two different approaches. The performance of these percentile estimators is compared with that of maximum likelihood estimators using simulation. For this purpose, we use several criteria: bias, mean squared error, mean absolute deviation, mean relative total error, and the running time of the algorithm. The benefits of the proposed methods are illustrated with three real data sets.
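The percentile method is easiest to see for a single Weibull distribution, where two percentiles determine the shape and scale in closed form. The sketch below covers only that single-component case, which is an assumption made for illustration; it is not the paper's estimator for two-component mixtures. The function names and the choice of the 25th and 75th percentiles are hypothetical.

```python
import math


def weibull_quantile(p, shape, scale):
    """Quantile of a Weibull with CDF F(x) = 1 - exp(-(x / scale) ** shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def weibull_from_two_percentiles(x1, p1, x2, p2):
    """Solve the two quantile equations ln x_p = ln scale + ln(-ln(1 - p)) / shape
    for (shape, scale), given percentile values x1 < x2 at levels p1 < p2."""
    c1 = math.log(-math.log(1.0 - p1))
    c2 = math.log(-math.log(1.0 - p2))
    shape = (c2 - c1) / (math.log(x2) - math.log(x1))
    scale = x1 / (-math.log(1.0 - p1)) ** (1.0 / shape)
    return shape, scale


def empirical_percentile(sample, p):
    """Simple order-statistic percentile: the ceil(p * n)-th smallest value."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, max(0, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


# Check with exact quantiles of a Weibull(shape=2, scale=3): the parameters are recovered
x25 = weibull_quantile(0.25, 2.0, 3.0)
x75 = weibull_quantile(0.75, 2.0, 3.0)
est_shape, est_scale = weibull_from_two_percentiles(x25, 0.25, x75, 0.75)
```

With real data, `x25` and `x75` would be replaced by `empirical_percentile(sample, 0.25)` and `empirical_percentile(sample, 0.75)`. No iteration is needed, which is the computational appeal of percentile estimators compared with maximum likelihood.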