While advances continue to be made in model-based clustering, challenges persist in modeling various data types such as panel data. Multivariate panel data present difficulties for clustering algorithms because they are often plagued by missing data and dropout, which complicate estimation. This research presents a family of hidden Markov models that compensate for these issues. A modified expectation-maximization (EM) algorithm capable of handling data that are missing not at random, as well as dropout, is presented and used to perform model estimation.
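To make the kind of computation involved concrete, the following is a minimal sketch of the forward recursion for a discrete hidden Markov model in which missing observations are marginalized out (a missing value contributes an emission factor of 1). The function name and the use of `None` to flag a missing value are illustrative conventions, not the authors' implementation; the models in the paper are multivariate and additionally model the dropout mechanism.

```python
import numpy as np

def forward_loglik(log_pi, log_A, log_B, obs):
    """Log-likelihood of an observation sequence under a discrete HMM
    via the forward recursion; None marks a missing observation, which
    is marginalized out (emission factor of 1)."""
    alpha = log_pi + (0.0 if obs[0] is None else log_B[:, obs[0]])
    for t in range(1, len(obs)):
        # logsumexp over previous states, then add the emission term
        trans = alpha[:, None] + log_A            # (K, K): from-state x to-state
        alpha = np.logaddexp.reduce(trans, axis=0)
        if obs[t] is not None:
            alpha = alpha + log_B[:, obs[t]]
    return np.logaddexp.reduce(alpha)
```

Summing (in log space) over the terminal forward variables gives the log-likelihood, which is the quantity that an EM fit increases at each iteration.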
A common approach to approximating Gaussian log-likelihoods at scale exploits the fact that precision matrices can be well approximated by sparse matrices in some circumstances. This strategy is motivated by the screening effect, the phenomenon in which the linear prediction of a process Z at a point x0 depends primarily on the measurements nearest to x0. But simple perturbations, such as iid measurement noise, can significantly reduce the degree to which this exploitable phenomenon occurs. While strategies to cope with this issue already exist and are certainly improvements over ignoring the problem, in this work we present a new one, based on the EM algorithm, that offers several advantages. While we focus on the application to Vecchia's approximation, a particularly popular and powerful framework in which we can demonstrate true second-order optimization of the M steps, the method can also be implemented using entirely matrix-vector products, making it applicable to a very wide class of precision matrix-based approximation methods. Supplementary materials for this article are available online.
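As background, here is a minimal sketch of a Vecchia-type approximate log-likelihood for a zero-mean Gaussian vector, conditioning each entry on the `m` preceding entries in a fixed ordering (real implementations choose nearest-neighbor conditioning sets, and the article's contribution concerns the noisy case). All names are illustrative; with `m = n - 1` the factorization recovers the exact log-likelihood, which makes the sketch easy to check.

```python
import numpy as np

def vecchia_loglik(y, cov, m):
    """Vecchia-type approximate Gaussian log-likelihood for a zero-mean
    vector y with covariance matrix cov: each y[i] is conditioned on at
    most the m preceding entries in the given ordering (a simplification
    of nearest-neighbor conditioning sets)."""
    n = len(y)
    ll = 0.0
    for i in range(n):
        c = list(range(max(0, i - m), i))      # conditioning set
        if not c:
            mean, var = 0.0, cov[i, i]
        else:
            Scc = cov[np.ix_(c, c)]
            Sic = cov[i, c]
            w = np.linalg.solve(Scc, Sic)      # kriging weights
            mean = w @ y[c]                    # conditional mean
            var = cov[i, i] - Sic @ w          # conditional variance
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll
```

Adding iid noise to the covariance diagonal weakens the screening effect that makes small conditioning sets accurate, which is the failure mode the EM-based strategy above is designed to address.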
In the literature on modeling heterogeneous data via mixture models, it is generally assumed that the samples are drawn from the underlying population using the simple random sampling (SRS) technique. This study exploits the bivariate ranked set sampling (BVRSS) technique to learn finite mixture models. We generalize the expectation-maximization (EM) algorithm under univariate RSS to the bivariate case. Through a simulation study under a noisy setting, we compare the performance of the proposed rank-based estimators with that of the SRS-based competitors in estimating unknown parameters and cluster assignments. The proposed methodology is applied to a breast cancer data set to diagnose malignant or benign tumors in patients. The results show that the extra rank information in BVRSS samples leads to better inference about the unknown features of mixture models.
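For reference, a minimal EM fit of a two-component univariate Gaussian mixture to an SRS sample — the baseline competitor in the comparison above. A BVRSS version would modify the E-step responsibilities using the rank information; the function name and initialization here are illustrative.

```python
import numpy as np
from scipy.stats import norm

def em_gmm2(x, iters=200):
    """EM for a two-component univariate Gaussian mixture fit to a
    simple random sample (illustrative initialization)."""
    mu = np.quantile(x, [0.25, 0.75])          # crude starting means
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component per point
        dens = w * norm.pdf(x[:, None], mu, sd)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted means, standard deviations, mixing weights
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        w = nk / len(x)
    return w, mu, sd
```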
Proportional data arise frequently in a wide variety of fields of study. Such data often exhibit extra variation such as over/under-dispersion, sparseness, and zero inflation. For example, the hepatitis data exhibit both sparseness and zero inflation: of 83 annual age groups, 19 contribute non-zero denominators of 5 or less, and 36 have zero seropositive cases. The whitefly data consist of 640 observations, of which 339 (53%) are zeros, demonstrating zero inflation. The catheter management data involve excessive zeros, with over 60% zeros on average across the outcomes of 193 urinary tract infections, 194 catheter blockages, and 193 catheter displacements. However, existing models cannot always address such features appropriately. In this paper, a new two-parameter probability distribution called the Lindley-binomial (LB) distribution is proposed to analyze proportional data with such features. Probabilistic properties of the distribution, such as its moments and moment generating function, are derived. The Fisher scoring algorithm and the EM algorithm are presented for computing estimates of the parameters in the proposed LB regression model. Goodness of fit for the LB model is discussed. A limited simulation study is also performed to evaluate the performance of the derived EM algorithms for estimating the parameters of the model with and without covariates. The proposed model is illustrated on the three aforementioned proportional data sets.
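Fisher scoring for the LB regression follows the same template as for an ordinary GLM. As a generic sketch (an ordinary binomial-logit model, not the LB model itself), here is Fisher scoring — equivalently, iteratively reweighted least squares — for proportional data given as y successes out of n trials:

```python
import numpy as np

def fisher_scoring_logit(X, y, n, iters=25):
    """Fisher scoring (IRLS) for a binomial-logit GLM: y successes out
    of n trials per row of X.  Generic algorithm only; the LB regression
    in the paper replaces the binomial likelihood with the LB one."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        W = n * p * (1 - p)                  # variance function weights
        score = X.T @ (y - n * p)            # gradient of the log-likelihood
        info = X.T @ (W[:, None] * X)        # expected (Fisher) information
        beta = beta + np.linalg.solve(info, score)
    return beta
```

Each update is a Newton step with the expected rather than observed information, which is what distinguishes Fisher scoring from plain Newton-Raphson.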
The paper Zhao et al. (Ann Oper Res 226:727-739, 2015) shows that mean-CVaR-skewness portfolio optimization problems based on asymmetric Laplace (AL) distributions can be transformed into quadratic optimization problems for which closed form solutions can be found. In this note, we show that such a result also holds for mean-risk-skewness portfolio optimization problems when the underlying distribution belongs to a larger class of normal mean-variance mixture (NMVM) models than the class of AL distributions. We then study the value at risk (VaR) and conditional value at risk (CVaR) risk measures of portfolios of returns with NMVM distributions. These measures have closed form expressions for portfolios of normal and, more generally, elliptically distributed returns, as discussed in Rockafellar and Uryasev (J Risk 2:21-42, 2000) and Landsman and Valdez (N Am Actuar J 7:55-71, 2003), but when the returns have general NMVM distributions these risk measures do not admit closed form expressions. In this note, we give approximate closed form expressions for the VaR and CVaR of portfolios of returns with NMVM distributions. Numerical tests show that our closed form formulas give accurate values for VaR and CVaR and considerably shorten the computational time for portfolio optimization problems associated with VaR and CVaR.
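For the normal special case cited above (Rockafellar and Uryasev), the closed forms are VaR_a = -mu + sigma * z_a and CVaR_a = -mu + sigma * phi(z_a) / (1 - a) under the loss = -return convention, where z_a is the standard normal a-quantile and phi its density. A sketch (function name illustrative):

```python
import numpy as np
from scipy.stats import norm

def normal_var_cvar(mu, sigma, alpha=0.95):
    """Closed-form VaR and CVaR at level alpha for a normally
    distributed return with mean mu and std sigma, using the
    loss = -return convention."""
    z = norm.ppf(alpha)
    var = -mu + sigma * z
    cvar = -mu + sigma * norm.pdf(z) / (1 - alpha)
    return var, cvar
```

CVaR always exceeds VaR at the same level, since it averages the losses beyond the VaR threshold; the NMVM case studied in the note replaces these exact formulas with approximate ones.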
Population size estimation has long been a key area of interest across various fields. The Schnabel census, a widely applied capture-recapture method, is commonly used for population estimation. However, the topic of sampling effort in Schnabel census studies remains insufficiently explored. This study aims to determine the required sampling effort in Schnabel census studies, considering different levels of capture success rate and population heterogeneity. To address this, the number of capture occasions, T, is adjusted to achieve different probabilities of an individual never being observed, p(0), with the goal of maintaining an appropriate confidence interval width. Specifically, maintaining p(0) < 0.5 could limit uncertainty to within 20% of the true population size for N >= 100. Zero-truncated count distributions were applied by fitting three models: binomial, beta-binomial, and binomial mixture. The findings reveal an exponential relationship between the desired capture success rate and the required number of capture occasions. Additionally, lower detectability requires more capture occasions to achieve the same capture success rate than higher detectability does. This methodological approach provides robust and efficient estimation strategies, ensuring the sustainability and feasibility of population monitoring programs.
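The exponential relationship can be made explicit in the simplest homogeneous-binomial case: with per-occasion detection probability p, the probability of never being captured over T occasions is p(0) = (1 - p)^T, so the effort needed to push p(0) below a target is T = ceil(log p(0) / log(1 - p)). A sketch (function name illustrative; heterogeneity, as in the beta-binomial and mixture models above, requires more effort than this bound suggests):

```python
import math

def occasions_needed(p, p0_target):
    """Smallest number of capture occasions T such that the miss
    probability p0 = (1 - p)**T of a homogeneous individual with
    per-occasion detection probability p falls below p0_target."""
    return math.ceil(math.log(p0_target) / math.log(1.0 - p))
```

For example, p = 0.3 needs only 2 occasions to reach p(0) < 0.5, while p = 0.05 needs 14 — the steep growth in required effort at low detectability noted in the findings.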
Network estimation and variable selection have been extensively studied in the statistical literature, but only recently have these two challenges been addressed simultaneously. In this article, we develop a novel method to simultaneously estimate network interactions and associations with relevant covariates for count data, and specifically for compositional data, which have a fixed-sum constraint. We use a hierarchical Bayesian model with latent layers and employ spike-and-slab priors for both edge and covariate selection. For posterior inference, we develop a novel variational inference scheme with an expectation-maximization step to enable efficient estimation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in its accuracy of network recovery. We show the practical utility of our model via an application to microbiome data. The human microbiome has been shown to contribute to many of the functions of the human body, and also to be linked with a number of diseases. In our application, we seek to better understand the interactions of microbes with relevant covariates, as well as with each other. We call our algorithm simultaneous inference for networks and covariates and provide a Python implementation, which is available online.
In this study, constant-stress accelerated life testing is investigated using type-II censoring of failure data from a truncated normal distribution. Various classical approaches are discussed for estimating the model parameters, hazard rates, and reliability functions, among them maximum likelihood estimation, the EM algorithm, and maximum product of spacings estimation. Interval estimation is also introduced in the form of asymptotic confidence intervals and bootstrap intervals. Furthermore, the missing information principle is employed to compute the observed Fisher information matrix. Three optimality criteria linked with the Fisher information matrix are considered to determine the optimal value of each stress level. To illustrate the proposed techniques, Monte Carlo simulations are run in conjunction with a real data analysis.
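To fix ideas on the likelihood structure: under type-II censoring, only the r smallest of n failure times are observed, so the log-likelihood adds a survival term for the n - r censored units, log L = sum_{i<=r} log f(x_(i)) + (n - r) log(1 - F(x_(r))). The following is a sketch for a plain (untruncated) normal — the truncated-normal and accelerated-stress structure of the study only changes the density — with all names illustrative:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def type2_normal_mle(x_obs, n):
    """ML estimation of (mu, sigma) from a type-II censored normal
    sample: x_obs holds the r smallest order statistics of n units;
    the remaining n - r units are only known to exceed max(x_obs)."""
    r = len(x_obs)
    c = max(x_obs)                             # censoring point x_(r)
    def negloglik(theta):
        mu, log_sigma = theta                  # log-parameterize sigma > 0
        sigma = np.exp(log_sigma)
        ll = norm.logpdf(x_obs, mu, sigma).sum()
        ll += (n - r) * norm.logsf(c, mu, sigma)   # censored contribution
        return -ll
    res = minimize(negloglik, x0=[np.mean(x_obs), np.log(np.std(x_obs))])
    return res.x[0], np.exp(res.x[1])
```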
We propose a novel frailty model with change points, applying random effects to a Cox proportional hazards model to adjust for heterogeneity between clusters. In the eight Empowered Action Group (EAG) states in India, on which we focus, survival curves for children up to the age of five differ from state to state. Therefore, when analyzing survival times for the eight EAG states, we need to adjust for effects among states (clusters). Because the frailty model includes random effects, the parameters are estimated using the expectation-maximization (EM) algorithm. Additionally, our model needs to estimate change points; we thus propose a new algorithm that extends the conventional estimation algorithm to the frailty model with change points. We show a practical example to demonstrate how to estimate the change point and the parameters of the random-effects distribution. Our proposed model can be analyzed easily using an existing R package. We conducted simulation studies with three scenarios to confirm the performance of our proposed model. We re-analyzed the survival time data of the eight EAG states in India to show the difference in results with and without the random effect. In conclusion, we confirmed that the frailty model with change points is more accurate than the model without a random effect. Our proposed model is useful when heterogeneity needs to be taken into account. Additionally, the absence of heterogeneity did not affect estimation of the regression parameters.
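The change-point component can be sketched in a stripped-down form: a piecewise-constant (exponential) hazard with one change point, estimated by profiling the likelihood over a grid of candidate change points. This omits the frailty term, censoring, and covariates of the actual model, and all names are illustrative:

```python
import numpy as np

def changepoint_exponential(t, tau_grid):
    """Profile-likelihood estimate of a single change point in a
    piecewise-constant hazard from uncensored survival times t:
    rate lam1 on [0, tau], rate lam2 after tau."""
    best_ll, best_tau = -np.inf, None
    for tau in tau_grid:
        d1 = (t <= tau).sum()                  # events before tau
        d2 = len(t) - d1                       # events after tau
        e1 = np.minimum(t, tau).sum()          # exposure before tau
        e2 = np.maximum(t - tau, 0.0).sum()    # exposure after tau
        if d1 == 0 or d2 == 0:
            continue
        lam1, lam2 = d1 / e1, d2 / e2          # segment-wise MLEs
        ll = d1 * np.log(lam1) - lam1 * e1 + d2 * np.log(lam2) - lam2 * e2
        if ll > best_ll:
            best_ll, best_tau = ll, tau
    return best_tau
```

In the full model, this profiling step would sit inside each EM iteration, alternating with updates of the frailty (random-effects) distribution.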
The discrete Pareto (DP) distribution studied in this paper is a probability model with a power-law tail, which provides a convenient alternative to the well-known Zipf distribution. While basic characteristics of the DP model are available explicitly, it is not an exponential family, and parameter estimation for this model is a challenging task. With this in mind, we develop a computational approach to the problem based on the expectation-maximization (EM) algorithm. In the process, we discover an interesting new probability distribution, a certain tilted version of the standard gamma model, and provide a short account of its basic properties, which play a crucial role in our EM algorithm. Our computational approach to DP parameter estimation is illustrated by simulations, while a real data example from finance illustrates potential applications of the DP stochastic model.