Hidden Markov models (HMMs) have been proposed to model the natural history of diseases while accounting for misclassification in state identification. We introduce a discrete-time HMM for human papillomavirus (HPV) and cervical precancer/cancer where the hidden and observed state spaces are defined by all possible combinations of HPV, cytology, and colposcopy results. Because the population of women undergoing cervical cancer screening is heterogeneous with respect to sexual behavior, and therefore risk of HPV acquisition and subsequent precancers, we use a mover-stayer mixture model that assumes a proportion of the population will stay in the healthy state and is not subject to disease progression. As each state is a combination of three distinct tests that characterize the cervix, partially observed data arise when at least one but not every test is observed. The standard forward-backward algorithm, used for evaluating the E-step within the EM algorithm for maximum-likelihood estimation of HMMs, cannot incorporate time points with partially observed data. We propose a new forward-backward algorithm that considers all possible fully observed states that could have occurred across a participant's follow-up visits. We apply our method to data from a large management trial for women with low-grade cervical abnormalities. Our simulation study found that our method has relatively little bias and outperforms simpler methods, which resulted in larger bias.
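The idea of considering every fully observed state compatible with a partially observed visit can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the missing tests are ignorable and simply sums the emission probabilities over the compatible fully observed states inside a scaled forward recursion. The function name `forward_loglik`, the toy matrices, and the state indexing are all hypothetical.

```python
import numpy as np

def forward_loglik(pi, A, B, compatible):
    """Log-likelihood from a scaled forward recursion.

    pi         : (K,) initial hidden-state distribution
    A          : (K, K) transition matrix, rows sum to 1
    B          : (K, M) emission matrix over the M fully observed states
    compatible : list over visits; compatible[t] lists the fully observed
                 states consistent with what was recorded at visit t
    """
    loglik = 0.0
    alpha = pi.copy()
    for t, states in enumerate(compatible):
        if t > 0:
            alpha = alpha @ A                 # propagate one time step
        emit = B[:, states].sum(axis=1)       # marginalize over compatible states
        alpha = alpha * emit
        c = alpha.sum()                       # scaling constant for this visit
        loglik += np.log(c)
        alpha = alpha / c
    return loglik

# Hypothetical toy model: 2 hidden states, 4 fully observed states
# (for example, two binary tests indexed as 2 * test1 + test2).
pi = np.array([0.7, 0.3])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.60, 0.20, 0.15, 0.05],
              [0.10, 0.20, 0.30, 0.40]])

# Visit 2 has test1 = 1 and test2 missing, so states 2 and 3 are compatible.
ll_partial = forward_loglik(pi, A, B, [[0], [2, 3], [1]])
ll_a = forward_loglik(pi, A, B, [[0], [2], [1]])
ll_b = forward_loglik(pi, A, B, [[0], [3], [1]])
print(np.exp(ll_partial), np.exp(ll_a) + np.exp(ll_b))
```

Under this assumption, the likelihood of the partially observed sequence equals the sum of the likelihoods of its fully observed completions, which is what the two printed numbers compare.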
Jumps in electricity prices are often due to shocks in electricity demand or in existing electricity supply, which can be caused by sudden changes in temperature or production and by system failure. Since jumps in electricity dynamics are directly related to the regime switch, we model them via the chain itself and consider a regime-switching model for electricity spot price dynamics. Next, we determine an equivalent measure by the Esscher transform and use it to evaluate electricity forwards and the risk premium. We apply the expectation-maximization algorithm to estimate the parameters of the model. Furthermore, we calibrate the proposed model using real data from the Nord Pool market. Using the characteristic function of the model, we obtain a closed form for forward contracts of the Nord Pool market. Finally, we provide forward surfaces that show the month-, quarter-, and season-ahead prices.
The human brain is a directional network system, in which brain regions are network nodes and the influence exerted by one region on another is a network edge. We refer to this directional information flow from one region to another as directional connectivity. Seizures arise from an epileptic directional network: abnormal neuronal activities start from a seizure onset zone and propagate via a network to otherwise healthy brain regions. As such, effective epilepsy diagnosis and treatment require accurate identification of directional connections among regions, i.e., mapping of epileptic patients' brain networks. This article aims to understand the epileptic brain network using intracranial electroencephalographic data, i.e., recordings of epileptic patients' brain activities in many regions. The most popular models for directional connectivity use ordinary differential equations (ODEs). However, ODE models are sensitive to data noise and computationally costly. To address these issues, we propose a high-dimensional state-space multivariate autoregression (SSMAR) model for the brain's directional connectivity. Unlike standard multivariate autoregression and SSMAR models, the proposed SSMAR features a cluster structure, where the brain network consists of several clusters of densely connected brain regions. We develop an expectation-maximization algorithm to estimate the proposed model and use it to map the interregional networks of epileptic patients in different seizure stages. Our method reveals the evolution of brain networks during seizure development.
When the distribution of the truncation time is known up to a finite-dimensional parameter vector, much research has been conducted with the objective of improving the efficiency of estimation for nonparametric or semiparametric models with left-truncated and right-censored (LTRC) data. When the distribution of truncation times is unspecified, one approach is to use the conditional maximum likelihood estimators (cMLE) (Chen and Shen in Lifetime Data Anal, 2017). Although the cMLE has nice asymptotic properties, it is not efficient since the conditional likelihood function does not incorporate information on the distribution of truncation time. In this article, we aim to develop a more efficient estimator by considering the full likelihood function. Following Turnbull (J R Stat Soc B 38:290-295, 1976) and Qin et al. (J Am Stat Assoc 106:1434-1449, 2011), we treat the unobserved (left-truncated) subpopulation as missing data and propose a two-stage approach for obtaining the pseudo maximum likelihood estimators (PMLE) of regression parameters. In the first stage, the distribution of left truncation time is estimated by the inverse-probability-weighted (IPW) estimator (Wang in J Am Stat Assoc 86:130-143, 1991). In the second stage, we obtain the pseudo complete-data likelihood function by replacing the distribution of truncation time with the IPW estimator in the full likelihood. We propose an expectation-maximization algorithm for obtaining the PMLE and establish the consistency of the PMLE. Simulation results show that the PMLE outperforms the cMLE in terms of mean squared error. The PMLE can also be used to analyze length-biased data, where the truncation time is uniformly distributed. We demonstrate that, for length-biased data, the PMLE is more robust to violations of the support assumption on the truncation time than the MLE proposed by Qin et al. (2011). We apply our proposed method to the Channing House data. While the PMLE is quite appealing under specific c
This paper proposes an innovative statistical method to measure the impact of the class/school on student achievements in multiple subjects. We propose a semiparametric model for a bivariate response variable with random coefficients that are assumed to follow a discrete distribution with an unknown number of support points, together with an Expectation-Maximization algorithm, called the BSPem algorithm, to estimate its parameters. In the case study, we apply the BSPem algorithm to data about Italian middle schools, considering students nested within classes, and we identify subpopulations of classes based on their effects on student achievements in reading and mathematics. The proposed model is extremely informative in exploring the correlation between multiple class effects, which are typical of the educational production function. The estimated class effects on reading and mathematics student achievements are then explained in terms of various class- and school-level characteristics selected by means of a LASSO regression.
Replacing the state vector of a linear state-space model by any one-to-one linear transformation does not alter maximum likelihood estimation. We extend this invariance property to more general settings, with possibly diffuse initialization of the Kalman filter and injective affine transformations of the state vector. Our results hold for both direct maximization of the likelihood function and the EM algorithm. We offer two real examples that illustrate how one may employ our results to handle a variety of affine-transformed state-space models in the literature.
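The basic invariance (one-to-one linear transformation, proper initial distribution) can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes the model y_t = Z a_t + eps_t, a_{t+1} = T a_t + eta_t with a_1 ~ N(a0, P0), and all matrix values, the transformation `M`, and the function name `kalman_loglik` are hypothetical. The diffuse and affine extensions discussed in the abstract are not covered.

```python
import numpy as np

def kalman_loglik(y, Z, H, T, Q, a0, P0):
    """Gaussian log-likelihood via the Kalman filter (prediction-error form)."""
    a, P = a0.copy(), P0.copy()
    ll = 0.0
    for yt in y:
        v = yt - Z @ a                        # one-step prediction error
        F = Z @ P @ Z.T + H                   # prediction-error covariance
        Finv = np.linalg.inv(F)
        ll += -0.5 * (len(yt) * np.log(2 * np.pi)
                      + np.linalg.slogdet(F)[1] + v @ Finv @ v)
        K = T @ P @ Z.T @ Finv                # Kalman gain
        a = T @ a + K @ v                     # predicted state for t + 1
        P = T @ P @ (T - K @ Z).T + Q         # predicted state covariance
    return ll

rng = np.random.default_rng(0)
y = rng.normal(size=(50, 1))                  # any data: the identity is algebraic

Z = np.array([[1.0, 0.5]])
H = np.array([[0.3]])
T = np.array([[0.8, 0.1],
              [0.0, 0.5]])
Q = np.diag([0.2, 0.1])
a0 = np.zeros(2)
P0 = np.eye(2)

# Hypothetical invertible transformation of the state vector: b_t = M a_t.
M = np.array([[2.0, 1.0],
              [0.5, 3.0]])
Minv = np.linalg.inv(M)

ll_original = kalman_loglik(y, Z, H, T, Q, a0, P0)
ll_transformed = kalman_loglik(y, Z @ Minv, H, M @ T @ Minv,
                               M @ Q @ M.T, M @ a0, M @ P0 @ M.T)
print(ll_original, ll_transformed)
```

Both calls describe the same distribution for the observations, so the two log-likelihood values should agree up to floating-point error.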
In this paper, we studied the estimation of $R=P(X>Y)$ based on the Burr-XII distribution under the generalized progressive hybrid censoring scheme. This censoring scheme has become quite popular because the progressive hybrid censoring scheme cannot be applied when few failures occur before the pre-determined time $T$. In this progressive censoring plan, the number of units withdrawn at each failure is assumed to be random and to follow a binomial distribution. Inferences on $R$ are obtained under equal shape parameters and under different shape parameters, respectively. Maximum likelihood estimation (MLE) and Bayesian estimation methods are used. We obtain the MLEs of the parameters using the Newton-Raphson (NR) and expectation-maximization (EM) methods, respectively. In the Bayesian section, Lindley's approximation and the Markov chain Monte Carlo (MCMC) method with the Metropolis-Hastings algorithm are used. Simulation studies are used to evaluate the performance of the proposed estimators, and two real-data examples are provided to illustrate the theoretical results.
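To make the target quantity concrete, here is a small sketch for complete (uncensored) data only. It assumes the common Burr-XII parameterization with survival function S(x) = (1 + x^c)^(-k) and a shared inner shape c, with X having outer shape k1 and Y having outer shape k2; the abstract does not state which parameterization the paper uses, so this is an assumption. Under it, integrating S_X against the density of Y gives R = k2 / (k1 + k2), which the Monte Carlo estimate below can be compared against. The helper `burr12_sample` and the parameter values are hypothetical.

```python
import numpy as np

def burr12_sample(rng, c, k, size):
    """Inverse-CDF sampling from Burr-XII with F(x) = 1 - (1 + x**c)**(-k)."""
    u = rng.uniform(size=size)
    return ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)

rng = np.random.default_rng(42)
c, k1, k2 = 2.0, 1.5, 3.0                     # assumed shared c, distinct k
n = 200_000

x = burr12_sample(rng, c, k1, n)
y = burr12_sample(rng, c, k2, n)

r_mc = np.mean(x > y)                         # Monte Carlo estimate of P(X > Y)
r_closed = k2 / (k1 + k2)                     # closed form under the assumptions
print(r_mc, r_closed)
```

The censoring scheme, the binomial removals, and the estimators studied in the paper are deliberately left out; the sketch only shows what R measures.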
Clustered data are ubiquitous in a variety of scientific fields. In this article, we propose a flexible and interpretable modeling approach, called grouped heterogeneous mixture modeling, for clustered data, which models cluster-wise conditional distributions by mixtures of latent conditional distributions common to all the clusters. In the model, we assume that clusters are divided into a finite number of groups and mixing proportions are the same within the same group. We provide a simple generalized EM algorithm for computing the maximum likelihood estimator, and an information criterion to select the numbers of groups and latent distributions. We also propose structured grouping strategies by introducing penalties on grouping parameters in the likelihood function. Under the settings where both the number of clusters and cluster sizes tend to infinity, we present asymptotic properties of the maximum likelihood estimator and the information criterion. We demonstrate the proposed method through simulation studies and an application to crime risk modeling in Tokyo.
Using Louis' formula, it is possible to obtain the observed information matrix and the corresponding large-sample standard error estimates after the expectation-maximization (EM) algorithm has converged. However, Louis' formula is commonly de-emphasized due to its relatively complex integration representation, particularly when studying latent variable models. This paper provides a holistic overview that demonstrates how Louis' formula can be applied efficiently to item response theory (IRT) models and other popular latent variable models, such as cognitive diagnostic models (CDMs). After presenting the algebraic components required for Louis' formula, two real data analyses, with accompanying numerical illustrations, are presented. Next, a Monte Carlo simulation is presented to compare the computational efficiency of Louis' formula with previously existing methods. Results from these presentations suggest that Louis' formula should be adopted as a standard method when computing the observed information matrix for IRT models and CDMs fitted with the EM algorithm due to its computational efficiency and flexibility.
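Louis' identity states that the observed information equals the conditional expectation of the complete-data information minus the conditional variance of the complete-data score. The sketch below checks this on a deliberately simple toy model rather than an IRT model or CDM: a two-component normal mixture with known means and standard deviation, where only the mixing proportion p is unknown. The function names, the simulated data, and the parameter values are hypothetical.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def louis_information(y, p, mu1, mu2, sigma):
    """Observed information for p via Louis' identity."""
    f1, f2 = normal_pdf(y, mu1, sigma), normal_pdf(y, mu2, sigma)
    w = p * f1 / (p * f1 + (1 - p) * f2)      # posterior membership probabilities
    # E[complete-data information | y]
    expected_info = np.sum(w / p**2 + (1 - w) / (1 - p)**2)
    # Var[complete-data score | y]; memberships are independent given y
    score_var = np.sum(w * (1 - w)) / (p * (1 - p))**2
    return expected_info - score_var

def direct_information(y, p, mu1, mu2, sigma):
    """Negative second derivative of the observed-data log-likelihood in p."""
    f1, f2 = normal_pdf(y, mu1, sigma), normal_pdf(y, mu2, sigma)
    m = p * f1 + (1 - p) * f2
    return np.sum((f1 - f2) ** 2 / m**2)

rng = np.random.default_rng(1)
z = rng.uniform(size=500) < 0.3               # latent component labels
y = np.where(z, rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500))

info_louis = louis_information(y, 0.3, 0.0, 2.0, 1.0)
info_direct = direct_information(y, 0.3, 0.0, 2.0, 1.0)
print(info_louis, info_direct, 1.0 / np.sqrt(info_louis))
```

In this toy case the identity holds at any value of p, so the two information values should match; at the maximum likelihood estimate, the inverse square root gives the large-sample standard error.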
The motivation for this paper came from a study conducted to examine the effect of laser treatment in delaying the onset of blindness in patients with diabetic retinopathy. The data are competing risks data with two dependent competing causes of failure, and there are ties. In this paper, we use the bivariate Weibull-geometric (BWG) distribution to analyse this data set. It is well known that Bayesian inference has certain advantages over classical inference in certain cases. We first develop Bayesian inference for the unknown parameters of the BWG model under a fairly flexible class of priors, and analyse one real data set with ties to show the effectiveness of the model. Further, it is observed that the BWG can be used to analyse dependent competing risks data quite effectively when there are ties. The analysis of the above-mentioned competing risks data set indicates that the BWG is preferred over the MOBW in this case.