Functional data have been gaining increasing popularity in the field of time series analysis. However, modeling heterogeneous multivariate functional time series remains a research gap. To fill it, this paper proposes a time-varying functional state space model (TV-FSSM). It uses functional decomposition to extract features of the functional observations, where the decomposition coefficients are regarded as latent states that evolve according to a tensor autoregressive model. This two-layer structure can on the one hand efficiently extract continuous functional features, and on the other provide a flexible and generalized description of data heterogeneity among different time points. An expectation maximization (EM) framework is developed for parameter estimation, where regularization and constraints are incorporated for better model interpretability. As the sample size grows, an incremental learning version of the EM algorithm is given to efficiently update the model parameters. Some model properties, including model identifiability conditions, convergence issues, time complexities, and bounds of its one-step-ahead prediction errors, are also presented. Extensive experiments on both real and synthetic datasets are performed to evaluate the predictive accuracy and efficiency of the proposed framework.
Viral load (VL) in the respiratory tract is the leading proxy for assessing infectiousness potential. Understanding the dynamics of disease-related VL within the host is of great importance, as it helps to determine different policies and health recommendations. However, normally the VL is measured on individuals only once, in order to confirm infection, and furthermore, the infection date is unknown. It is therefore necessary to develop statistical approaches to estimate the typical VL trajectory. We show here that, under plausible parametric assumptions, two measures of VL on infected individuals can be used to accurately estimate the VL mean function. Specifically, we consider a discrete-time likelihood-based approach to modeling and estimating partially observed longitudinal samples. We study a multivariate normal model for a function of the VL that accounts for possible correlation between measurements within individuals. We derive an expectation-maximization (EM) algorithm which treats the unknown time origins and the missing measurements as latent variables. Our main motivation is the reconstruction of the daily mean VL, given measurements on patients whose VLs were measured multiple times on different days. Such data should and can be obtained at the beginning of a pandemic with the specific goal of estimating the VL dynamics. For demonstration purposes, the method is applied to SARS-CoV-2 cycle-threshold-value data collected in Israel.
In the context of component-based multivariate modeling, we propose to model the residual dependence of the responses. Each response of a response vector is assumed to depend, through a Generalized Linear Model, on a set of explanatory variables. The vast majority of explanatory variables are partitioned into conceptually homogeneous variable groups, viewed as explanatory themes. The variables within a theme are assumed to be numerous, and some of them may be highly correlated or even collinear. Thus, generalized linear regression demands dimension reduction and regularization with respect to each theme. In addition to the themes, we consider a small set of "additional" covariates not conceptually linked to the themes, and demanding no regularization. Supervised Component Generalized Linear Regression was proposed to both regularize and reduce the dimension of the explanatory space by searching each theme for an appropriate number of orthogonal components, which both contribute to predicting the responses and capture relevant structural information in themes. In this paper, we introduce random latent variables (a.k.a. factors) so as to model the covariance matrix of the linear predictors of the responses conditional on the components. To estimate the model, we present an algorithm combining supervised component-based model estimation with factor model estimation. This methodology is tested on simulated data and then applied to an agricultural ecology dataset.
In actuarial practice, the finite mixture model is a widely applied statistical method for modeling insurance losses. Although the Expectation-Maximization (EM) algorithm is usually an essential tool for the parameter estimation of mixture models, it suffers from other issues which cause unstable predictions. For example, feature engineering and variable selection are two crucial modeling issues that are challenging for mixture models as they involve several component models. Avoiding overfitting is another technical concern of the modeling method for the prediction of future losses. To address those issues, we propose an Expectation-Boosting (EB) algorithm, which implements the gradient boosting decision trees to adaptively increase the likelihood in the second step. Our proposed EB algorithm can estimate both the mixing probabilities and the component parameters nonparametrically and overfitting-sensitively, and further perform automated feature engineering, model fitting, and variable selection simultaneously, which fully explores the predictive power of feature space. Moreover, the proposed algorithm can be combined with parallel computation methods to improve computation efficiency. Finally, we conduct two simulation studies to show the good performance of the proposed algorithm and an empirical analysis of the claim amounts for illustration.
We propose a semi-parametric spatio-temporal Hawkes process with periodic components to model the occurrence of car accidents in a given spatio-temporal window. The overall intensity is split into the sum of a background component capturing the spatio-temporal varying intensity and an excitation component accounting for the possible triggering effect between events. The spatial background is estimated and evaluated on the road network, allowing the derivation of accurate risk maps of road accidents. We constrain the spatio-temporal excitation to preserve an isotropic behaviour in space, and we generalize it to account for the effect of covariates. The estimation is pursued by maximizing the expected complete data log-likelihood using a tailored version of the stochastic-reconstruction algorithm that adopts ad hoc boundary correction strategies. An original application analyses the car accidents that occurred on the Rome road network in the years 2019, 2020, and 2021. Results highlight that car accidents of different types exhibit varying degrees of excitation, ranging from no triggering to a 10% chance of triggering further events.
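The split of the intensity into a background part and an excitation part can be illustrated with a minimal, purely temporal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the kernel or the background, so the exponential kernel, the sinusoidal `periodic_background`, and the parameter names `branching_ratio` and `decay` are hypothetical, and the spatial and covariate parts of the model are left out.

```python
import math


def periodic_background(t, base=1.0, amplitude=0.5, period=24.0):
    """Hypothetical periodic background rate (assumed form, not the paper's)."""
    return base * (1.0 + amplitude * math.sin(2.0 * math.pi * t / period))


def hawkes_intensity(t, events, background, branching_ratio, decay):
    """Conditional intensity at time t: background(t) plus excitation.

    Each past event s < t adds branching_ratio * decay * exp(-decay * (t - s)).
    This assumed exponential kernel integrates to branching_ratio, so that
    value is the expected number of events directly triggered by one event.
    """
    excitation = sum(
        branching_ratio * decay * math.exp(-decay * (t - s))
        for s in events
        if s < t
    )
    return background(t) + excitation


events = [1.0, 1.5, 4.0]
lam = hawkes_intensity(5.0, events, periodic_background,
                       branching_ratio=0.1, decay=2.0)
```

With `branching_ratio=0.1`, each event triggers on average 0.1 further events in this sketch, which is one way to read a "10% chance of triggering" under the assumed kernel.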
To understand the patterns of times to corner kicks in soccer and how they are associated with a few important factors, we analyze the corner kick records from the 2019 regular season of the Chinese Super League. This paper is particularly concerned with the elapsed time to a corner kick from a natural starting point. We overcome two challenges arising from such time-to-event analyses, which have not been discussed in the sports analytics literature. The first is that observations of times to corner kicks are subject to right-censoring: play from a given starting point rarely ends with a corner kick and usually ends with a different terminal event. The second issue is the mixture feature of short and typical gap times to the next corner kick from a particular one. There is often a subsequent corner kick quickly following a corner kick. The conventional event time models are thus inappropriate for formulating distributions of corner kick times. Our analysis reveals how the timing of corner kicks is associated with the factors of first versus second half of the game, home versus away team, score differential, betting odds prior to the game, and red card differential. We present applications of the developed statistical model for prediction to support tactics and sports betting.
Case-cohort studies are commonly used in various investigations, and many methods have been proposed for their analyses. However, most of the available methods are for right-censored data or assume that the censoring is independent of the underlying failure time of interest. In addition, they usually apply only to a specific model such as the Cox model that may often be restrictive or violated in practice. To relax these assumptions, we discuss regression analysis of interval-censored data, which arise more naturally in case-cohort studies than right-censored data and include the latter as a special case, and propose a two-step inverse probability weighting estimation procedure under a general class of semiparametric transformation models. Among other features, the approach allows for informative censoring. In addition, an EM algorithm is developed for the determination of the proposed estimators, and the asymptotic properties of the proposed estimators are established. Simulation results indicate that the approach works well for practical situations, and it is applied to an HIV vaccine trial that motivated this investigation.
This study revisits the Dirichlet Mixture Model (DMM), offering comprehensive insights into specific facets of parameter estimation. Estimating parameters of the DMM is challenging, with previous approaches focusing on standard parametrization, which lacks interpretability. We propose an alternative parametrization of the Dirichlet distribution using mean and precision, which provides critical insights into the distribution's location and peakedness. This parametrization is versatile, covering a wide range of scenarios with varying locations and precision levels, making it applicable to diverse datasets. Depending on whether one or both parameters are unknown, the estimation procedure varies, and estimates also differ when precision is identical across mixture components. In this article, we introduce this alternative parametrization and meticulously explore four distinct scenarios, deriving maximum likelihood estimates (MLE) for each using the Expectation-Maximization (EM) algorithm. For high-dimensional data, where standard methods often falter due to additional challenges, we present an innovative estimation approach utilizing Stirling's approximation and moment approximation, which provides closed-form solutions and faster execution times. Our study demonstrates the identifiability of the DMM and employs a closed-form approximation for Kullback-Leibler (KL) divergence to evaluate goodness of fit. Practical applications are illustrated through the analysis of both simulated and real datasets, showcasing the practical utility of the DMM.
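A mean and precision parametrization of a single Dirichlet distribution can be sketched as follows. The abstract does not give formulas, so this assumes the common convention in which the precision is the sum of the concentration parameters and the mean is the concentration vector divided by that sum; the function names are hypothetical and the mixture and EM parts are not covered.

```python
def to_mean_precision(alpha):
    """Map concentration parameters alpha to (mean, precision).

    Assumed convention: precision = sum(alpha), mean_i = alpha_i / precision.
    """
    precision = sum(alpha)
    mean = [a / precision for a in alpha]
    return mean, precision


def to_alpha(mean, precision):
    """Inverse map: alpha_i = precision * mean_i."""
    return [precision * m for m in mean]


def marginal_variances(mean, precision):
    """Var(X_i) = mean_i * (1 - mean_i) / (precision + 1) for a Dirichlet."""
    return [m * (1.0 - m) / (precision + 1.0) for m in mean]


alpha = [2.0, 3.0, 5.0]
mean, precision = to_mean_precision(alpha)
variances = marginal_variances(mean, precision)
```

Under this convention the mean fixes the location on the simplex, while a larger precision shrinks every marginal variance, which is why it can be read as a measure of peakedness.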
This article discusses the approximation of probability densities by mixtures of Gaussian densities. The Kullback-Leibler divergence is used as a measure of discrepancy between densities, followed by applications of the EM algorithm. The conditions under which we study these questions are motivated by approximations introduced in non-linear Kalman-type filtering.
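For orientation, the textbook EM iteration for a one-dimensional Gaussian mixture fitted to samples can be sketched as below. This is not the article's procedure, whose setting is not described in the abstract; it is the standard sample-based version, in which maximizing the likelihood amounts to minimizing the Kullback-Leibler divergence from the empirical distribution to the mixture. The data, starting values, and the variance floor are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances):
    # E-step: responsibility of each component for each observation.
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: weighted updates of mixing weights, means, and variances.
    new_w, new_m, new_v = [], [], []
    for j in range(len(weights)):
        nj = sum(r[j] for r in resp)
        mean_j = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var_j = sum(r[j] * (x - mean_j) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean_j)
        new_v.append(max(var_j, 1e-6))  # small floor (assumption) against collapse
    return new_w, new_m, new_v


random.seed(0)
data = ([random.gauss(-2.0, 1.0) for _ in range(200)]
        + [random.gauss(3.0, 0.5) for _ in range(200)])
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
history = [log_likelihood(data, weights, means, variances)]
for _ in range(30):
    weights, means, variances = em_step(data, weights, means, variances)
    history.append(log_likelihood(data, weights, means, variances))
```

The list `history` records the log-likelihood after each iteration; in exact arithmetic EM never decreases it, which gives a simple check on an implementation.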
In this paper, we extend the unified class of Box-Cox transformation (BCT) cure rate models to accommodate interval-censored data. The probability of cure is modeled using a general covariate structure, whereas the survival distribution of the uncured is modeled through a proportional hazards structure. We develop likelihood inference based on the expectation maximization (EM) algorithm for the BCT cure model. Within the EM framework, both simultaneous maximization and profile likelihood are addressed with respect to estimating the BCT transformation parameter. Through Monte Carlo simulations, we demonstrate the performance of the proposed estimation method through calculated bias, root mean square error, and coverage probability of the asymptotic confidence interval. Also considered is the efficacy of the proposed EM algorithm as compared to direct maximization of the observed log-likelihood function. Finally, data from a smoking cessation study are analyzed for illustrative purposes.