Search Results - Inner Mongolia University Library

Heterogeneous Multivariate Functional Time Series Modeling: A State Space Approach

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, Vol. 36, No. 12, pp. 8421-8433

Authors: Liu, Peiyao; Lin, Junpeng; Zhang, Chen. Affiliation: Tsinghua Univ, Dept Ind Engn, Beijing 100190, Peoples R China

Functional data have been gaining increasing popularity in the field of time series analysis. However, so far modeling heterogeneous multivariate functional time series remains a research gap. To fill it, this paper proposes a time-varying functional state space model (TV-FSSM). It uses functional decomposition to extract features of the functional observations, where the decomposition coefficients are regarded as latent states that evolve according to a tensor autoregressive model. This two-layer structure can on the one hand efficiently extract continuous functional features, and on the other provide a flexible and generalized description of data heterogeneity among different time points. An expectation maximization (EM) framework is developed for parameter estimation, where regularization and constraints are incorporated for better model interpretability. As the sample size grows, an incremental learning version of the EM algorithm is given to efficiently update the model parameters. Some model properties, including model identifiability conditions, convergence issues, time complexities, and bounds of its one-step-ahead prediction errors, are also presented. Extensive experiments on both real and synthetic datasets are performed to evaluate the predictive accuracy and efficiency of the proposed framework.

Keywords: Time series analysis; Autoregressive processes; Tensors; Vectors; Feature extraction; Data models; Temperature measurement; Predictive models; Kernel; Correlation; EM algorithm; functional data; heterogeneous data; state space model; time series analysis; time-varying

Estimating Mean Viral Load Trajectory From Intermittent Longitudinal Data and Unknown Time Origins

STATISTICS IN MEDICINE, 2025, Vol. 44, No. 5, e70033

Authors: Woodbridge, Yonatan; Mandel, Micha; Goldberg, Yair; Huppert, Amit. Affiliations: Sheba Med Ctr, Gertner Inst Epidemiol & Hlth Policy Res, Ramat Gan, Israel; Holon Inst Technol, Dept Comp Sci, Holon, Israel; Hebrew Univ Jerusalem, Dept Stat & Data Sci, Jerusalem, Israel; Technion Israel Inst Technol, Fac Ind Engn & Management, Haifa, Israel; Tel Aviv Univ, Fac Med & Hlth Sci, Sch Publ Hlth, Dept Epidemiol & Prevent Med, Tel Aviv, Israel

Viral load (VL) in the respiratory tract is the leading proxy for assessing infectiousness potential. Understanding the dynamics of disease-related VL within the host is of great importance, as it helps to determine different policies and health recommendations. However, normally the VL is measured on individuals only once, in order to confirm infection, and furthermore, the infection date is unknown. It is therefore necessary to develop statistical approaches to estimate the typical VL trajectory. We show here that, under plausible parametric assumptions, two measures of VL on infected individuals can be used to accurately estimate the VL mean function. Specifically, we consider a discrete-time likelihood-based approach to modeling and estimating partially observed longitudinal samples. We study a multivariate normal model for a function of the VL that accounts for possible correlation between measurements within individuals. We derive an expectation-maximization (EM) algorithm which treats the unknown time origins and the missing measurements as latent variables. Our main motivation is the reconstruction of the daily mean VL, given measurements on patients whose VLs were measured multiple times on different days. Such data should and can be obtained at the beginning of a pandemic with the specific goal of estimating the VL dynamics. For demonstration purposes, the method is applied to SARS-CoV-2 cycle-threshold-value data collected in Israel.

Keywords: Ct-value; EM algorithm; multivariate normal distribution; SARS-CoV-2

Generalized linear model based on latent factors and supervised components

COMPUTATIONAL STATISTICS, 2025, Vol. 40, No. 3, pp. 1475-1516

Authors: Gibaud, Julien; Bry, Xavier; Trottier, Catherine. Affiliations: Univ Montpellier, CNRS, IMAG, Montpellier, France; Univ Paul Valery Montpellier 3, AMIS, F-34000 Montpellier, France

In a context of component-based multivariate modeling, we propose to model the residual dependence of the responses. Each response of a response vector is assumed to depend, through a Generalized Linear Model, on a set of explanatory variables. The vast majority of explanatory variables are partitioned into conceptually homogeneous variable groups, viewed as explanatory themes. The variables in each theme are assumed to be numerous, and some of them are highly correlated or even collinear. Thus, generalized linear regression demands dimension reduction and regularization with respect to each theme. Besides these, we consider a small set of "additional" covariates not conceptually linked to the themes and demanding no regularization. Supervised Component Generalized Linear Regression was proposed to both regularize and reduce the dimension of the explanatory space by searching each theme for an appropriate number of orthogonal components, which both contribute to predicting the responses and capture relevant structural information in themes. In this paper, we introduce random latent variables (a.k.a. factors) so as to model the covariance matrix of the linear predictors of the responses conditional on the components. To estimate the model, we present an algorithm combining supervised component-based model estimation with factor model estimation. This methodology is tested on simulated data and then applied to an agricultural ecology dataset.

Keywords: EM algorithm; Factor model; Generalized linear latent variable model; Multivariate generalized linear model; Supervised components

Insurance loss modeling with gradient tree-boosted mixture models

INSURANCE MATHEMATICS & ECONOMICS, 2025, Vol. 121, pp. 45-62

Authors: Hou, Yanxi; Li, Jiahong; Gao, Guangyuan. Affiliations: Fudan Univ, Sch Data Sci, Shanghai 200433, Peoples R China; Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China; Renmin Univ China, Ctr Appl Stat, Sch Stat, Beijing 100872, Peoples R China

In actuarial practice, the finite mixture model is a widely applied statistical method for modeling insurance losses. Although the Expectation-Maximization (EM) algorithm is usually an essential tool for the parameter estimation of mixture models, it suffers from other issues which cause unstable predictions. For example, feature engineering and variable selection are two crucial modeling issues that are challenging for mixture models as they involve several component models. Avoiding overfitting is another technical concern of the modeling method for the prediction of future losses. To address those issues, we propose an Expectation-Boosting (EB) algorithm, which implements the gradient boosting decision trees to adaptively increase the likelihood in the second step. Our proposed EB algorithm can estimate both the mixing probabilities and the component parameters non-parametrically and overfitting-sensitively, and further perform automated feature engineering, model fitting, and variable selection simultaneously, which fully explores the predictive power of feature space. Moreover, the proposed algorithm can be combined with parallel computation methods to improve computation efficiency. Finally, we conduct two simulation studies to show the good performance of the proposed algorithm and an empirical analysis of the claim amounts for illustration.

Keywords: Finite mixture models; Gradient boosting; EM algorithm; Insurance loss

Semi-parametric Spatio-Temporal Hawkes Process for Modelling Road Accidents in Rome

JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS, 2025, Vol. 30, No. 1, pp. 8-38

Authors: Di Loro, Pierfrancesco Alaimo; Mingione, Marco; Fantozzi, Paolo. Affiliations: Dipartimento Giurisprudenza Econ Polit & Lingue Mo, LUMSA, Rome, Italy; Univ Roma Tre, Dipartimento Sci Polit, Rome, Italy

We propose a semi-parametric spatio-temporal Hawkes process with periodic components to model the occurrence of car accidents in a given spatio-temporal window. The overall intensity is split into the sum of a background component capturing the spatio-temporal varying intensity and an excitation component accounting for the possible triggering effect between events. The spatial background is estimated and evaluated on the road network, allowing the derivation of accurate risk maps of road accidents. We constrain the spatio-temporal excitation to preserve an isotropic behaviour in space, and we generalize it to account for the effect of covariates. The estimation is pursued by maximizing the expected complete data log-likelihood using a tailored version of the stochastic-reconstruction algorithm that adopts ad hoc boundary correction strategies. An original application analyses the car accidents that occurred on the Rome road network in the years 2019, 2020, and 2021. Results highlight that car accidents of different types exhibit varying degrees of excitation, ranging from no triggering to a 10% chance of triggering further events.

Keywords: Hawkes process; Road accidents; Spatio-temporal; Kernel estimation; Point process; EM algorithm

On the time of corner kicks in soccer: an analysis of event history data

COMPUTATIONAL STATISTICS, 2025, Vol. 40, No. 4, pp. 2067-2083

Authors: Peng, K. Ken; Hu, X. Joan; Swartz, Tim B. Affiliation: Simon Fraser Univ, Dept Stat & Actuarial Sci, 8888 Univ Dr, Burnaby, BC V5A 1S6, Canada

To understand the patterns of times to corner kicks in soccer and how they are associated with a few important factors, we analyze the corner kick records from the 2019 regular season of the Chinese Super League. This paper is particularly concerned with the elapsed time to a corner kick from a natural starting point. We overcome two challenges arising from such time-to-event analyses, which have not been discussed in the sports analytics literature. The first is that observations of times to corner kicks are subject to right-censoring: a given starting point rarely ends with a corner kick, and usually ends instead with a different terminal event. The second is the mixture of short and typical gap times from one corner kick to the next, since a corner kick is often quickly followed by another. The conventional event time models are thus inappropriate for formulating distributions of corner kick times. Our analysis reveals how the timing of corner kicks is associated with the factors of first versus second half of the game, home versus away team, score differential, betting odds prior to the game, and red card differential. We present applications of the developed statistical model for prediction to support tactics and sports betting.

Keywords: EM algorithm; Factor effects; Mixture distributions; Predictive model; Right-censored event times

Analysis of Informatively Interval-Censored Case-Cohort Studies with the Application to HIV Vaccine Trials

COMMUNICATIONS IN MATHEMATICS AND STATISTICS, 2025, Vol. 13, No. 1, pp. 195-215

Authors: Du, Mingyue; Zhou, Qingning. Affiliations: Jilin Univ, Sch Math, Changchun 130012, Peoples R China; Univ North Carolina Charlotte, Dept Math & Stat, Charlotte, NC, USA

Case-cohort studies are commonly used in various investigations, and many methods have been proposed for their analyses. However, most of the available methods are for right-censored data or assume that the censoring is independent of the underlying failure time of interest. In addition, they usually apply only to a specific model such as the Cox model that may often be restrictive or violated in practice. To relax these assumptions, we discuss regression analysis of interval-censored data, which arise more naturally in case-cohort studies than right-censored data and include the latter as a special case, and propose a two-step inverse probability weighting estimation procedure under a general class of semiparametric transformation models. Among other features, the approach allows for informative censoring. In addition, an EM algorithm is developed for the determination of the proposed estimators, and the asymptotic properties of the proposed estimators are established. Simulation results indicate that the approach works well for practical situations, and it is applied to an HIV vaccine trial that motivated this investigation.

Keywords: EM algorithm; Informative censoring; Inverse probability weighting; Joint modeling; Transformation model

Revisiting Dirichlet Mixture Model: unraveling deeper insights and practical applications

STATISTICAL PAPERS, 2025, Vol. 66, No. 1, pp. 1-38

Authors: Pal, Samyajoy; Heumann, Christian. Affiliation: Ludwig Maximilians Univ Munchen, Dept Stat, Ludwigstr 33, D-80539 Munich, Bavaria, Germany

This study revisits the Dirichlet Mixture Model (DMM), offering comprehensive insights into specific facets of parameter estimation. Estimating parameters of the DMM is challenging, with previous approaches focusing on standard parametrization, which lacks interpretability. We propose an alternative parametrization of the Dirichlet distribution using mean and precision, which provides critical insights into the distribution's location and peakedness. This parametrization is versatile, covering a wide range of scenarios with varying locations and precision levels, making it applicable to diverse datasets. Depending on whether one or both parameters are unknown, the estimation procedure varies, and estimates also differ when precision is identical across mixture components. In this article, we introduce this alternative parametrization and meticulously explore four distinct scenarios, deriving maximum likelihood estimates (MLE) for each using the Expectation-Maximization (EM) algorithm. For high-dimensional data, where standard methods often falter due to additional challenges, we present an innovative estimation approach utilizing Stirling's approximation and moment approximation, which provides closed-form solutions and faster execution times. Our study demonstrates the identifiability of the DMM and employs a closed-form approximation for Kullback-Leibler (KL) divergence to evaluate goodness of fit. Practical applications are illustrated through the analysis of both simulated and real datasets, showcasing the practical utility of the DMM.

Keywords: Dirichlet Mixture Model; EM algorithm; KL divergence; Identifiability; High-dimensional data

Mixtures of multivariate Gaussians

STOCHASTIC ANALYSIS AND APPLICATIONS, 2024, Vol. 42, No. 4, pp. 737-752

Authors: van der Hoek, John; Elliott, Robert J. Affiliations: Univ South Australia, Adelaide, Australia; Univ Calgary, Calgary, AB, Canada

This article discusses the approximation of probability densities by mixtures of Gaussian densities. The Kullback-Leibler divergence is used as a measure between densities, followed by applications of the EM algorithm. The conditions under which we study these questions are motivated by approximations introduced in non-linear Kalman-type filtering.

Keywords: Gaussian mixture; Kullback-Leibler; EM algorithm

Likelihood inference for unified transformation cure model with interval censored data

COMPUTATIONAL STATISTICS, 2025, Vol. 40, No. 1, pp. 125-151

Authors: Treszoks, Jodi; Pal, Suvra. Affiliations: Univ Texas Arlington, Dept Math, 411 S Nedderman Dr, Arlington, TX 76019, USA; Univ Texas Arlington, Coll Sci, Div Data Sci, Arlington, TX 76019, USA

In this paper, we extend the unified class of Box-Cox transformation (BCT) cure rate models to accommodate interval-censored data. The probability of cure is modeled using a general covariate structure, whereas the survival distribution of the uncured is modeled through a proportional hazards structure. We develop likelihood inference based on the expectation maximization (EM) algorithm for the BCT cure model. Within the EM framework, both simultaneous maximization and profile likelihood are addressed with respect to estimating the BCT transformation parameter. Through Monte Carlo simulations, we demonstrate the performance of the proposed estimation method through calculated bias, root mean square error, and coverage probability of the asymptotic confidence interval. Also considered is the efficacy of the proposed EM algorithm as compared to direct maximization of the observed log-likelihood function. Finally, data from a smoking cessation study are analyzed for illustrative purposes.

Keywords: EM algorithm; Interval censoring; Smoking cessation; Proportional hazards; Profile likelihood
