Search results - Inner Mongolia University Library

Modeling Dinophysis in Western Andalucia using an autoregressive hidden Markov model

ENVIRONMENTAL AND ECOLOGICAL STATISTICS, 2022, Vol. 29, No. 3, pp. 557-585

Authors: Aron, Jordan; Albert, Paul S.; Gribble, Matthew O. Affiliations: NCI Biostat Branch Div Canc & Epidemiol Rockville MD 20850 USA; Univ Alabama Birmingham Sch Publ Hlth Dept Epidemiol Birmingham AL 35294 USA

Dinophysis spp. can produce diarrhetic shellfish toxins (DST) including okadaic acid and dinophysistoxins, and some strains can also produce non-diarrheic pectenotoxins. Although DSTs are of human health concern and have motivated environmental monitoring programs in many locations, these monitoring programs often have temporal data gaps (e.g., days without measurements). This paper presents a model for the historical time-series, on a daily basis, of DST-producing toxigenic Dinophysis in 8 monitored locations in western Andalucia over 2015-2020, incorporating measurements of algae counts and DST levels. We fitted a bivariate hidden Markov model (HMM) incorporating an autoregressive correlation among the observed DST measurements to account for environmental persistence of DST. We then reconstruct the maximum-likelihood profile of algae presence in the water column at daily intervals using the Viterbi algorithm. Using historical monitoring data from Andalucia, the model estimated that potentially toxigenic Dinophysis algae are present at greater than or equal to 250 cells/L between <1% and >10% of the year, depending on the site and year. The historical time-series reconstruction enabled by this method may facilitate future investigations into temporal dynamics of toxigenic Dinophysis blooms.

Keywords: Autoregressive; EM algorithm; Harmful algal bloom; Missing data; Toxins
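
Illustrative sketch (not taken from the paper): the reconstruction step in the abstract above relies on the Viterbi algorithm, which finds the single most likely hidden-state sequence of a hidden Markov model. The Python below is a minimal version for a plain discrete HMM. It leaves out the bivariate observations and the autoregressive DST term of the fitted model, and the `viterbi` and `logs` names, the two states (0 = algae absent, 1 = present) and all probabilities are assumptions invented for this example.

```python
import math

def viterbi(obs, log_init, log_trans, log_emit):
    """Most likely hidden-state path for a discrete HMM (log domain).

    obs       : list of observation indices
    log_init  : log_init[s]     = log P(state_0 = s)
    log_trans : log_trans[r][s] = log P(state_t = s | state_{t-1} = r)
    log_emit  : log_emit[s][o]  = log P(obs = o | state = s)
    """
    n_states = len(log_init)
    # score[s] = best log-probability of any path that ends in state s
    score = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    back = []  # one list of best-predecessor pointers per later time step
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best_r = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best_r)
            score.append(prev[best_r] + log_trans[best_r][s] + log_emit[s][o])
        back.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def logs(rows):
    """Convert a table of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]

# Hypothetical toy model: persistent ("sticky") states, noisy observations
init = [math.log(0.5), math.log(0.5)]
trans = logs([[0.99, 0.01], [0.01, 0.99]])
emit = logs([[0.6, 0.4], [0.4, 0.6]])
print(viterbi([0, 0, 1, 0, 0], init, trans, emit))
```

With transitions this persistent, the decoder treats the lone high reading in the toy sequence as observation noise and keeps the hidden state at 0 throughout, which is the kind of gap-smoothing behaviour a daily reconstruction depends on.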

Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering

JOURNAL OF CLASSIFICATION, 2022, Vol. 39, No. 3, pp. 648-674

Authors: Casa, Alessandro; Cappozzo, Andrea; Fop, Michael. Affiliations: Free Univ Bozen Bolzano Fac Econ & Management Piazza Univ 1 I-39100 Bolzano Italy; Politecn Milan MOX Lab Modeling & Sci Comp Milan Italy; Univ Coll Dublin Sch Math & Stat Dublin Ireland

Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clustering multivariate continuous data. However, the practical usefulness of these models is jeopardized in high-dimensional spaces, where they tend to be over-parameterized. As a consequence, different solutions have been proposed, often relying on matrix decompositions or variable selection strategies. Recently, a methodological link between Gaussian graphical models and finite mixtures has been established, paving the way for penalized model-based clustering in the presence of large precision matrices. Notwithstanding, current methodologies implicitly assume similar levels of sparsity across the classes, not accounting for different degrees of association between the variables across groups. We overcome this limitation by deriving group-wise penalty factors, which automatically enforce under or over-connectivity in the estimated graphs. The approach is entirely data-driven and does not require additional hyper-parameter specification. Analyses on synthetic and real data showcase the validity of our proposal.

Keywords: Model-based clustering; Penalized likelihood; Sparse precision matrices; Gaussian graphical models; Graphical lasso; EM algorithm

A mixture model with Poisson and zero-truncated Poisson components to analyze road traffic accidents in Turkey

JOURNAL OF APPLIED STATISTICS, 2022, Vol. 49, No. 4, pp. 1003-1017

Authors: Unlu, Hande Konsuk; Young, Derek S.; Yigiter, Ayten; Hilal Ozcebe, L. Affiliations: Hacettepe Univ Inst Publ Hlth Ankara Turkey; Univ Kentucky Dept Stat Lexington KY USA; Hacettepe Univ Fac Sci Dept Stat Ankara Turkey; Hacettepe Univ Dept Publ Hlth Fac Med Ankara Turkey

The analysis of traffic accident data is crucial to address numerous concerns, such as understanding contributing factors in an accident's chain-of-events, identifying hotspots, and informing policy decisions about road safety management. The majority of statistical models employed for analyzing traffic accident data are logically count regression models (commonly Poisson regression) since a count - like the number of accidents - is used as the response. However, features of the observed data frequently do not make the Poisson distribution a tenable assumption. For example, observed data rarely demonstrate an equal mean and variance and often possess excess zeros. Sometimes, data may have heterogeneous structure consisting of a mixture of populations, rather than a single population. In such data analyses, mixtures-of-Poisson-regression models can be used. In this study, the number of injuries resulting from casualties of traffic accidents registered by the General Directorate of Security (Turkey, 2005-2014) is modeled using a novel mixture distribution with two components: a Poisson and a zero-truncated Poisson distribution. Such a model differs from existing mixture models in the literature, where the components are either all Poisson distributions or all zero-truncated Poisson distributions. The proposed model is compared with the Poisson regression model via simulation and in the analysis of the traffic data.

Keywords: Count data; EM algorithm; finite mixture models; identifiability; zero-truncated Poisson
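
Illustrative sketch (not the authors' implementation): the two-component idea in the abstract above can be tried out on plain counts without covariates. The Python below fits the mixture pi * Poisson(lam1) + (1 - pi) * zero-truncated Poisson(lam2) by EM. The function names, starting values and toy counts are all assumptions made for this example, and the regression structure used in the paper is omitted. The sketch also assumes the data contain both zeros and counts above one.

```python
import math

def pois_pmf(y, lam):
    """Poisson probability mass, computed in the log domain."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def ztp_pmf(y, lam):
    """Zero-truncated Poisson: a Poisson conditioned on y >= 1."""
    return 0.0 if y == 0 else pois_pmf(y, lam) / -math.expm1(-lam)

def solve_ztp_rate(mean, tol=1e-10):
    """Solve lam / (1 - exp(-lam)) = mean by bisection (root exists if mean > 1)."""
    if mean <= 1.0 + 1e-12:
        return 1e-8  # degenerate case: essentially all mass on y = 1
    lo, hi = 1e-12, mean  # the root always lies below the mean
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def em_pois_ztp(ys, pi=0.5, lam1=1.0, lam2=3.0, n_iter=100):
    """EM for pi * Poisson(lam1) + (1 - pi) * ZTP(lam2); returns estimates and log-likelihood trace."""
    trace = []
    for _ in range(n_iter):
        # E-step: probability that each count came from the Poisson component
        tau, loglik = [], 0.0
        for y in ys:
            a = pi * pois_pmf(y, lam1)
            b = (1.0 - pi) * ztp_pmf(y, lam2)
            tau.append(a / (a + b))
            loglik += math.log(a + b)
        trace.append(loglik)
        # M-step: weighted updates; the ZTP rate has no closed form
        w1 = sum(tau)
        w2 = len(ys) - w1
        if w2 < 1e-12:
            break  # the truncated component has vanished
        pi = w1 / len(ys)
        lam1 = max(sum(t * y for t, y in zip(tau, ys)) / w1, 1e-12)
        lam2 = solve_ztp_rate(sum((1.0 - t) * y for t, y in zip(tau, ys)) / w2)
    return pi, lam1, lam2, trace

# Hypothetical toy counts: many zeros and small values plus a cluster of larger values
ys = [0] * 30 + [1] * 25 + [2] * 15 + [3] * 10 + [5] * 8 + [6] * 7 + [7] * 5
pi_hat, lam1_hat, lam2_hat, trace = em_pois_ztp(ys)
print(pi_hat, lam1_hat, lam2_hat)
```

Every zero is assigned to the Poisson component with probability one, because the truncated component cannot generate zeros. The truncated rate is updated by matching the weighted mean to lam / (1 - exp(-lam)), which is why the M-step needs a one-dimensional root search.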

Finite Mixture of Censored Linear Mixed Models for Irregularly Observed Longitudinal Data

JOURNAL OF CLASSIFICATION, 2022, Vol. 39, No. 3, pp. 463-486

Authors: de Alencar, Francisco H. C.; Matos, Larissa A.; Lachos, Victor H. Affiliations: Univ Estadual Campinas Dept Estat Campinas Brazil; Univ Connecticut Dept Stat Storrs CT USA

Linear mixed-effects models are commonly used when multiple correlated measurements are made for each unit of interest. Some inherent features of these data can make the analysis challenging, such as when the series of responses are repeatedly collected for each subject at irregular intervals over time or when the data are subject to some upper and/or lower detection limits of the experimental equipment. Moreover, if units are suspected of forming distinct clusters over time, i.e., heterogeneity, then the class of finite mixtures of linear mixed-effects models is required. This paper considers the problem of clustering heterogeneous longitudinal data in a mixture framework and proposes a finite mixture of multivariate normal linear mixed-effects model. This model allows us to accommodate more complex features of longitudinal data, such as measurement at irregular intervals over time and censored data. Furthermore, we consider a damped exponential correlation structure for the random error to deal with serial correlation among the within-subject errors. An efficient expectation-maximization algorithm is employed to compute the maximum likelihood estimation of the parameters. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of the multivariate truncated normal distributions. Furthermore, a general information-based method to approximate the asymptotic covariance matrix is also presented. Results obtained from the analysis of both simulated and real HIV/AIDS datasets are reported to demonstrate the effectiveness of the proposed method.

Keywords: Censored data; Damped exponential correlation; EM algorithm; Finite mixture models; Linear mixed-effects models

Copula-based bivariate finite mixture regression models with an application for insurance claim count data

TEST, 2022, Vol. 31, No. 4, pp. 1082-1099

Authors: Bermudez, Lluis; Karlis, Dimitris. Affiliations: Univ Barcelona UB Dept Matemat Econ Financera & Actuarial RISKctr IREA Av Diagonal 690 Barcelona 08034 Spain; Athens Univ Econ & Business Dept Stat 76 Patiss Str Athens Greece

Modeling bivariate (or multivariate) count data has received increased interest in recent years. The aim is to model the number of different but correlated counts taking into account covariate information. Bivariate Poisson regression models based on the shock model approach are widely used because of their simple form and interpretation. However, these models do not allow for overdispersion or negative correlation, and thus, other models have been proposed in the literature to avoid these limitations. The present paper proposes copula-based bivariate finite mixture of regression models. These models offer some advantages since they have all the benefits of a finite mixture, allowing for unobserved heterogeneity and clustering effects, while the copula-based derivation can produce more flexible structures, including negative correlations and regressors. In this paper, the new approach is defined, estimation through an EM algorithm is presented, and then different models are applied to a Spanish insurance claim count database.

Keywords: Zero-inflation; Overdispersion; EM algorithm; Automobile insurance; Frank copula

Extending multivariate Student's-t semiparametric mixed models for longitudinal data with censored responses and heavy tails

STATISTICS IN MEDICINE, 2022, Vol. 41, No. 19, pp. 3696-3719

Authors: Mattos, Thalita B.; Lachos, Victor H.; Castro, Luis M.; Matos, Larissa A. Affiliations: Univ Estadual Campinas Dept Estat Sao Paulo Brazil; Univ Connecticut Dept Stat Storrs CT 06269 USA; Pontificia Univ Catolica Chile Dept Stat Santiago Chile; Pontificia Univ Catolica Chile Millennium Nucleus Ctr Discovery Struct Complex D Santiago Chile; Pontificia Univ Catolica Chile Ctr Riesgos & Seguros UC Santiago Chile

This article extends the semiparametric mixed model for longitudinal censored data with Gaussian errors by considering the Student's t-distribution. This model allows us to consider a flexible, functional dependence of an outcome variable over the covariates using nonparametric regression. Moreover, the proposed model takes into account the correlation between observations by using random effects. Penalized likelihood equations are applied to derive the maximum likelihood estimates that appear to be robust against outlying observations with respect to the Mahalanobis distance. We estimate nonparametric functions using smoothing splines under an EM-type algorithm framework. Finally, the proposed approach's performance is evaluated through extensive simulation studies and an application to two datasets from acquired immunodeficiency syndrome clinical trials.

Keywords: censored data; EM algorithm; HIV viral load; mixed-effects model; semiparametric model; Student's-t distribution

Parameter Estimation of Binned Hawkes Processes

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2022, Vol. 31, No. 4, pp. 990-1000

Authors: Shlomovich, Leigh; Cohen, Edward A. K.; Adams, Niall; Patel, Lekha. Affiliations: Imperial Coll London Dept Math London England; Sandia Natl Labs Stat Sci POB 5800 Albuquerque NM 87185 USA

A key difficulty that arises from real event data is imprecision in the recording of event time-stamps. In many cases, retaining event times with a high precision is expensive due to the sheer volume of activity. Combined with practical limits on the accuracy of measurements, binned data is common. In order to use point processes to model such event data, tools for handling parameter estimation are essential. Here we consider parameter estimation of the Hawkes process, a type of self-exciting point process that has found application in the modeling of financial stock markets, earthquakes and social media cascades. We develop a novel optimization approach to parameter estimation of binned Hawkes processes using a modified Expectation-Maximization algorithm, referred to as Binned Hawkes Expectation Maximization (BH-EM). Through a detailed simulation study, we demonstrate that existing methods are capable of producing severely biased and highly variable parameter estimates and that our novel BH-EM method significantly outperforms them in all studied circumstances. We further illustrate the performance on network flow (NetFlow) data between devices in a real large-scale computer network, to characterize triggering behavior. These results highlight the importance of correct handling of binned data. Supplementary materials for this article are available online.

Keywords: Aggregated data; Binned data; EM algorithm; Hawkes processes; Self-exciting processes

A Probit Tensor Factorization Model For Relational Learning

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2022, Vol. 31, No. 3, pp. 846-855

Authors: Liu, Ye; Song, Rui; Lu, Wenbin; Xiao, Yanghua. Affiliations: North Carolina State Univ Dept Stat Raleigh NC 27695 USA; Fudan Univ Dept Comp Sci Shanghai Peoples R China

With the proliferation of knowledge graphs, modeling data with complex multi-relational structure has gained increasing attention in the area of statistical relational learning. One of the most important goals of statistical relational learning is link prediction, that is, predicting whether certain relations exist in the knowledge graph. A large number of models and algorithms have been proposed to perform link prediction, among which tensor factorization method has proven to achieve state-of-the-art performance in terms of computation efficiency and prediction accuracy. However, a common drawback of the existing tensor factorization models is that the missing relations and nonexisting relations are treated in the same way, which results in a loss of information. To address this issue, we propose a binary tensor factorization model with probit link, which not only inherits the computation efficiency from the classic tensor factorization model but also accounts for the binary nature of relational data. Our proposed probit tensor factorization (PTF) model shows advantages in both the prediction accuracy and interpretability. Supplementary files for this article are available online.

Keywords: Alternating least square; EM algorithm; Link prediction; Multi-relational data; Open-world assumption; Probit model

Parameters and reliability estimation for the Weibull distribution based on intuitionistic fuzzy lifetime data

COMPLEX & INTELLIGENT SYSTEMS, 2022, Vol. 8, No. 6, pp. 4881-4896

Authors: Roohanizadeh, Zahra; Baloui Jamkhaneh, Ezzatallah; Deiri, Einolah. Affiliations: Islamic Azad Univ Qaemshahr Branch Dept Stat Qaemshahr Iran

In this paper, the definitions of probability, conditional probability and the likelihood function are generalized to intuitionistic fuzzy observations. We focus on different estimation approaches for the two-parameter Weibull (TW) distribution based on intuitionistic fuzzy lifetime data, including maximum likelihood (ML) and Bayesian estimation methodology. The ML estimation of the parameters and reliability function of the TW distribution is provided using the Newton-Raphson (NR) and Expectation-Maximization (EM) algorithms. The Bayesian estimates are provided via Tierney and Kadane's approximation. In the Bayesian estimation approach, for the shape and scale parameters, the Gamma and inverse-Gamma priors are considered, respectively. Finally, a simulated data set is analyzed for illustrative purposes to show the applicability of the proposed estimation methods. Monte Carlo simulations are performed to find the more efficient estimator in the intuitionistic fuzzy environment. The performances of the ML and Bayesian estimates of the parameters and reliability function are compared based on the mean bias (MB) and mean squared error (MSE) criteria.

Keywords: Intuitionistic fuzzy lifetime data; Weibull distribution; EM algorithm; NR algorithm; Bayesian estimation

Heterogeneous farm-size dynamics and impacts of subsidies from agricultural policy: Evidence from France

JOURNAL OF AGRICULTURAL ECONOMICS, 2022, Vol. 73, No. 3, pp. 893-923

Author: Saint-Cyr, Legrand D. F. Affiliations: Univ Bourgogne Franche Comte INRAE AgroSup Dijon CESAER Dijon France; INRAE Inst Agro Rennes Angers UMR SMART Rennes France

This article aims at investigating the impact of financial supports from agricultural policy on farm-size dynamics. Since not all farms may behave alike, a non-stationary mixed-Markov chain modelling (M-MCM) approach is applied to capture unobserved heterogeneity in the movements of farms across economic size (ES) classes. A multinomial logit specification is used for transition probabilities and the parameters are estimated by the maximum likelihood method and the Expectation-Maximisation (EM) algorithm. An empirical application to an unbalanced panel from 2000 to 2018 shows that French farming consists of 'almost stayers', with a high probability of remaining in the same ES class over time, and 'likely movers', which present a higher probability of a change in size. The results also show that the impact of subsidies and other economic factors depends greatly on the type that a farm belongs to. These findings confirm that individual characteristics of farmers may be relevant for policy efficiency and more attention should thus be paid to unobserved farm heterogeneity in both policy design and the assessment of their impacts on farm-size dynamics.

Keywords: agricultural policy; EM algorithm; farm-size dynamics; mixed-Markov process; unobserved heterogeneity
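
Illustrative sketch (not from the article): a multinomial logit specification, as mentioned in the abstract above, turns one linear score per destination class into a row of transition probabilities. The Python below shows that mapping for a single origin class. The `transition_probs` name, the covariates (an intercept and a subsidy level) and the coefficient values are assumptions invented for this example, and the mixture over farm types is omitted.

```python
import math

def transition_probs(x, betas):
    """Multinomial-logit transition probabilities out of one size class.

    x     : covariate vector, e.g. [1.0, subsidy]
    betas : betas[j] is the coefficient vector for destination class j
            (a reference class is given all-zero coefficients)
    """
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    expd = [math.exp(s - top) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical coefficients: class 0 = stay (reference), 1 = shrink, 2 = grow
betas = [[0.0, 0.0], [-2.0, -0.3], [-1.5, 0.4]]
for subsidy in (0.0, 1.0, 2.0):
    print(subsidy, transition_probs([1.0, subsidy], betas))
```

Because the probabilities are normalised across destinations, raising the coefficient on one destination necessarily lowers the probability of every other destination, so covariate effects in such a model are read relative to the reference class.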
