Dinophysis spp. can produce diarrhetic shellfish toxins (DSTs), including okadaic acid and dinophysistoxins, and some strains can also produce non-diarrheic pectenotoxins. Although DSTs are a human health concern and have motivated environmental monitoring programs in many locations, these programs often have temporal data gaps (e.g., days without measurements). This paper presents a model that reconstructs the daily historical time series of DST-producing (toxigenic) Dinophysis at 8 monitored locations in western Andalucia over 2015-2020, using measurements of both algal cell counts and DST levels. We fit a bivariate hidden Markov model (HMM) with an autoregressive correlation among the observed DST measurements to account for the environmental persistence of DSTs. We then reconstruct the most likely daily profile of algal presence in the water column using the Viterbi algorithm. Applied to historical monitoring data from Andalucia, the model estimates that potentially toxigenic Dinophysis is present at 250 cells/L or more for between <1% and >10% of the year, depending on the site and year. The historical time-series reconstruction enabled by this method may facilitate future investigations into the temporal dynamics of toxigenic Dinophysis blooms.
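The Viterbi reconstruction step can be illustrated with a minimal sketch. It assumes a simplified two-state chain (0 = Dinophysis absent or below threshold, 1 = present at or above it) with a single discrete observation per day. This is not the bivariate autoregressive model described above, and all probabilities are made-up illustrative values. Days without a measurement are encoded as None and contribute no emission term, which is how an HMM can still assign a most likely state to gap days.

```python
import math

def viterbi(obs, start, trans, emit):
    """Most likely hidden-state path for `obs` (log-space Viterbi).

    obs[t] is an observation index, or None for a day with no measurement.
    start[s], trans[r][s] and emit[s][o] are probabilities (all assumed > 0).
    """
    states = range(len(start))

    def log_emit(s, o):
        # A missing observation carries no information about the state.
        return 0.0 if o is None else math.log(emit[s][o])

    score = [math.log(start[s]) + log_emit(s, obs[0]) for s in states]
    backpointers = []
    for o in obs[1:]:
        pointers, new_score = [], []
        for s in states:
            # Best predecessor state r for landing in state s today.
            best = max(states, key=lambda r: score[r] + math.log(trans[r][s]))
            pointers.append(best)
            new_score.append(score[best] + math.log(trans[best][s]) + log_emit(s, o))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical parameters: state 0 = absent/below threshold, 1 = present at or above it.
START = [0.9, 0.1]
TRANS = [[0.95, 0.05], [0.2, 0.8]]
EMIT = [[0.9, 0.1], [0.3, 0.7]]  # observation 0 = not detected, 1 = detected
```

For example, `viterbi([1, None, 1], START, TRANS, EMIT)` fills the unmonitored middle day with state 1. Working in log space avoids numerical underflow on multi-year daily series.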
Finite Gaussian mixture models provide a powerful and widely used probabilistic approach for clustering multivariate continuous data. However, their practical usefulness is jeopardized in high-dimensional spaces, where they tend to be over-parameterized. Several solutions have therefore been proposed, often relying on matrix decompositions or variable selection strategies. Recently, a methodological link between Gaussian graphical models and finite mixtures has been established, paving the way for penalized model-based clustering in the presence of large precision matrices. Nevertheless, current methods implicitly assume similar levels of sparsity across classes and do not account for different degrees of association between variables across groups. We overcome this limitation by deriving group-wise penalty factors, which automatically enforce under- or over-connectivity in the estimated graphs. The approach is entirely data-driven and requires no additional hyperparameter specification. Analyses of synthetic and real data demonstrate the validity of our proposal.
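The sketch below is a rough illustration of where group-wise penalty factors would enter. It computes an L1 (graphical-lasso-style) penalty on the off-diagonal entries of each cluster's precision matrix, scaled by a per-cluster factor. The factors are hypothetical inputs: the abstract does not state how the authors derive them, so this shows only how such factors would modify a penalized log-likelihood, not the proposed method.

```python
def offdiag_l1(omega):
    """Sum of |omega_ij| over all off-diagonal entries (both triangles)."""
    p = len(omega)
    return sum(abs(omega[i][j]) for i in range(p) for j in range(p) if i != j)

def groupwise_penalty(precisions, lam, factors):
    """lam * sum_k factors[k] * offdiag_l1(Omega_k).

    Relative to a common lam, factors[k] > 1 pushes cluster k toward a sparser
    graph and factors[k] < 1 allows a denser one. These factors are hypothetical
    inputs, not the data-driven values derived in the paper.
    """
    return lam * sum(w * offdiag_l1(om) for w, om in zip(factors, precisions))

def penalized_objective(loglik, precisions, lam, factors):
    """Penalized log-likelihood to be maximized (e.g. inside an EM M-step)."""
    return loglik - groupwise_penalty(precisions, lam, factors)

# Illustrative 2x2 precision matrices for two clusters.
OMEGA_1 = [[2.0, 0.5], [0.5, 1.0]]
OMEGA_2 = [[1.0, -0.2], [-0.2, 1.0]]
penalty = groupwise_penalty([OMEGA_1, OMEGA_2], lam=0.1, factors=[1.0, 2.0])
```

With a single shared factor, this reduces to the usual common-penalty setting that the abstract describes as assuming similar sparsity across classes.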
The analysis of traffic accident data is crucial for addressing numerous concerns, such as understanding the contributing factors in an accident's chain of events, identifying hotspots, and informing policy decisions on road safety management. Because the response is a count (e.g., the number of accidents), most statistical models used for traffic accident data are count regression models, most commonly Poisson regression. However, features of the observed data frequently make the Poisson assumption untenable. For example, observed data rarely have equal mean and variance, and they often contain excess zeros. The data may also have a heterogeneous structure, consisting of a mixture of populations rather than a single population. In such analyses, mixtures of Poisson regression models can be used. In this study, the number of injuries resulting from traffic accidents registered by the General Directorate of Security (Turkey, 2005-2014) is modeled using a novel mixture distribution with two components: a Poisson and a zero-truncated Poisson distribution. This model differs from existing mixture models in the literature, in which the components are either all Poisson or all zero-truncated Poisson distributions. The proposed model is compared with the Poisson regression model both via simulation and in the analysis of the traffic data.
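The two-component mixture can be written out explicitly. The sketch below computes its probability mass function:

P(Y = y) = pi * Pois(y; lambda1) + (1 - pi) * ZTP(y; lambda2),

where ZTP(y; lambda) = Pois(y; lambda) / (1 - exp(-lambda)) for y >= 1 and 0 at y = 0. Only the Poisson component can generate zero counts. Two choices here are assumptions: each rate is linked to covariates through a log link, lambda_c = exp(x'beta_c), which is the usual choice for Poisson regression, and the mixing weight pi is held constant for simplicity.

```python
import math

def poisson_pmf(y, lam):
    # Computed on the log scale for numerical stability at large y.
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def ztp_pmf(y, lam):
    # Zero-truncated Poisson: Poisson mass renormalized over {1, 2, ...}.
    if y == 0:
        return 0.0
    return poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

def mixture_pmf(y, pi, lam1, lam2):
    """Poisson / zero-truncated-Poisson mixture; only the first part produces zeros."""
    return pi * poisson_pmf(y, lam1) + (1.0 - pi) * ztp_pmf(y, lam2)

def log_likelihood(ys, xs, pi, beta1, beta2):
    """Log-likelihood of counts ys given covariate rows xs (log link on each rate)."""
    total = 0.0
    for y, x in zip(ys, xs):
        lam1 = math.exp(sum(b * v for b, v in zip(beta1, x)))
        lam2 = math.exp(sum(b * v for b, v in zip(beta2, x)))
        total += math.log(mixture_pmf(y, pi, lam1, lam2))
    return total
```

Because the zero-truncated component puts no mass at zero, the share of zeros is governed entirely by pi and lambda1. This is one way such a mixture can separate "no injury" behaviour from the distribution of positive counts.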
Linear mixed-effects models are commonly used when multiple correlated measurements are made on each unit of interest. Some inherent features of such data can make the analysis challenging. Examples include responses collected repeatedly for each subject at irregular time intervals, and data subject to upper and/or lower detection limits of the experimental equipment. Moreover, if units are suspected of forming distinct clusters over time (i.e., heterogeneity), then finite mixtures of linear mixed-effects models are required. This paper considers the problem of clustering heterogeneous longitudinal data in a mixture framework and proposes a finite mixture of multivariate normal linear mixed-effects models. This model accommodates more complex features of longitudinal data, such as measurements at irregular time intervals and censored data. We also adopt a damped exponential correlation structure for the random errors to capture serial correlation among within-subject errors. An efficient expectation-maximization (EM) algorithm is used to compute the maximum likelihood estimates of the parameters. Its E-step has closed-form expressions that rely on formulas for the mean and variance of multivariate truncated normal distributions. In addition, a general information-based method for approximating the asymptotic covariance matrix is presented. Results from the analysis of both simulated and real HIV/AIDS datasets are reported to demonstrate the effectiveness of the proposed method.
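The damped exponential correlation (DEC) structure is commonly parameterized as

Corr(e_ij, e_ik) = phi1^(|t_ij - t_ik|^phi2), with 0 <= phi1 < 1 and phi2 >= 0.

This form works directly with irregular measurement times. The abstract does not spell out the parameterization, so the sketch below assumes this standard form. Two special cases follow from it: phi2 = 1 gives a continuous-time AR(1) structure, and phi2 = 0 gives compound symmetry.

```python
def dec_correlation(times, phi1, phi2):
    """Within-subject correlation matrix under the DEC structure for measurement times `times`."""
    n = len(times)
    corr = [[1.0] * n for _ in range(n)]  # unit diagonal set explicitly
    for i in range(n):
        for j in range(n):
            if i != j:
                corr[i][j] = phi1 ** (abs(times[i] - times[j]) ** phi2)
    return corr

def dec_covariance(times, sigma2, phi1, phi2):
    """Error covariance sigma2 * R for one subject (sigma2 is the error variance)."""
    return [[sigma2 * r for r in row] for row in dec_correlation(times, phi1, phi2)]
```

For instance, `dec_correlation([0.0, 1.0, 3.5], 0.6, 1.0)` gives correlation 0.6 between visits one time unit apart and 0.6**3.5 between visits 3.5 units apart. The diagonal is set explicitly so that phi2 = 0 still yields a valid correlation matrix.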
Modeling bivariate (or multivariate) count data has received increasing interest in recent years. The aim is to model several different but correlated counts while taking covariate information into account. Bivariate Poisson regression models based on the shock model approach are widely used because of their simple form and interpretation. However, these models do not allow for overdispersion or negative correlation, so other models have been proposed in the literature to avoid these limitations. The present paper proposes copula-based bivariate finite mixtures of regression models. These models have all the benefits of a finite mixture, allowing for unobserved heterogeneity and clustering effects, while the copula-based derivation can produce more flexible structures, including negative correlation and regressors. In this paper, the new approach is defined, estimation through an EM algorithm is presented, and different models are then applied to a Spanish insurance claim count database.
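For discrete margins, a copula C induces the joint probability mass function

P(Y1 = y1, Y2 = y2) = C(F1(y1), F2(y2)) - C(F1(y1 - 1), F2(y2)) - C(F1(y1), F2(y2 - 1)) + C(F1(y1 - 1), F2(y2 - 1)).

The sketch below assumes Poisson margins and a Frank copula inside each mixture component. The Frank family is one that allows negative dependence when theta < 0. The copula family, margins, and parameter values used by the authors are not given in the abstract. In a regression version, covariates would typically enter through the Poisson means (also an assumption here).

```python
import math

def poisson_cdf(y, lam):
    """P(Y <= y) for Y ~ Poisson(lam); returns 0 for y < 0."""
    if y < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for k in range(1, y + 1):
        term *= lam / k
        total += term
    return min(total, 1.0)

def frank_copula(u, v, theta):
    """Frank copula C(u, v); theta < 0 gives negative dependence, theta = 0 independence."""
    if theta == 0:
        return u * v
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def component_pmf(y1, y2, lam1, lam2, theta):
    """Joint pmf of two Poisson counts linked by a Frank copula (rectangle formula)."""
    u_hi, u_lo = poisson_cdf(y1, lam1), poisson_cdf(y1 - 1, lam1)
    v_hi, v_lo = poisson_cdf(y2, lam2), poisson_cdf(y2 - 1, lam2)
    c = frank_copula
    return (c(u_hi, v_hi, theta) - c(u_lo, v_hi, theta)
            - c(u_hi, v_lo, theta) + c(u_lo, v_lo, theta))

def mixture_pmf(y1, y2, weights, components):
    """Finite mixture of copula components; each component is (lam1, lam2, theta)."""
    return sum(w * component_pmf(y1, y2, *comp) for w, comp in zip(weights, components))
```

A single component with theta = -5 yields negatively correlated counts, which the shock-model bivariate Poisson cannot produce.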
This article extends the semiparametric mixed model for longitudinal censored data with Gaussian errors by considering the Student's t-distribution. The model allows a flexible functional dependence of the outcome variable on the covariates through nonparametric regression. Moreover, it accounts for the correlation between observations through random effects. Penalized likelihood equations are used to derive maximum likelihood estimates that appear to be robust against observations that are outlying with respect to the Mahalanobis distance. The nonparametric functions are estimated using smoothing splines within an EM-type algorithm. Finally, the performance of the proposed approach is evaluated through extensive simulation studies and an application to two datasets from acquired immunodeficiency syndrome (AIDS) clinical trials.
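One standard way to see why Student-t errors give outlier-robust estimates comes from EM-type algorithms for multivariate t models. There, each subject receives a weight u = (nu + p) / (nu + delta), where delta is the squared Mahalanobis distance of its p-dimensional residual vector and nu is the degrees of freedom. Subjects far from the fit are downweighted, whereas under Gaussian errors every weight equals 1. The sketch below computes these weights. It is a generic illustration, not the authors' penalized algorithm.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def mahalanobis_sq(r, cov):
    """r^T cov^{-1} r, computed as ||L^{-1} r||^2 by forward substitution."""
    L = cholesky(cov)
    z = []
    for i in range(len(r)):
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return sum(zi * zi for zi in z)

def t_weight(residual, cov, nu):
    """E-step weight (nu + p) / (nu + delta) for a subject with p-dimensional residual."""
    delta = mahalanobis_sq(residual, cov)
    return (nu + len(residual)) / (nu + delta)
```

As nu grows, every weight tends to 1 and the Gaussian fit is recovered. Smaller nu downweights distant subjects more aggressively.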
A key difficulty that arises with real event data is imprecision in the recording of event timestamps. In many cases, retaining event times at high precision is expensive because of the sheer volume of activity. Combined with practical limits on measurement accuracy, this makes binned data common. To model such event data with point processes, parameter estimation tools that can handle binning are essential. Here we consider parameter estimation for the Hawkes process, a type of self-exciting point process that has been applied to modeling financial stock markets, earthquakes, and social media cascades. We develop a novel optimization approach to parameter estimation of binned Hawkes processes using a modified Expectation-Maximization algorithm, referred to as Binned Hawkes Expectation Maximization (BH-EM). Through a detailed simulation study, we demonstrate that existing methods can produce severely biased and highly variable parameter estimates, and that our novel BH-EM method significantly outperforms them in all studied circumstances. We further illustrate its performance on network flow (NetFlow) data between devices in a real large-scale computer network, to characterize triggering behavior. These results highlight the importance of handling binned data correctly. Supplementary materials for this article are available online.
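For context, the sketch below shows one iteration of the classical EM algorithm for an exponential-kernel Hawkes process fitted to exact timestamps on [0, T]. Its intensity is

lambda(t) = mu + sum over t_j < t of alpha * beta * exp(-beta * (t - t_j)).

The E-step computes the probability that each event is a background event or was triggered by a particular earlier event. The M-step uses the common approximation that ignores kernel mass falling beyond T. This is not BH-EM: it assumes exact timestamps, which binned data does not provide. The event times below are made up.

```python
import math

def hawkes_em_step(times, T, mu, alpha, beta):
    """One EM iteration for an exponential-kernel Hawkes process on [0, T].

    `times` must be sorted ascending. Returns (responsibilities, (mu, alpha, beta)),
    where responsibilities[i][0] = P(event i is background) and
    responsibilities[i][1 + j] = P(event i was triggered by event j), for j < i.
    """
    n = len(times)
    resp = []
    n_background = 0.0
    n_triggered = 0.0
    weighted_lag = 0.0
    for i, ti in enumerate(times):
        kernel = [alpha * beta * math.exp(-beta * (ti - tj)) for tj in times[:i]]
        intensity = mu + sum(kernel)
        row = [mu / intensity] + [k / intensity for k in kernel]
        resp.append(row)
        n_background += row[0]
        for j, p in enumerate(row[1:]):
            n_triggered += p
            weighted_lag += p * (ti - times[j])
    new_mu = n_background / T
    new_alpha = n_triggered / n  # expected offspring per event (edge effects ignored)
    new_beta = n_triggered / weighted_lag if weighted_lag > 0 else beta
    return resp, (new_mu, new_alpha, new_beta)

TIMES = [0.5, 1.0, 1.2, 3.0, 3.1, 3.15, 7.0]  # illustrative event times
params = (0.5, 0.5, 1.0)  # initial (mu, alpha, beta)
for _ in range(20):
    resp, params = hawkes_em_step(TIMES, 8.0, *params)
```

Each row of responsibilities sums to 1, so the expected numbers of background and triggered events always add up to the number of observed events.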
With the proliferation of knowledge graphs, modeling data with complex multi-relational structure has gained increasing attention in the area of statistical relational learning. One of the most important goals of statistical relational learning is link prediction, that is, predicting whether certain relations exist in the knowledge graph. A large number of models and algorithms have been proposed for link prediction, among which tensor factorization methods have been shown to achieve state-of-the-art performance in terms of computational efficiency and prediction accuracy. However, a common drawback of existing tensor factorization models is that missing relations and non-existing relations are treated in the same way, which results in a loss of information. To address this issue, we propose a binary tensor factorization model with a probit link. It not only inherits the computational efficiency of the classic tensor factorization model but also accounts for the binary nature of relational data. Our proposed probit tensor factorization (PTF) model shows advantages in both prediction accuracy and interpretability. Supplementary files for this article are available online.
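The sketch below is a minimal illustration of a probit link on a factorized tensor. It assumes a RESCAL-style bilinear score a_i' R_k a_j for the triple (entity i, entity j, relation k); the abstract does not specify the factorization that PTF uses. It also illustrates one simple way to keep missing and non-existing relations distinct. The likelihood sums only over observed entries: triples known to be false contribute log(1 - Phi(score)), while missing triples are left out rather than treated as zeros.

```python
import math

def normal_cdf(z):
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def score(a_i, r_k, a_j):
    """Bilinear score a_i^T R_k a_j (RESCAL-style; an assumption, not necessarily PTF's form)."""
    return sum(a_i[p] * r_k[p][q] * a_j[q]
               for p in range(len(a_i)) for q in range(len(a_j)))

def link_probability(a_i, r_k, a_j):
    """Probit link: P(relation k holds between entities i and j)."""
    return normal_cdf(score(a_i, r_k, a_j))

def log_likelihood(observed, entities, relations, eps=1e-12):
    """observed maps (i, j, k) -> 1 (relation holds) or 0 (known not to hold).

    Triples absent from `observed` are missing and contribute nothing.
    """
    ll = 0.0
    for (i, j, k), y in observed.items():
        p = link_probability(entities[i], relations[k], entities[j])
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll
```

Under this sketch, a triple that is merely unobserved leaves the likelihood unchanged, whereas recording it as a 0 would pull its score downward.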
In this paper, the definitions of probability, conditional probability, and the likelihood function are generalized to intuitionistic fuzzy observations. We focus on different approaches to estimating the two-parameter Weibull (TW) distribution from intuitionistic fuzzy lifetime data, including maximum likelihood (ML) and Bayesian estimation. ML estimates of the parameters and the reliability function of the TW distribution are obtained using the Newton-Raphson (NR) and Expectation-Maximization (EM) algorithms. The Bayesian estimates are obtained via Tierney and Kadane's approximation, with Gamma and inverse-Gamma priors assumed for the shape and scale parameters, respectively. Finally, a simulated data set is analyzed for illustrative purposes to show the applicability of the proposed estimation methods. Monte Carlo simulations are performed to identify the more efficient estimator in the intuitionistic fuzzy environment. The performances of the ML and Bayesian estimates of the parameters and reliability function are compared using the mean bias (MB) and mean squared error (MSE) criteria.
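For crisp (non-fuzzy) lifetimes, the Newton-Raphson step for the two-parameter Weibull MLE reduces to a one-dimensional root search in the shape k. The profile equation is

sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0.

Once k is found, the scale is lambda = (sum(x^k) / n)^(1/k), and the reliability function is R(t) = exp(-(t / lambda)^k). The sketch below implements only this crisp case. The intuitionistic fuzzy likelihood of the paper would replace the per-observation terms and is not reproduced here. The simulated data are illustrative.

```python
import math
import random

def weibull_mle(x, k0=1.0, tol=1e-10, max_iter=100):
    """Crisp-data Weibull MLE: Newton-Raphson on the profile score equation for the shape."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / n
    k = k0
    for _ in range(max_iter):
        xk = [v ** k for v in x]
        b = sum(xk)
        a = sum(w * l for w, l in zip(xk, logs))
        c = sum(w * l * l for w, l in zip(xk, logs))
        g = a / b - 1.0 / k - mean_log          # profile score in k
        dg = (c * b - a * a) / (b * b) + 1.0 / (k * k)  # its derivative (> 0)
        step = g / dg
        new_k = k - step
        while new_k <= 0:  # halve the step to keep the shape positive
            step /= 2.0
            new_k = k - step
        converged = abs(new_k - k) < tol
        k = new_k
        if converged:
            break
    scale = (sum(v ** k for v in x) / n) ** (1.0 / k)
    return k, scale

def reliability(t, k, scale):
    """Weibull reliability (survival) function R(t) = exp(-(t / scale)^k)."""
    return math.exp(-((t / scale) ** k))

random.seed(1)
DATA = [random.weibullvariate(3.0, 2.0) for _ in range(5000)]  # scale 3, shape 2
k_hat, scale_hat = weibull_mle(DATA)
```

Profiling out the scale turns a two-parameter Newton-Raphson problem into a scalar one. The score equation is monotone in k, which keeps the iteration well behaved from a modest starting value.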
This article investigates the impact of financial support from agricultural policy on farm-size dynamics. Since not all farms may behave alike, a non-stationary mixed-Markov chain modelling (M-MCM) approach is applied to capture unobserved heterogeneity in the movements of farms across economic size (ES) classes. A multinomial logit specification is used for the transition probabilities, and the parameters are estimated by maximum likelihood using the Expectation-Maximisation (EM) algorithm. An empirical application to an unbalanced panel from 2000 to 2018 shows that French farming consists of 'almost stayers', with a high probability of remaining in the same ES class over time, and 'likely movers', with a higher probability of changing size. The results also show that the impact of subsidies and other economic factors depends greatly on the type to which a farm belongs. These findings confirm that individual characteristics of farmers may be relevant for policy efficiency. More attention should therefore be paid to unobserved farm heterogeneity, both in policy design and in the assessment of policy impacts on farm-size dynamics.
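The sketch below illustrates the two ingredients named above under illustrative assumptions. First, transition probabilities out of ES class i follow a multinomial logit in time-varying covariates x_t, with staying in class i taken as the reference outcome (an assumption). Second, each farm belongs to one of several latent types, each with its own coefficients. The E-step of the EM algorithm then needs the posterior probability that a farm's observed path came from each type. The type labels, coefficient values, and covariates below are hypothetical.

```python
import math

def transition_row(origin, x, beta):
    """Multinomial-logit transition probabilities out of ES class `origin`.

    beta[origin][dest] is a coefficient list for covariates x; staying put
    (dest == origin) is the reference category with score 0.
    """
    scores = [0.0 if dest == origin else sum(b * v for b, v in zip(coefs, x))
              for dest, coefs in enumerate(beta[origin])]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def path_likelihood(path, xs, beta):
    """Probability of an observed class sequence; xs[t] drives the t -> t+1 transition."""
    lik = 1.0
    for t in range(len(path) - 1):
        lik *= transition_row(path[t], xs[t], beta)[path[t + 1]]
    return lik

def posterior_types(path, xs, type_probs, betas):
    """E-step: posterior probability that the farm belongs to each latent type."""
    joint = [p * path_likelihood(path, xs, b) for p, b in zip(type_probs, betas)]
    total = sum(joint)
    return [j / total for j in joint]

def make_beta(intercept, slope, n_classes=3):
    # Hypothetical: same [intercept, subsidy slope] for every move; diagonal entries are ignored.
    return [[[intercept, slope] for _ in range(n_classes)] for _ in range(n_classes)]

STAYERS = make_beta(-4.0, 0.5)  # hypothetical 'almost stayers'
MOVERS = make_beta(-1.0, 0.5)   # hypothetical 'likely movers'
PATH = [0, 0, 1, 1, 2]          # ES class observed in each year
XS = [[1.0, 0.5]] * 4           # [intercept, subsidy] for each transition
post = posterior_types(PATH, XS, [0.7, 0.3], [STAYERS, MOVERS])
```

Because the covariates enter each year's transition, the chain is non-stationary. A farm that changes class twice is assigned mostly to the mover type despite that type's smaller prior share.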