Gaussian mixture models (GMMs) and Dirichlet process mixture models (DPMMs) are the primary techniques used to characterize uncertainties in power systems, and both are commonly fitted with the expectation-maximization (EM) algorithm. However, for massive data sets of uncertain variables, the EM algorithm struggles to obtain accurate GMMs and DPMMs at low computational cost. To address this issue, we propose a method for GMM uncertainty modeling in power systems that considers the mutual assistance of latent variables. Specifically, the GMM of the uncertain variables is first constructed, and conditional probability is employed to characterize the mutual assistance of latent variables. Then, an improved EM algorithm is developed to obtain the optimal GMM parameters, in which the expectation step (E-step) and maximization step (M-step) are revised using the conditional probability. Importantly, closed-form solutions for the GMM parameters are rederived in the revised E-step and M-step. Finally, the proposed uncertainty modeling method is compared with the traditional GMM and DPMM on actual wind power and load data from Australia. The proposed method proves both efficient and accurate in characterizing uncertainty.
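For orientation, the baseline that this abstract sets out to improve can be sketched as the textbook EM iteration for a one-dimensional GMM. This is not the paper's revised algorithm: the conditional-probability modification of the E-step and M-step is not specified in the abstract and is not reproduced here. The function name `em_gmm_1d`, the quantile-based initialization, and the stopping rule are illustrative assumptions.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=500, tol=1e-8):
    """Textbook EM for a 1-D Gaussian mixture (hypothetical helper, not the paper's method)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Assumed initialization: equal weights, means at evenly spaced quantiles,
    # and the overall sample variance for every component.
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, (2 * np.arange(k) + 1) / (2 * k))
    variances = np.full(k, x.var())
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for numerical stability.
        diff = x[:, None] - means
        log_pdf = -0.5 * (np.log(2 * np.pi * variances) + diff**2 / variances)
        log_joint = np.log(weights) + log_pdf
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: closed-form updates of weights, means, and variances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-9
        # Stop when the log-likelihood (evaluated at the previous parameters) stalls.
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances
```

Calling `em_gmm_1d(samples, k=2)` on a one-dimensional array returns the fitted mixing weights, component means, and component variances. Every iteration scans the full data set once, which is the cost that grows with massive data.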
Background: Multistate survival models (MSMs) are widely used in clinical studies. For example, in type 2 diabetes mellitus (T2D), these models can be applied to describe disease progression by predefining several T2D states based on available biometric measurements such as hemoglobin A1c (HbA1c). In most cases, MSMs assume that the examination process is independent of disease progression. In practice, however, complete independence between disease progression and the examination process is unrealistic, as the frequency at which a patient accesses healthcare may vary based on treatment and/or control of the health *** built a joint model of a 4-state transition process of T2D with an informative examination scheme (i.e., the patterns of examination times are not random). Risk factors including age, sex, race, and socioeconomic disadvantage were included in a log-linear model examining T2D transition intensities and healthcare visit frequencies. Parameters of the joint model are estimated within a likelihood framework by the expectation-maximization (EM) *** joint model demonstrated that people living in neighborhoods with greater socioeconomic disadvantage had a lower healthcare visit frequency under all 4 defined T2D statuses. Evaluation of race/ethnicity revealed that, compared with non-Hispanic White patients, Black patients had a higher risk of progressing from Normal to Prediabetes, T2D, and Uncontrolled T2D *** joint model offers a framework for analyzing multistate survival processes while accounting for the dependence between disease progression and examination frequency. Unlike traditional MSMs that estimate only transition intensities, our model captures variations in healthcare visit frequencies across different disease states, providing a more comprehensive understanding of disease dynamics and healthcare access patterns.
This paper deals with a clustering approach based on mixture models to analyze multidimensional mobility count time-series data within a multimodal transport hub. These time series are very likely to evolve across periods characterized by strikes, maintenance works, or health measures against the COVID-19 pandemic. In addition, exogenous one-off factors, such as concerts and transport disruptions, can also impact mobility. Our approach flexibly detects time segments within which the very noisy count data are synthesized into regular spatio-temporal mobility profiles. At the upper level of the modeling, evolving mixing weights are designed to detect segments properly. At the lower level, segment-specific count regression models take into account correlations between series and overdispersion as well as the impact of exogenous factors. For this purpose, we set up and compare two promising strategies, namely the "sums and shares" and "Poisson log-normal" models. The proposed methodologies are applied to actual data collected within a multimodal transport hub in the Paris region. Ticketing logs and pedestrian counts provided by stereo cameras are considered here. Experiments are carried out to show the ability of the statistical models to highlight mobility patterns within the transport hub. One model is chosen based on its ability to detect the most continuous segments possible while fitting the count time series well. Finally, an in-depth analysis is performed of the time segmentation, mobility patterns, and impact of exogenous factors obtained with the chosen model.
The traditional estimation of mixture regression models is based on the assumption of normal error components, making it susceptible to outliers or heavy-tailed errors. A new robust mixture regression model based on the symmetric α-stable (SαS) distribution is proposed by extending the mixture of SαS distributions to the regression setting. The SαS distribution is a heavy-tailed extension of the normal distribution, with the weight of the tails controlled by an additional parameter, α ∈ (0, 2]. In general, the variance of an SαS distribution is infinite when α < 2, which makes the model more robust than competing heavy-tailed distributions such as the t-distribution with more than 2 degrees of freedom; this is advantageous because it provides robustness against gross outliers in the data. The maximum likelihood estimates of the model parameters (except for α) are obtained using an expectation-maximization (EM) approach, and α is estimated via a stochastic EM based on a rejection sampling method. Real and simulated data are employed to demonstrate the proposal and contrast it with other mixture regression models.
In this paper, we present a robust joint variable selection procedure for fixed and random effects in semiparametric linear mixed effects models for longitudinal data. Our simultaneous selection method overcomes the defects of typical approaches, which select each of the two effects components separately. Moreover, the proposed procedure performs better than nonrobust methods when there are outliers in the data. The method is based on a robustified penalized joint likelihood of the reparameterized linear mixed effects model, obtained through B-spline approximation and Cholesky decomposition. It is further shown that the robust variable selection method enjoys the oracle property. Finally, we demonstrate the performance of the method in a simulation study.
In the context of network data, bipartite networks are of particular interest, as they provide a useful description of systems representing relationships between sending and receiving nodes. In this framework, we extend the mixture of latent trait analyzers (MLTA) model with concomitant variables (nodal attributes) to perform a joint clustering of the two disjoint sets of nodes of a bipartite network, as in the biclustering framework. In detail, sending nodes are partitioned into clusters (called components) via a finite mixture of latent trait models. In each component, receiving nodes are partitioned into clusters (called segments) by adopting a flexible and parsimonious specification of the linear predictor. Residual dependence between receiving nodes is modeled via a multidimensional latent trait, as in the original MLTA specification. Furthermore, by incorporating nodal attributes into the model's latent layer, we gain insight into how these attributes impact the formation of components. To estimate model parameters, an EM-type algorithm based on a Gauss-Hermite approximation of intractable integrals is proposed. A simulation study is conducted to test the performance of the model in terms of clustering and parameter recovery. The proposed model is applied to a bipartite network of pediatric patients possibly affected by appendicitis, with the objective of identifying groups of patients (sending nodes) that are similar with respect to subsets of clinical conditions (receiving nodes).
We analyze the temporal structure of a novel insurance dataset of home insurance claims related to rainfall-induced damage in Norway and employ a hidden semi-Markov model (HSMM) to capture the non-Gaussian nature and temporal dynamics of these claims. By examining a broad range of candidate sojourn and emission distributions and assessing the goodness-of-fit and commonly used risk measures of the corresponding HSMM, we identify an appropriate model for effectively representing insurance losses caused by rainfall-related incidents. Our findings highlight the importance of considering the temporal aspects of weather-related insurance claims and demonstrate that the proposed HSMM adeptly captures this feature. Moreover, the model estimates reveal a concerning trend: the risks associated with heavy rain in the context of home insurance have exhibited an upward trajectory between 2004 and 2020, aligning with the evidence of a changing climate. This insight has significant implications for insurance companies, providing them with valuable information for accurate and robust modeling in the face of climate uncertainties. By shedding light on the evolving risks related to heavy rain and their impact on home insurance, our study offers essential insights for insurance companies to adapt their strategies and effectively manage these emerging challenges. It underscores the necessity of incorporating climate change considerations into insurance models and emphasizes the importance of continuously monitoring and reassessing risk levels associated with rainfall-induced damage. Ultimately, our research contributes to the broader understanding of climate risk in the insurance industry and supports the development of resilient and sustainable insurance practices.
This study aims to estimate the reliability of a stress-strength system using the generalized inverted exponential distribution (GIED). We achieve this by employing an improved adaptive Type-II progressive censoring scheme and several estimation techniques, including maximum likelihood estimation through the EM algorithm and Bayesian inference. In the Bayesian framework, we use Markov chain Monte Carlo (MCMC) methods and the TK approximation. We compute various intervals, such as asymptotic confidence, arcsin-transformed, Bayesian credible, and highest posterior density intervals. To guide the estimation process, we use a generalized entropy loss function. Additionally, we conduct a comprehensive simulation analysis to validate the method's performance and assess its applicability through real-life data analysis.
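As a rough illustration of the quantity being estimated, stress-strength reliability is usually defined as R = P(Y < X), where X is the strength and Y the stress. The sketch below approximates R by plain Monte Carlo on complete (uncensored) samples. It assumes the common GIED parameterization with CDF F(x) = 1 - (1 - exp(-λ/x))^α and, as a further simplifying assumption not stated in the abstract, a scale λ shared by X and Y. It does not implement the censoring scheme or any of the estimators in the study, and the function names are hypothetical.

```python
import numpy as np

def gied_sample(alpha, lam, size, rng):
    """Inverse-CDF sampling from the assumed GIED form F(x) = 1 - (1 - exp(-lam/x))**alpha."""
    u = rng.uniform(size=size)
    # Solve F(x) = u for x: exp(-lam/x) = 1 - (1 - u)**(1/alpha).
    return -lam / np.log1p(-((1.0 - u) ** (1.0 / alpha)))

def stress_strength_mc(alpha_x, alpha_y, lam, n=200_000, seed=0):
    """Monte Carlo estimate of R = P(Y < X) with strength X and stress Y (shared scale assumed)."""
    rng = np.random.default_rng(seed)
    strength = gied_sample(alpha_x, lam, n, rng)
    stress = gied_sample(alpha_y, lam, n, rng)
    return float(np.mean(stress < strength))

def stress_strength_closed_form(alpha_x, alpha_y):
    """Under the shared-scale assumption, R reduces to alpha_y / (alpha_x + alpha_y)."""
    return alpha_y / (alpha_x + alpha_y)
```

Under the shared-scale assumption the simulated value should agree with `stress_strength_closed_form` up to Monte Carlo error, which gives a quick sanity check before plugging in parameter estimates obtained from censored data.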
We consider a linear model with a change point determined by an unknown random threshold of a covariate. We give expectation-maximization (EM) estimates of the regression and change point parameters. The existence of the random change point is detected by the supremum (SUP) test of score statistics. Theoretically, we establish the convergence and asymptotic distribution of the estimators and show that the EM estimates converge in distribution to a normal distribution. In addition, the numerical performance of the proposed approach is demonstrated through simulation studies. Finally, applying our methodology to household financial decisions, we estimate the average debt tolerance of Chinese households to be 1.1364 times the sum of total household income and financial assets. The effect of assets and income on consumption declines rapidly once a household exceeds the average debt tolerance.
Recently, progressive Type-II censoring has been extended to a more general censoring scheme, called joint progressive Type-II censoring, which studies the lifetimes of two or more populations simultaneously. In this article, we consider the joint progressive Type-II censoring scheme for two populations whose lifetimes follow Topp-Leone models with an unknown common scale parameter but different shape parameters. Classical and Bayesian inferences are studied. The expectation-maximization (EM) algorithm is implemented to obtain the maximum likelihood estimators (MLEs) and the associated asymptotic confidence intervals of the unknown parameters. Bayesian inference is discussed based on a beta-gamma prior for the shape parameters and an incomplete inverse gamma prior for the scale parameter. An importance sampling method is proposed to approximate the Bayes estimates. The associated Bayesian credible intervals are also established. A Monte Carlo simulation study is performed to compare the performance of the proposed methods. Finally, a real data set representing two different algorithms for estimating unit capacity factors is analyzed for illustrative purposes.