The expectation-maximisation algorithm is employed to perform maximum likelihood estimation in a wide range of situations, including regression analysis based on clusterwise regression models. A disadvantage of using this algorithm is that it is unable to provide an assessment of the sample variability of the maximum likelihood estimator. This inability is a consequence of the fact that the algorithm does not require deriving an analytical expression for the Hessian matrix, which prevents a direct evaluation of the asymptotic covariance matrix of the estimator. A solution to this problem when performing linear regression analysis through a multivariate Gaussian clusterwise regression model is developed. Two estimators of the asymptotic covariance matrix of the maximum likelihood estimator are proposed. In practical applications their use makes it possible to avoid resorting to bootstrap techniques and general purpose mathematical optimisers. The performance of these estimators is evaluated by analysing small simulated and real datasets; the obtained results illustrate their usefulness and effectiveness in practical applications. From a theoretical point of view, under suitable conditions, the proposed estimators are shown to be consistent.
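For orientation, the link between the Hessian and the asymptotic covariance matrix mentioned above can be written out in standard textbook form. The notation ($\ell$, $\theta$, $s_i$) is assumed here for illustration, and the abstract does not state which of these forms, if any, the two proposed estimators take.

```latex
% Log-likelihood of a sample (x_i, y_i), i = 1, ..., n, with parameter vector \theta
\ell(\theta) = \sum_{i=1}^{n} \log f(y_i \mid x_i; \theta)

% Hessian-based estimate: inverse of the observed information at the MLE \hat{\theta}
\widehat{\mathrm{Cov}}_{H}(\hat{\theta})
  = \left[ -\frac{\partial^{2} \ell(\theta)}{\partial \theta \, \partial \theta^{\top}}
    \bigg|_{\theta = \hat{\theta}} \right]^{-1}

% Score-based (outer-product) estimate, with s_i the score of observation i
s_i(\theta) = \frac{\partial}{\partial \theta} \log f(y_i \mid x_i; \theta),
\qquad
\widehat{\mathrm{Cov}}_{S}(\hat{\theta})
  = \left[ \sum_{i=1}^{n} s_i(\hat{\theta}) \, s_i(\hat{\theta})^{\top} \right]^{-1}
```

Both forms need derivatives of the log-likelihood, which the EM iterations themselves never compute; this is why an EM fit alone does not yield standard errors.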
In this article, we consider density estimation for data with a mixture structure, where the component densities are assumed unknown but, for each observation, the probabilities of its membership in the subpopulations are known or estimable from other sources. Data of this kind arise in practice and have wide applications. Motivated by the classical kernel density estimation method for a single population, we propose a weighted kernel density estimation method to estimate the component density functions nonparametrically. Within the framework of the EM algorithm, we derive an algorithm that computes our proposed estimates effectively. Via extensive simulation studies, we demonstrate that our methods outperform the existing methods in most cases. We further compare our methods with existing methods using real data examples.
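The basic idea of a weighted kernel density estimate can be sketched as follows. This is a minimal illustration, not the authors' EM-based algorithm: it assumes a Gaussian kernel, a fixed bandwidth, and simulated membership probabilities, and the function name `weighted_kde` is hypothetical.

```python
import numpy as np

def weighted_kde(x_grid, data, weights, bandwidth):
    """Weighted Gaussian kernel density estimate evaluated on x_grid.

    Each observation contributes a kernel scaled by its (normalised) weight,
    here taken to be its membership probability for one component.
    """
    x_grid = np.asarray(x_grid, dtype=float)
    data = np.asarray(data, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the estimate integrates to one
    z = (x_grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return kernel @ w

# Simulated example (assumption): two subpopulations centred at -2 and +2.
rng = np.random.default_rng(0)
n = 400
labels = rng.random(n) < 0.5
data = np.where(labels, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))

# Hypothetical "known" membership probabilities for component 1.
p1 = np.where(labels, 0.9, 0.1)

grid = np.linspace(-8.0, 8.0, 801)
f1 = weighted_kde(grid, data, p1, bandwidth=0.5)        # component 1 estimate
f2 = weighted_kde(grid, data, 1.0 - p1, bandwidth=0.5)  # component 2 estimate
```

Because the weights here are imperfect membership probabilities, each estimate still carries some mass from the other component; removing that kind of contamination is presumably part of what the EM-based procedure in the article addresses.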
In fitting a mixture of linear regression models, a normal assumption is traditionally used to model the error, and the regression parameters are then estimated by the maximum likelihood estimator (MLE). This procedure is not valid if the normal assumption is violated. By extending the semiparametric regression estimator proposed by Hunter and Young (J Nonparametr Stat 24:19-38, 2012a), which requires the component error densities to be the same (including homogeneous variance), we propose semiparametric mixture of linear regression models with unspecified component error distributions to reduce the modeling bias. We establish a more general identifiability result under weaker conditions than existing results, construct a class of new estimators, and establish their asymptotic properties. These asymptotic results also apply to many existing semiparametric mixture regression estimators whose asymptotic properties have remained unknown due to the inherent difficulties in obtaining them. Using simulation studies, we demonstrate the superiority of the proposed estimators over the MLE when the normal error assumption is violated and their comparability when the error is normal. An analysis of Equine Infectious Anemia Virus data newly collected in 2017 is used to illustrate the usefulness of the new estimator.
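The traditional normal-error MLE that serves as the baseline above is usually computed with an EM algorithm. The following is a minimal sketch of that baseline only, not of the proposed semiparametric estimators; the function name `em_mixture_regression`, the random initialisation, the fixed iteration count, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def em_mixture_regression(X, y, n_components=2, n_iter=200, seed=0):
    """EM for a mixture of linear regressions with normal errors."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Random soft assignment of observations to components (assumed start).
    resp = rng.dirichlet(np.ones(n_components), size=n)
    for _ in range(n_iter):
        # M-step: mixing proportions, weighted least squares, error variances.
        pi = resp.mean(axis=0)
        betas = np.empty((n_components, p))
        sigma2 = np.empty(n_components)
        for k in range(n_components):
            w = resp[:, k]
            Xw = X * w[:, None]
            betas[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            r = y - X @ betas[k]
            # Small floor guards against a degenerate zero-variance component.
            sigma2[k] = max((w * r**2).sum() / w.sum(), 1e-8)
        # E-step: posterior membership probabilities, computed on the log scale.
        resid = y[:, None] - X @ betas.T
        log_w = np.log(pi) - 0.5 * (np.log(2.0 * np.pi * sigma2) + resid**2 / sigma2)
        log_w -= log_w.max(axis=1, keepdims=True)
        resp = np.exp(log_w)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, betas, sigma2

# Simulated data (assumption): two regression lines with normal errors.
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(-3.0, 3.0, n)
X = np.column_stack([np.ones(n), x])
z = rng.random(n) < 0.5
y = np.where(z, 1.0 + 2.0 * x, -1.0 - 2.0 * x) + rng.normal(0.0, 0.3, n)

pi, betas, sigma2 = em_mixture_regression(X, y)
print("mixing proportions:", pi)
print("coefficients (intercept, slope) per component:", betas)
```

EM converges only to a local maximum of the likelihood, so in practice several random starts are typically compared. The normal density in the E-step is exactly the assumption whose violation motivates the semiparametric approach described above.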
A model-based biclustering method for multivariate discrete longitudinal data is proposed. We consider a finite mixture of generalized linear models to cluster units and, within each mixture component, we adopt a flexible and parsimonious parameterization of the component-specific canonical parameter to define subsets of variables (segments) sharing common dynamics over time. We develop an Expectation-Maximization-type algorithm for maximum likelihood estimation of model parameters. The performance of the proposed model is evaluated in a large-scale simulation study, where we consider different choices for the sample size, the number of measurement occasions, and the number of components and segments. The proposal is applied to Italian crime data (source: ISTAT) with the aim of detecting areas sharing common longitudinal trajectories for specific subsets of crime types. The identification of such biclusters may help policymakers make decisions on safety.
The causes of many complex human diseases are still largely unknown. Genetics plays an important role in uncovering the molecular mechanisms of complex human diseases. A key step in characterizing the genetics of a complex human disease is to identify, without bias, disease-associated gene transcripts on a whole-genome scale. Confounding factors could cause false positives. A paired design, such as measuring gene expression before and after treatment for the same subject, can reduce the effect of known confounding factors. However, not all known confounding factors can be controlled in a paired/matched design. Model-based clustering, such as mixtures of hierarchical models, has been proposed to detect gene transcripts differentially expressed between paired samples. To the best of our knowledge, no model-based gene clustering method yet has the capacity to adjust for the effects of covariates. In this article, we propose a novel mixture of hierarchical models with covariate adjustment for identifying differentially expressed transcripts using high-throughput whole-genome data from a paired design. Both a simulation study and a real data analysis show the good performance of the proposed method.
Degradation modelling plays a vital role in reliability engineering. Existing degradation models mainly focus on degradation data with a single degradation characteristic (DC) and assume that test units are mutually independent. However, in certain degradation tests, interest lies in multiple statistically dependent DCs, and the test units may be interdependent due to sharing certain unobservable effects. This article proposes a novel multivariate degradation model that considers dependency in both DC and unit dimensions. Temporal dependency in the DC dimension is modelled based on sharing Brownian noises, and the number of underlying Brownian noises is determined using factor analysis. Temporal dependency in the unit dimension is also considered and incorporated into the model by sharing temporal volatility to all units. Statistical inferences corresponding to the proposed model, including an expectation-maximisation algorithm for point estimation, a parametric bootstrap approach for interval estimation, a hypothesis test approach for testing significance of temporal dependency in unit dimension, a goodness-of-fit test for model validation, and the reliability function under a series failure structure are developed. Performance and applicability of the proposed model are demonstrated by a simulation study and a case study. Supplementary materials for this article are available online.
Structural identification and damage detection can be generalized as the simultaneous estimation of input forces, physical parameters, and dynamical states. Although Kalman-type filters are efficient tools to address this problem, the calibration of noise covariance matrices is cumbersome. For instance, calibration of the input noise covariance matrix in augmented or dual Kalman filters is a critical task since a slight variation in its value can adversely affect estimations. The present study develops a Bayesian Expectation-Maximization (BEM) methodology for uncertainty quantification and propagation in coupled input-state-parameter-noise identification problems. It also proposes the incorporation of input dummy observations for stabilizing low-frequency components of the latent states and mitigating potential drifts. In this respect, the covariance matrix of the dummy observations is also calibrated based on the measured data. Additionally, an explicit formulation is provided to study the theoretical observability of the Bayesian estimators, which helps characterize the minimum sensor requirements. Ultimately, the BEM is tested and verified through numerical and experimental examples, wherein sensor configurations, multiple input forces, and abrupt stiffness changes are investigated. It is confirmed that the BEM provides accurate estimations of states, input, and parameters while characterizing the degree of belief in these estimations based on the posterior uncertainties driven by applying a Bayesian perspective.
Cyberinfrastructure (e.g., sensors, actuators and the associated communication network) has become an integral part of our modern power grid. While these cyber technologies enhance situational awareness and operational efficiency, they also expose the physical system to cyber-attacks. In this paper, we consider the problem of transmission system state estimation based on measurements from a number of phasor measurement units (PMUs). In this context, two PMU data integrity attacks that can potentially have a severe impact on the grid, namely the Time Synchronization Attack (TSA) and the Man-in-the-Middle (MitM) attack, are analyzed. Specifically, we propose a novel method based on an alternate expectation-maximization framework to mitigate the effects of these attacks on the state estimation process. Numerical tests are conducted on IEEE-14, 30 and 118 bus systems with different attack scenarios to validate the developed method. Unlike existing works, the proposed algorithm provides accurate state estimates without any prior knowledge of the location of the attack, the number of meters being attacked, or the magnitude of the attack parameter.
Fault detection and classification is an important part of assessing structural and system health status. The classification and detection of faults and faulty units is mostly done with statistical methods. After the data are measured and collected, the use of statistical software is necessary. Currently, many statistical software packages are being developed for the R programming language, because R is open source and free to use. This paper focuses on the rebmix R package, which concentrates on mixture model estimation. Mixture models, in particular Gaussian mixture models, are the main driver for many practical applications, such as clustering and classification. Hence, in this paper, we have extended the rebmix package for the estimation of Gaussian mixtures. The results acquired on three different fault classification datasets were promising. Additionally, the process of obtaining those results is shown in detail, giving researchers in the fault classification field useful resources for their research.
Due to the fast growth of data that are measured on a continuous scale, functional data analysis has undergone many developments in recent years. Regression models with a functional response involving functional covariates, also called "function-on-function" models, are thus becoming very common. Studying this type of model in the presence of heterogeneous data can be particularly useful in various practical situations. In this work, we mainly develop a function-on-function Mixture of Experts (FFMoE) regression model. Like most inference approaches for models on functional data, we use basis expansion (B-splines) both for covariates and parameters. A regularized inference approach is also proposed; it accurately smooths functional parameters in order to provide interpretable estimators. Numerical studies on simulated data illustrate the good performance of FFMoE as compared with competitors. The usefulness of the proposed model is illustrated on two data sets: the reference Canadian weather data set, in which precipitation is modeled according to temperature, and a cycling data set, in which the developed power is explained by the speed, the cyclist's heart rate and the slope of the road.