With the continuous growth and increasing complexity of network traffic, traditional network traffic recognition techniques face numerous difficulties, especially in dealing with outlier data and improving recognition accuracy. Therefore, an improved expectation maximization algorithm based on the constraint matrix Z and Tsallis entropy is proposed. The core goal of this algorithm is to accelerate the convergence of classification and improve accuracy. Furthermore, to enhance the classification accuracy, the spatial expectation maximization algorithm is introduced, which replaces the sample mean and covariance matrix with the L1-median and a modified rank covariance matrix. According to the experimental data, the recall rate of the original expectation maximization algorithm is only 74%, whereas the recall rate of the spatial expectation maximization algorithm on the Attack service increases significantly to 85%. In other tests, such as the Www and Peer-to-peer services, the recall rate also improves, from 96% and 95.3% to 97.7% and 96.1%, respectively. These results highlight the superior robustness of the spatial expectation maximization algorithm in handling outlier data and further demonstrate its performance in improving the accuracy of network traffic recognition. This research brings significant innovation and potential practical value to network traffic identification.
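The abstract does not give implementation details for the L1-median that the spatial EM substitutes for the sample mean; it is commonly computed with a Weiszfeld-style fixed-point iteration. The following is a minimal, generic sketch (the function name, tolerances and toy data are illustrative, not taken from the paper):

```python
import numpy as np

def l1_median(X, max_iter=200, tol=1e-8):
    """Weiszfeld iteration for the spatial (L1) median of the rows of X."""
    m = X.mean(axis=0)                        # start from the ordinary mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)     # distances to current estimate
        d = np.maximum(d, 1e-12)              # guard against zero distances
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Toy check: one gross outlier shifts the mean but barely moves the L1-median.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[100.0, 100.0]]])
print(X.mean(axis=0), l1_median(X))
```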
Phase IV clinical trials are designed to monitor long-term side effects of medical treatment. For instance, childhood cancer survivors treated with chest radiation and/or anthracycline are often at risk of developing cardiotoxicity during their adulthood. Often the primary focus of a study is on estimating the cumulative incidence of a particular outcome of interest, such as cardiotoxicity. However, it is challenging to evaluate patients continuously, and usually this information is collected through cross-sectional surveys by following patients longitudinally. This leads to interval-censored data, since the exact time of onset of the toxicity is unknown. Rai et al. computed the transition intensity rate using a parametric model and estimated the parameters with a maximum likelihood approach in an illness-death model. However, such an approach may not be suitable if the underlying parametric assumptions do not hold. This manuscript proposes a semi-parametric model, with a logit relationship for the transition intensities in the two groups, to estimate the transition intensity rates within the context of an illness-death model. The parameters are estimated using an EM algorithm with profile likelihood. Results from the simulation studies suggest that the proposed approach is easy to implement and yields results comparable to the parametric model.
The multivariate Fay-Herriot model has been shown to be useful in various applications when there are multiple response variables. Therefore, several studies of the model have been pursued, especially the study of estimation techniques for the variance components. The two benchmark methods for variance component estimation are the profile maximum likelihood method and the residual maximum likelihood method. However, it has been shown in the literature that these methods can produce zero estimates of the variance components. This leads to unfavorable results, since the direct estimates then do not contribute to the EBLUPs. In this paper, we propose alternative estimation methods based on the EM algorithm for the variance components of the multivariate Fay-Herriot model. We illustrate the EM procedures for both the profile likelihood and the residual likelihood. Moreover, we perform a Monte Carlo simulation to investigate the performance of the proposed methods in comparison with some existing methods. The simulation results suggest that the EM algorithm improves on those existing methods in certain cases, particularly when the number of areas is small, as is common in applications. Finally, we apply the proposed estimates to the average household income and average household expenditure in Thailand.
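The abstract does not display the EM updates themselves; as a hedged illustration in the simpler univariate Fay-Herriot setting (y_i = x_i'beta + v_i + e_i with known sampling variances D_i; the function name and stopping rule are illustrative, not from the paper), one EM cycle for the variance component A could look like:

```python
import numpy as np

def fh_em(y, X, D, n_iter=500, tol=1e-10):
    """EM for the variance component A in a univariate Fay-Herriot model.
    y: direct estimates, X: covariates, D: known sampling variances."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    A = np.var(y - X @ beta)
    for _ in range(n_iter):
        # E-step: posterior mean and variance of the area effects v_i
        gamma = A / (A + D)
        v_mean = gamma * (y - X @ beta)
        v_var = gamma * D                       # equals (1 - gamma) * A
        # M-step: update A and beta from the completed data
        A_new = np.mean(v_mean ** 2 + v_var)
        W = 1.0 / D
        beta = np.linalg.solve((X * W[:, None]).T @ X,
                               (X * W[:, None]).T @ (y - v_mean))
        if abs(A_new - A) < tol:
            return beta, A_new
        A = A_new
    return beta, A
```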
In this paper, we focus on studying the Mixed-Effects State-Space (MESS) models previously introduced by Liu et al. [Liu D, Lu T, Niu X-F, et al. Mixed-effects state-space models for analysis of longitudinal dynamic systems. Biometrics. 2011;67(2):476-485]. We propose an estimation method by combining the auxiliary particle learning and smoothing approach with the Expectation Maximization (EM) algorithm. First, we describe the technical details of the algorithm steps. Then, we evaluate their effectiveness and goodness of fit through a simulation study. Our method requires expressing the posterior distribution of the random effects using a sufficient statistic that can be updated recursively, thus enabling its application to various model formulations, including non-Gaussian and nonlinear cases. Finally, we demonstrate the usefulness of our method and its capability to handle the missing data problem through an application to a real dataset.
The study of compositional microbiome data is critical for exploring the functional roles of microbial communities in human health and disease. Recent advances have shifted from traditional log-ratio transformations of compositional covariates to a zero constraint on the sum of the corresponding coefficients. Various approaches, including penalized regression and Markov chain Monte Carlo (MCMC) algorithms, have been extended to enforce this sum-to-zero constraint. However, these methods exhibit limitations: penalized regression yields only point estimates, limiting uncertainty assessment, while MCMC methods, although reliable, are computationally intensive, particularly in high-dimensional data settings. To address these challenges, we propose Bayesian generalized linear models for analyzing compositional and sub-compositional microbiome data. Our model employs a spike-and-slab double-exponential prior on the microbiome coefficients, inducing weak shrinkage on large coefficients and strong shrinkage on irrelevant ones, making it well suited to high-dimensional microbiome data. The sum-to-zero constraint is handled through soft centering by placing a prior distribution on the sum of the compositional or sub-compositional coefficients. To alleviate the computational burden, we developed a fast and stable algorithm that incorporates expectation-maximization (EM) steps into the routine iteratively weighted least squares (IWLS) algorithm for fitting GLMs. The performance of the proposed method was assessed by extensive simulation studies. The simulation results show that our approach outperforms existing methods, with higher accuracy of coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to find microorganisms linked to inflammatory bowel disease (IBD). The methods have been implemented in the freely available R package BhGLM.
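The paper's EM-IWLS algorithm is not spelled out in the abstract; for spike-and-slab double-exponential (Laplace) priors of this kind, the E-step typically amounts to computing, for each coefficient, the posterior probability of the slab and the implied coefficient-specific penalty scale, which a weighted penalized (IWLS-type) fit then uses in the M-step. A rough, generic sketch of such an E-step (s0, s1 and theta are assumed spike/slab scales and prior slab probability, not values from the paper):

```python
import numpy as np

def laplace_pdf(b, s):
    """Double-exponential (Laplace) density with scale s."""
    return np.exp(-np.abs(b) / s) / (2.0 * s)

def spike_slab_e_step(beta, s0, s1, theta):
    """E-step of a spike-and-slab lasso style update: posterior slab
    probabilities and the implied per-coefficient inverse penalty scales
    that would feed a weighted penalized IWLS fit in the M-step."""
    slab = theta * laplace_pdf(beta, s1)
    spike = (1.0 - theta) * laplace_pdf(beta, s0)
    p_slab = slab / (slab + spike)
    inv_scale = p_slab / s1 + (1.0 - p_slab) / s0   # expected 1/scale per coefficient
    return p_slab, inv_scale
```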
In this paper, we consider statistical estimation of time-inhomogeneous aggregate Markov models. Unaggregated models, which correspond to Markov chains, are commonly used in multi-state life insurance to model the biometric states of an insured. By aggregating microstates to each biometric state, we are able to model dependencies between transitions of the biometric states as well as the distribution of occupancy in these states. This allows for non-Markovian modelling in general. Since only paths of the macrostates are observed, we develop an expectation-maximisation (EM) algorithm to obtain maximum likelihood estimates of transition intensities on the micro level. Special attention is given to a semi-Markovian case, known as the reset property, which leads to simplified estimation procedures in which EM algorithms for inhomogeneous phase-type distributions can be used as building blocks. We provide a numerical example of the latter in combination with piecewise constant transition rates in a three-state disability model with data simulated from a time-inhomogeneous semi-Markov model. Comparisons of our fits with more classical GLM-based fits, as well as with true and empirical distributions, are provided to relate our model to existing models and their tools.
In this article, a new method is proposed for clustering longitudinal curves. In the proposed method, clusters of mean functions are identified through a weighted concave pairwise fusion method. The EM algorithm and the alternating direction method of multipliers (ADMM) algorithm are combined to estimate the group structure, mean functions and principal components simultaneously. The proposed method also allows prior neighborhood information to be incorporated, by adding pairwise weights to the pairwise penalties, to obtain more meaningful groups. In the simulation study, the performance of the proposed method is compared with some existing clustering methods in terms of the accuracy of estimating the number of subgroups and the mean functions. The results suggest that ignoring the covariance structure greatly affects the performance in estimating the number of groups and the estimation accuracy. The effect of including pairwise weights is also explored in a spatial lattice setting to take spatial information into consideration. The results show that incorporating spatial weights improves performance. A real example is used to illustrate the proposed method.
The Expectation Maximization (EM) algorithm is widely used in latent variable model inference. However, when data are distributed across various locations, directly applying the EM algorithm can often be impractical due to communication expenses and privacy considerations. To address these challenges, a communication-efficient distributed EM algorithm is proposed. Under mild conditions, the proposed estimator achieves the same mean squared error bound as the centralized estimator. Furthermore, the proposed method requires only one extra round of communication compared to the Average estimator. Numerical simulations and a real data example demonstrate that the proposed estimator significantly outperforms the Average estimator in terms of mean squared error.
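The abstract does not describe the estimator's construction; a generic way to make an EM round communication-efficient is for each machine to send only low-dimensional sufficient statistics from its local E-step, which a coordinator pools for the global M-step. A minimal sketch for a univariate Gaussian mixture follows (all names are illustrative, and this is not claimed to be the paper's estimator):

```python
import numpy as np
from scipy.stats import norm

def local_stats(x, w, mu, sigma):
    """One local E-step for a univariate K-component Gaussian mixture:
    return sufficient statistics (responsibility sums, weighted sums,
    weighted sums of squares) to send to the coordinator."""
    dens = w * norm.pdf(x[:, None], loc=mu, scale=sigma)     # shape (n, K)
    r = dens / dens.sum(axis=1, keepdims=True)               # responsibilities
    return r.sum(axis=0), r.T @ x, r.T @ (x ** 2)

def aggregate_m_step(stats):
    """Coordinator: pool the local statistics and perform the global M-step."""
    N = sum(s[0] for s in stats)
    Sx = sum(s[1] for s in stats)
    Sxx = sum(s[2] for s in stats)
    w = N / N.sum()
    mu = Sx / N
    sigma = np.sqrt(np.maximum(Sxx / N - mu ** 2, 1e-12))
    return w, mu, sigma
```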
The Photon Counting Histogram Expectation Maximization (PCH-EM) algorithm has recently been reported as a candidate method for the characterization of Deep Sub-Electron Read Noise (DSERN) image sensors. This work describes a comprehensive demonstration of the PCH-EM algorithm applied to a DSERN-capable quanta image sensor. The results show that PCH-EM is able to characterize DSERN pixels over a large span of quanta exposure and read noise values. The per-pixel characterization results for the sensor are combined with the proposed Photon Counting Distribution (PCD) model to demonstrate the ability of PCH-EM to predict the ensemble distribution of the device. The agreement between experimental observations and model predictions demonstrates both the applicability of the PCD model in the DSERN regime and the ability of the PCH-EM algorithm to accurately estimate the underlying model parameters.
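For context, the PCD referenced here models the sensor output as a Poisson number of photoelectrons, governed by the quanta exposure, blurred by Gaussian read noise, and PCH-EM treats the Poisson count as the latent variable. A hedged sketch of evaluating such a density (parameter names and the electron-unit simplification are illustrative; the paper's model may also include gain and offset terms):

```python
import numpy as np
from scipy.stats import norm, poisson

def pcd_density(x, quanta_exposure, read_noise, k_max=50):
    """Photon Counting Distribution: Poisson(quanta_exposure) photoelectron
    counts k, each blurred by zero-mean Gaussian read noise (electron units)."""
    k = np.arange(k_max + 1)
    pk = poisson.pmf(k, quanta_exposure)                     # latent-count weights
    comps = norm.pdf(np.asarray(x)[:, None], loc=k, scale=read_noise)
    return comps @ pk                                        # mixture density at x
```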
This article tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the Expectation Maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches, such as those based on k-nearest neighbors or on multiple imputation by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data are contaminated by outliers or have non-Gaussian distributions. To overcome this issue, a new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. This paper shows that this problem reduces to the estimation of a mixture of angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which may differ from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate t-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.
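The robust elliptical/angular-Gaussian machinery developed in the paper is more involved than the abstract can show; the basic building block it generalizes is the Gaussian conditional-mean imputation used in the E-step of the classical EM imputation it is compared against. A minimal sketch of that building block (the function name and NaN convention are illustrative):

```python
import numpy as np

def conditional_impute(x, mu, Sigma):
    """Fill the missing (NaN) entries of x with their conditional mean under
    N(mu, Sigma) given the observed entries: mu_m + S_mo S_oo^{-1} (x_o - mu_o)."""
    miss = np.isnan(x)
    obs = ~miss
    x_imp = x.copy()
    if miss.any() and obs.any():
        S_oo = Sigma[np.ix_(obs, obs)]
        S_mo = Sigma[np.ix_(miss, obs)]
        x_imp[miss] = mu[miss] + S_mo @ np.linalg.solve(S_oo, x[obs] - mu[obs])
    elif miss.any():
        x_imp[miss] = mu[miss]          # nothing observed: fall back to the mean
    return x_imp
```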