As network traffic grows in volume and complexity, traditional traffic recognition techniques face numerous difficulties, especially in handling outlier data and improving recognition accuracy. Therefore, an improved expectation maximization algorithm based on the constraint matrix Z and Tsallis entropy is proposed, with the core goal of accelerating classification convergence and improving accuracy. To further enhance classification accuracy, the spatial expectation maximization algorithm is introduced, which replaces the sample mean and covariance matrix with the L1-median and a modified rank covariance matrix. According to the experimental data, the recall rate of the original expectation maximization algorithm is only 74%, whereas the recall rate of the spatial expectation maximization algorithm on the Attack service rises to 85%. On other services, such as WWW and Peer-to-peer, the recall rate also improves, from 96% and 95.3% to 97.7% and 96.1%, respectively. These results highlight the robustness of the spatial expectation maximization algorithm in handling outlier data and its ability to improve the accuracy of network traffic recognition. The work offers both innovation and potential practical value for network traffic identification.
This paper presents a new robust EM algorithm for finite mixture learning. The proposed Spatial-EM algorithm uses median-based location and rank-based scatter estimators to replace the sample mean and sample covariance matrix in each M step, hence enhancing the stability and robustness of the algorithm. It is robust to outliers and initial values. Compared with many robust mixture learning methods, Spatial-EM has the advantages of simplicity in implementation and statistical efficiency. We apply Spatial-EM to supervised and unsupervised learning scenarios. More specifically, robust clustering and outlier detection methods based on Spatial-EM are proposed. We apply the outlier detection to taxonomic research on fish species novelty discovery. Two real datasets are used for clustering analysis. Compared with the regular EM and many other existing methods such as K-median, X-EM and SVM, our method demonstrates superior performance and high robustness.
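The median-based location estimate mentioned in this abstract can be illustrated with a small sketch. The code below computes a weighted spatial (L1) median with the classical Weiszfeld iteration; it is not taken from the paper, and the function name `spatial_median`, the optional `weights` argument (standing in for the per-component responsibilities an M step would supply), and the tolerances are assumptions made for illustration. The rank-based scatter estimator is not covered here.

```python
import numpy as np

def spatial_median(X, weights=None, tol=1e-8, max_iter=500):
    """Weighted spatial (L1) median via Weiszfeld iterations.

    Hypothetical helper for illustration: `weights` plays the role of the
    responsibilities of one mixture component in an M step.
    """
    X = np.asarray(X, dtype=float)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    m = np.average(X, axis=0, weights=w)  # start from the weighted mean
    for _ in range(max_iter):
        # Distances to the current estimate, floored to avoid division by zero
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)
        coef = w / d
        m_new = coef @ X / coef.sum()  # re-weighted average of the points
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Four symmetric points plus one gross outlier
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [100.0, 100.0]])
print("mean:          ", X.mean(axis=0))
print("spatial median:", spatial_median(X))
```

On this toy data the sample mean is dragged far toward the outlier, while the spatial median stays close to the four symmetric points, which is the robustness property the abstract relies on.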
The Gaussian process is a powerful statistical learning model and has been applied widely in nonlinear regression and classification. However, it fails to model multi-modal data from a non-stationary source since a prior Gaussian process is generally stationary. Based on the idea of the mixture of experts, the mixture of Gaussian processes was established to increase the model flexibility. On the other hand, the Gaussian process is also sensitive to outliers, and thus robust Gaussian processes with heavy-tailed properties have been suggested. In practical applications, the datasets may be multi-modal and contain outliers at the same time. To overcome these two difficulties together, we propose a mixture of robust Gaussian processes (MRGP) model and establish a precise hard-cut EM algorithm for learning its parameters. Since exact inference is intractable because non-Gaussian probability density functions of the noises are adopted in the likelihood of the proposed model, we employ a variational bounding method to approximate the marginal likelihood functions so that the hard-cut EM algorithm can be implemented effectively. Moreover, we conduct various experiments on both synthetic and real-world datasets to evaluate and compare our proposed MRGP method with several competitive nonlinear regression methods. The experimental results demonstrate that our MRGP model with the hard-cut EM algorithm is much more effective and robust than the competitive nonlinear regression models. (c) 2021 Elsevier B.V. All rights reserved.
A popular way to account for unobserved heterogeneity is to assume that the data are drawn from a finite mixture distribution. A barrier to using finite mixture models is that parameters that could previously be estimated in stages must now be estimated jointly: using mixture distributions destroys any additive separability of the log-likelihood function. We show, however, that an extension of the EM algorithm reintroduces additive separability, thus allowing one to estimate parameters sequentially during each maximization step. In establishing this result, we develop a broad class of estimators for mixture models. Returning to the likelihood problem, we show that, relative to full information maximum likelihood, our sequential estimator can generate large computational savings with little loss of efficiency.
Joint modeling techniques have become a popular strategy for studying the association between a response and one or more longitudinal covariates. Motivated by the GenIMS study, where it is of interest to model the event of survival using censored longitudinal biomarkers, a joint model is proposed for describing the relationship between a binary outcome and multiple longitudinal covariates subject to detection limits. A fast, approximate EM algorithm is developed that reduces the dimension of integration in the E-step of the algorithm to one, regardless of the number of random effects in the joint model. Numerical studies demonstrate that the proposed approximate EM algorithm leads to satisfactory parameter and variance estimates in situations with and without censoring on the longitudinal covariates. The approximate EM algorithm is applied to analyze the GenIMS data set. (C) 2014 Elsevier B.V. All rights reserved.
Recently, many researchers have focused on modeling non-monotonic hazard functions such as bathtub and hump shapes. However, most of their estimation methods assume complete observations. Since reliability data are typically censored and truncated, a general EM algorithm is proposed, which can fit any of those complex hazard functions. The proposed EM algorithm is analyzed by fitting well-known 4-parameter hazard functions, and its performance is compared with that of their specific direct methods through extensive Monte Carlo simulations. (c) 2022 Elsevier B.V. All rights reserved.
The mean-shift algorithm, based on ideas proposed by Fukunaga and Hostetler [16], is a hill-climbing algorithm on the density defined by a finite mixture or a kernel density estimate. Mean-shift can be used as a nonparametric clustering method and has attracted recent attention in computer vision applications such as image segmentation or tracking. We show that, when the kernel is Gaussian, mean-shift is an expectation-maximization (EM) algorithm and, when the kernel is non-Gaussian, mean-shift is a generalized EM algorithm. This implies that mean-shift converges from almost any starting point and that, in general, its convergence is of linear order. For Gaussian mean-shift, we show: 1) the rate of linear convergence approaches 0 (superlinear convergence) for very narrow or very wide kernels, but is often close to 1 (thus, extremely slow) for intermediate widths and exactly 1 (sublinear convergence) for widths at which modes merge, 2) the iterates approach the mode along the local principal component of the data points from the inside of the convex hull of the data points, and 3) the convergence domains are nonconvex and can be disconnected and show fractal behavior. We suggest ways of accelerating mean-shift based on the EM interpretation.
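The EM reading of Gaussian mean-shift described in this abstract can be made concrete with a short sketch. In the code below, the "E step" computes the posterior probability of each kernel given the current point, and the "M step" moves the point to the posterior-weighted mean of the data. The function name `gaussian_mean_shift`, the isotropic `bandwidth` parameter, and the stopping rule are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_mean_shift(x0, data, bandwidth, tol=1e-8, max_iter=1000):
    """Climb from x0 to a mode of an isotropic Gaussian kernel density estimate."""
    x = np.asarray(x0, dtype=float)
    data = np.asarray(data, dtype=float)
    for _ in range(max_iter):
        # "E step": posterior probability of each kernel given the current x
        sq_dist = np.sum((data - x) ** 2, axis=1)
        log_w = -0.5 * sq_dist / bandwidth**2
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        w /= w.sum()
        # "M step": move x to the posterior-weighted mean of the data
        x_new = w @ data
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Two well-separated 1-D clusters, near 0 and near 10
data = np.array([[-0.1], [0.0], [0.1], [9.9], [10.0], [10.1]])
print(gaussian_mean_shift([1.0], data, bandwidth=0.5))  # climbs to the mode near 0
print(gaussian_mean_shift([9.0], data, bandwidth=0.5))  # climbs to the mode near 10
```

Running the iteration from every data point and grouping points by the mode they reach gives the nonparametric clustering use mentioned in the abstract.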
In biological data, it is often the case that observed data are available only for a subset of samples. When a kernel matrix is derived from such data, we have to leave the entries for unavailable samples as missing. In this paper, the missing entries are completed by exploiting an auxiliary kernel matrix derived from another information source. The parametric model of kernel matrices is created as a set of spectral variants of the auxiliary kernel matrix, and the missing entries are estimated by fitting this model to the existing entries. For model fitting, we adopt the em algorithm (distinguished from the EM algorithm of Dempster et al., 1977) based on the information geometry of positive definite matrices. We report promising results on bacteria clustering experiments using two marker sequences: 16S and gyrB.
This article puts forward a new method for solving linear quantile regression problems, based on the EM algorithm and a location-scale mixture representation of the asymmetric Laplace error distribution. A closed form of the estimator of the unknown parameter vector beta, based on the EM algorithm, is obtained. In addition, simulations are conducted to illustrate the performance of the proposed method. Simulation results demonstrate that the proposed algorithm performs well. Finally, the classical Engel data are fitted and bootstrap confidence intervals for the estimators are provided.
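To give a feel for how an EM update with a closed-form M step can arise in this setting, here is a minimal sketch. It uses the standard normal-exponential mixture representation of the asymmetric Laplace distribution with the scale fixed at 1, which is an assumption made here; the paper's exact estimator (for example, how it treats the scale parameter) may differ. Under that assumption the E step reduces to weights proportional to 1/|r_i| for residuals r_i, and the M step is a weighted least-squares solve with a shift of (1 - 2p) times the column sums of X. The function name `em_quantile_regression` and the `eps` guard are hypothetical.

```python
import numpy as np

def em_quantile_regression(X, y, p=0.5, n_iter=500, eps=1e-6):
    """EM-style iteration for the p-th linear quantile (scale fixed at 1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares starting value
    for _ in range(n_iter):
        r = y - X @ beta
        # E step: E[1/z_i | y_i] is proportional to 1/|r_i| (floored by eps)
        w = 1.0 / np.maximum(np.abs(r), eps)
        # M step: closed-form weighted least squares with an asymmetry shift
        A = X.T @ (w[:, None] * X)
        b = X.T @ (w * y) - (1.0 - 2.0 * p) * X.sum(axis=0)
        beta = np.linalg.solve(A, b)
    return beta

# Synthetic check: y = 1 + 2x + standard normal noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=400)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(size=400)
print("median fit:      ", em_quantile_regression(X, y, p=0.5))
print("0.9-quantile fit:", em_quantile_regression(X, y, p=0.9))
```

At a fixed point the update satisfies the usual quantile-regression first-order condition, so the iteration can be read as an iteratively re-weighted least-squares scheme for the check loss.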
In this paper, we consider a new procedure for estimating parameters in the proportional hazards model with doubly censored data. Computing the maximum likelihood estimator with doubly censored data is often nontrivial and requires a constrained optimization procedure, which is computationally unstable and sometimes fails to converge. We propose an approximated likelihood and study the maximum approximated likelihood estimator, which is obtained by maximizing the approximated likelihood. In comparison to the maximum likelihood estimator, this new estimator is stable and always converges with an efficient EM algorithm we develop. The stability of the new estimator, even with moderate sample sizes, is amply demonstrated through simulated and real data. For theoretical justification of the approximated likelihood, we show the consistency of the maximum approximated likelihood estimator. (C) 2012 Elsevier B.V. All rights reserved.