This paper presents a novel methodology for analyzing temporal directional data with scatter and heavy tails. A hidden Markov model with contaminated von Mises-Fisher emission distributions is developed. The model is implemented using a forward and backward selection approach that provides additional flexibility for contaminated as well as non-contaminated data. The utility of the method for finding homogeneous time blocks (regimes) is demonstrated in several experimental settings and on two real-life text data sets containing presidential addresses and corporate financial statements, respectively.
This paper studies a new curve-fitting approach to data on Riemannian manifolds. We define a principal curve based on a mixture model for observations and unobserved latent variables and propose a new algorithm to estimate the principal curve for given data points on Riemannian manifolds.
Process uncertainty, which is usually caused by various factors, generally follows an unknown, complex distribution. However, many existing monitoring methods are built on a single distribution, and thus they may not accurately reflect the uncertainty within process systems. In this study, a probabilistic quality-relevant monitoring method (PQM-GMM) based on the Gaussian mixture model is proposed to address this issue. Unlike conventional monitoring methods, the proposed method measures the process uncertainty using multiple Gaussian distributions, which can be used to approximate any unknown complex distribution. The optimization problem of the proposed PQM-GMM model is then solved using the expectation-maximization (EM) algorithm, which includes an augmented Lagrange multiplier in the M-step for model parameter estimation. Using the obtained results, a quality-relevant monitoring model is established with three statistics. The proposed model can also be extended to many existing methods, since they share a similar structure. In addition, details such as initial value selection, the missing data problem, and computational complexity are discussed. The effectiveness and superiority of the proposed method are tested using a numerical simulation example and a real low-pressure heater application. In comparison with some commonly used quality-relevant methods, the proposed model can be robustly established in the presence of corrupted data, and has better detection sensitivity for process anomalies in both process and quality variables. Note to Practitioners: A quality-relevant monitoring method based on the Gaussian mixture model (GMM) is proposed in this study for detecting abnormal conditions of industrial processes in harsh environments. Since a GMM can be used to approximate any unknown complex distribution, the process uncertainty within the collected data can be meticulously measured using the proposed PQM-GMM model. Besi
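To make the EM step mentioned in this abstract concrete, the following is a minimal sketch of plain EM for a one-dimensional, two-component Gaussian mixture. It is not the PQM-GMM procedure itself: the augmented Lagrange multiplier in the M-step and the monitoring statistics are omitted, and the function names, starting values, and synthetic data are assumptions made for illustration only.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=100):
    """Plain EM for a 1-D Gaussian mixture (illustrative, not PQM-GMM)."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: closed-form updates of weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor guards against collapse
    return means, variances, weights


# Synthetic data (assumed for the demo): two well-separated clusters.
rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
means, variances, weights = em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```

With clusters this well separated, the estimated means should land close to the generating values of -2 and 3, and the weights close to one half each. Real process data would typically be multivariate and would need a more careful initialization.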
Inferring truth from crowdsourced data presents a formidable challenge that has been widely recognized in the field. Recently, there has been a surge in deep learning and Bayesian methods that rely on task features. However, these methods fail to function effectively in situations where task features are lacking or the relationship between task truth and task features is weak. Traditional data mining methods for crowdsourced triplet data either rely on strong model assumptions with poor data adaptability or use weak assumption models based on worker confusion matrices, neglecting the difficulty differences between tasks. To address this, we propose a novel DS-like model that leverages the strong adaptability of the weak model assumption in the DS model by using a task confusion matrix to describe the impact of task difficulty information. Furthermore, we overcome the data information bottleneck by capturing multimodal information about additional data. Our model exhibits weak coupling characteristics, enabling it to adapt to the features of different data. To tackle the complex issues arising from parameter reduction in our model, we introduce an innovative coordinate ascent algorithm, termed "twice-EM." Finally, we substantiate the effectiveness of our proposed approach through a comprehensive series of experiments, which show significant improvements in the accuracy of truth inference.
Finite mixture of linear regression (FMLR) models are an efficient tool for fitting unobserved heterogeneous relationships. Parameter estimation for FMLR models is usually based on the normality assumption, but it is very sensitive to outliers. Meanwhile, traditional robust methods often need to assume a specific error distribution and do not adapt to the data set. In this paper, a robust estimation procedure for FMLR models is proposed by assuming that the error terms follow an asymmetric exponential power distribution, which includes the normal, skew-normal, generalized error, Laplace, asymmetric Laplace, and uniform distributions as special cases. The proposed method can select a suitable loss function from a broad class in a data-driven fashion. Under some conditions, the asymptotic properties of the proposed method are established. In addition, an efficient EM algorithm is introduced to implement the proposed methodology. The finite sample performance of the proposed approach is illustrated via numerical simulations. Finally, we apply the proposed methodology to analyze a tone perception data set.
In digital twin systems for freeways, it is essential to track individual vehicles. When sensing devices cannot fully cover an entire road, it is necessary to accurately predict the travel time of individual vehicles. Therefore, this paper proposes a dual-state traffic factor state network (DS-TFSN), which combines macro traffic states and micro vehicle travel states. Based on the DS-TFSN, a digital twin framework is proposed for freeways. This framework can realize long-distance freeway supervision and vehicle tracking by predicting the travel time of specific vehicles in unsupervised road sections to ascertain their driving process. As the core of the digital twin framework for freeways, the freeway section travel time prediction model based on the DS-TFSN considers the interactions among macro factors, micro factors, and environmental factors. The model separates the macro traffic state from the micro vehicle travel state and adds both as inputs to the LSTM model. A new vehicle-specific deep learning method is proposed to improve the prediction accuracy of the freeway section travel time. The results show that, for freeways, more accurate predictions are achieved during both normal hours and holidays. The MAPE of the predictions using the dual-state traffic factor state network decreases by up to 6.2%, and the proportion of vehicles with a prediction error of less than 1 second per kilometer increases by up to 54%.
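The two evaluation measures named in this abstract can be sketched as follows. This is only an illustration of the standard MAPE formula and of a per-kilometer error share; the function names, the section lengths, and the sample travel times are hypothetical and do not come from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def share_within_per_km(actual, predicted, lengths_km, threshold_s_per_km=1.0):
    """Fraction of vehicles whose absolute error per kilometer is below the threshold."""
    hits = sum(
        1
        for a, p, d in zip(actual, predicted, lengths_km)
        if abs(p - a) / d < threshold_s_per_km
    )
    return hits / len(actual)


# Hypothetical travel times in seconds over 5 km sections.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 404.0]
lengths_km = [5.0, 5.0, 5.0]

error_pct = mape(actual, predicted)
share = share_within_per_km(actual, predicted, lengths_km)
```

In this toy example the relative errors are 10%, 10%, and 1%, so the MAPE is 7%, and only the third vehicle (0.8 s/km) falls under the 1 s/km threshold.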
In this paper, we propose twelve parsimonious models for clustering mixed-type (ordinal and continuous) data. The dependence among the different types of variables is modeled by assuming that ordinal and continuous data follow a multivariate finite mixture of Gaussians, where the ordinal variables are a discretization of some continuous variates of the mixture. The general class of parsimonious models is based on a factor decomposition of the component-specific covariance matrices. Parameter estimation is carried out using an EM-type algorithm based on composite likelihood. The proposal is evaluated through a simulation study and an application to real data.
Here we propose a new class of probability distributions as an extended version of the exponential hyper-Poisson distribution and the Weibull Poisson distribution. We investigate several important aspects of the distribution by deriving expressions for its probability density function (pdf), cumulative distribution function, survival function, failure rate function, pdf of the order statistics, r-th raw moments, etc. Maximum likelihood estimation, together with the EM algorithm, is discussed for estimating the parameters of the distribution, and a test procedure is suggested for testing the significance of the additional parameters of the proposed model. The use of the proposed distribution is illustrated through real-life data sets. Further, a brief simulation study is carried out to evaluate the performance of the estimators obtained for the parameters of the distribution.
We develop a recursive least squares (RLS) type algorithm with a minimax concave penalty (MCP) for adaptive identification of a sparse tap-weight vector that represents a communication channel. The proposed algorithm recursively yields its estimate of the tap-weight vector, from noisy streaming observations of a received signal, using expectation-maximization (EM) updates. We prove convergence to a local optimum for the static least squares version of our algorithm and provide bounds for the estimation error. We study the performance of the recursive version numerically. Using simulation studies of a Rayleigh fading channel, a Volterra system, and a multivariate time series model, we demonstrate that our recursive algorithm outperforms, in the mean-squared error (MSE) sense, the standard RLS and the ℓ1-regularized RLS.
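For readers unfamiliar with the penalty named above, the MCP is commonly defined for a scalar t, with regularization level λ > 0 and concavity parameter γ > 1, as λ|t| − t²/(2γ) when |t| ≤ γλ, and as the constant γλ²/2 otherwise. The sketch below implements only this standard scalar penalty next to the ℓ1 penalty for comparison; it is not the recursive algorithm of the paper, and the function and parameter names are assumptions.

```python
def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty for a scalar coefficient t.

    lam   : regularization level (lambda > 0)
    gamma : concavity parameter (gamma > 1)
    """
    a = abs(t)
    if a <= gamma * lam:
        # Quadratic region: behaves like the L1 penalty near zero.
        return lam * a - a * a / (2.0 * gamma)
    # Flat region: large coefficients incur no additional penalty.
    return 0.5 * gamma * lam * lam


def l1_penalty(t, lam):
    """L1 (lasso) penalty, shown for comparison."""
    return lam * abs(t)


small = mcp_penalty(0.5, lam=1.0, gamma=3.0)
large = mcp_penalty(10.0, lam=1.0, gamma=3.0)
```

Near zero the two penalties are almost identical, which encourages sparsity, while for large coefficients the MCP levels off at γλ²/2 instead of growing linearly. This is the usual argument for why MCP introduces less bias on large taps than ℓ1 regularization.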
We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. Aiming at selecting important covariates while accounting for missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Subsequent to variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for variable selection.