Search results - Inner Mongolia University Library

On regime changes in text data using hidden Markov model of contaminated vMF distribution

DATA MINING AND KNOWLEDGE DISCOVERY, 2024, Vol. 38, No. 6, pp. 3563-3589

Authors: Zhang, Yingying; Sarkar, Shuchismita; Chen, Yuanyuan; Zhu, Xuwen. Affiliations: Western Michigan Univ, Kalamazoo, MI 49008, USA; Bowling Green Univ, Bowling Green, OH 43402, USA; Univ Alabama, Tuscaloosa, AL 35487, USA

This paper presents a novel methodology for analyzing temporal directional data with scatter and heavy tails. A hidden Markov model with a contaminated von Mises-Fisher emission distribution is developed. The model is implemented using a forward and backward selection approach that provides additional flexibility for contaminated as well as non-contaminated data. The utility of the method for finding homogeneous time blocks (regimes) is demonstrated on several experimental settings and on two real-life text data sets containing presidential addresses and corporate financial statements, respectively.

Keywords: Finite mixture model; Model-based clustering; Hidden Markov model; von Mises-Fisher distribution; Regime change; EM algorithm

Probabilistic Principal Curves on Riemannian Manifolds

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, Vol. 46, No. 7, pp. 4843-4849

Authors: Kang, Seungwoo; Oh, Hee-Seok. Affiliation: Seoul Natl Univ, Dept Stat, Seoul 08826, South Korea

This paper studies a new curve-fitting approach to data on Riemannian manifolds. We define a principal curve based on a mixture model for observations and unobserved latent variables and propose a new algorithm to est...

Keywords: Manifolds; Probabilistic logic; Gaussian distribution; Fitting; Principal component analysis; Wrapping; Time series analysis; Dimensionality reduction; EM algorithm; principal curve; Riemannian manifold; symmetric space

A Probabilistic Quality-Relevant Monitoring Method With Gaussian Mixture Model

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, Vol. 22, pp. 4790-4801

Authors: Yu, Wanke; Zhao, Chunhui; Huang, Biao; Yang, Hui. Affiliations: East China Jiaotong Univ, Sch Elect & Automat Engn, Nanchang 330013, Peoples R China; Univ Alberta, Dept Chem & Mat Engn, Edmonton, AB T6G 2G6, Canada; Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310027, Peoples R China

Process uncertainty, which is usually caused by various factors, generally follows an unknown complex distribution. However, many existing monitoring methods are established with a single distribution, and thus they may not accurately reflect the uncertainty within process systems. In this study, a probabilistic quality-relevant monitoring method (PQM-GMM) is proposed with the Gaussian mixture model to address this issue. Unlike conventional monitoring methods, the proposed method measures the process uncertainty using multiple Gaussian distributions, which can be used to approximate any unknown complex distribution. Then, the optimization problem of the proposed PQM-GMM model is solved using the expectation-maximization (EM) algorithm, which includes an augmented Lagrange multiplier in the M-step for model parameter estimation. Using the obtained results, a quality-relevant monitoring model is established with three statistics. It is noted that the proposed model can also be extended to many existing methods since they share a similar structure. In addition, details such as initial value selection, the missing data problem, and computational complexity are discussed. The effectiveness and superiority of the proposed method are tested using a numerical simulation example and a real low-pressure heater application. In comparison with some commonly used quality-relevant methods, the proposed model can be robustly established in the presence of corrupted data, and has better detection sensitivity for process anomalies in both process and quality variables. Note to Practitioners: A quality-relevant monitoring method is proposed in this study with the Gaussian mixture model (GMM) for detecting abnormal conditions of industrial processes under harsh environments. Since a GMM can be used to approximate any unknown complex distribution, the process uncertainty within the collected data can be meticulously measured using the proposed PQM-GMM model. Besi

Keywords: Quality-relevant monitoring; multi-source noises; EM algorithm; augmented Lagrange multiplier
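
The abstract above relies on fitting a Gaussian mixture with the EM algorithm. As a rough illustration only, the sketch below runs plain EM on a one-dimensional, two-component mixture using the Python standard library. It is not the PQM-GMM method: it omits the augmented Lagrange multiplier in the M-step and the three monitoring statistics, and the function names, the quantile-based initialisation, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, n_components=2, n_iter=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by plain EM; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: spread the starting means over the data quantiles.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([j / total for j in joint])
        # M-step: closed-form weighted updates of weights, means, and variances.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-300)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, var_floor)
    return weights, means, variances


# Synthetic two-cluster data (an assumption for the demo, not data from the paper).
random.seed(0)
sample = [random.gauss(-3.0, 1.0) for _ in range(300)]
sample += [random.gauss(4.0, 0.5) for _ in range(300)]
weights, means, variances = fit_gmm_em(sample)
print(weights, means, variances)
```

On well-separated clusters like these, the estimated means should land close to the generating values of -3 and 4. Real process data would need multivariate components and a more careful initialisation, which the abstract says the paper discusses.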

Multimodal information capture based truth inference network in crowdsourcing

EXPERT SYSTEMS WITH APPLICATIONS, 2025, Vol. 273

Authors: Han, Tao; Ding, Xinyi; Fang, Yili. Affiliation: Zhejiang Gongshang Univ, Hangzhou, Peoples R China

Inferring truth from crowdsourced data presents a formidable challenge that has been widely recognized in the field. Recently, there has been a surge in deep learning and Bayesian methods that rely on task features. However, these methods fail to function effectively in situations where task features are lacking or the relationship between task truth and task features is weak. Traditional data mining methods for crowdsourced triplet data either rely on strong model assumptions with poor data adaptability or use weak-assumption models based on worker confusion matrices, neglecting the difficulty differences between tasks. To address this, we propose a novel DS-like model that leverages the strong adaptability of the weak model assumption in the DS model by using a task confusion matrix to describe the impact of task difficulty information. Furthermore, we overcome the data information bottleneck by capturing multimodal information about additional data. Our model exhibits weak coupling characteristics, enabling it to adapt to the features of different data. To tackle the complex issues arising from parameter reduction in our model, we introduce an innovative coordinate ascent algorithm, termed "twice-EM." Finally, we substantiate the effectiveness of our proposed approach through a comprehensive series of experiments, highlighting significant improvements in the accurate inference of truth.

Keywords: Crowdsourcing; Truth inference; External knowledge; EM algorithm

Robust mixture regression via an asymmetric exponential power distribution

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024, Vol. 53, No. 5, pp. 2486-2497

Authors: Jiang, Yunlu; Huang, Meilan; Wei, Xie; Tonghua, Hu; Hang, Zou. Affiliations: Jinan Univ, Coll Econ, Dept Stat, Guangzhou 510632, Peoples R China; Jilin Univ, Northeast Asian Res Ctr, Changchun, Peoples R China; Yongzhou Vocat Tech Coll, Yongzhou, Peoples R China

Finite mixture of linear regression (FMLR) models are an efficient tool for fitting unobserved heterogeneous relationships. Parameter estimation for FMLR models is usually based on the normality assumption, which makes it very sensitive to outliers. Meanwhile, traditional robust methods often need to assume a specific error distribution and are not adaptive to the dataset. In this paper, a robust estimation procedure for FMLR models is proposed by assuming that the error terms follow an asymmetric exponential power distribution, which includes the normal, skew-normal, generalized error, Laplace, asymmetric Laplace, and uniform distributions as special cases. The proposed method can select a suitable loss function from a broad class in a data-driven fashion. Under some conditions, the asymptotic properties of the proposed method are established. In addition, an efficient EM algorithm is introduced to implement the proposed methodology. The finite-sample performance of the proposed approach is illustrated via numerical simulations. Finally, we apply the proposed methodology to analyze a tone perception data set.

Keywords: AEP density function; EM algorithm; Finite mixture of linear regression models

DS-TFSN-Based Vehicle Travel Time Prediction Method for Digital Twin System of Freeways

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, Vol. 25, No. 12, pp. 20073-20084

Authors: Zhang, Weibin; Zha, Huazhu; Gan, Lu; Li, Qianmu. Affiliations: Nanjing Univ Sci & Technol, Sch Elect & Opt Engn, Nanjing 210094, Peoples R China; Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China

In digital twin systems for freeways, it is essential to track individual vehicles. When sensing devices cannot fully cover an entire road, it is necessary to accurately predict the travel time of individual vehicles. Therefore, this paper proposes a dual-state traffic factor state network (DS-TFSN), which combines macro traffic states and micro vehicle travel states. Based on the DS-TFSN, a digital twin framework is proposed for freeways. This framework can realize long-distance freeway supervision and vehicle tracking by predicting the travel time of specific vehicles in unsupervised road sections to ascertain their driving process. As the core of the digital twin framework for freeways, the freeway section travel time prediction model based on the DS-TFSN considers the interactions among macro factors, micro factors, and environmental factors. The model separates the macro traffic state from the micro vehicle travel state and adds both as inputs to the LSTM model. A new vehicle-specific deep learning method is proposed to improve the prediction accuracy of the freeway section travel time. The results show that, for freeways, more accurate predictions are achieved during both normal hours and holidays. The MAPE of the predictions using the dual-state traffic factor state network decreases by up to 6.2%, and the proportion of vehicles with a prediction error of less than 1 second per kilometer increases by up to 54%.

Keywords: Dual-state traffic factor state network; digital twin; travel time prediction; LSTM; EM algorithm

Composite likelihood methods for parsimonious model-based clustering of mixed-type data

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024, Vol. 18, No. 2, pp. 381-407

Authors: Ranalli, Monia; Rocci, Roberto. Affiliation: Sapienza Univ Rome, Piazzale Aldo Moro 5, Rome, Italy

In this paper, we propose twelve parsimonious models for clustering mixed-type (ordinal and continuous) data. The dependence among the different types of variables is modeled by assuming that ordinal and continuous data follow a multivariate finite mixture of Gaussians, where the ordinal variables are a discretization of some continuous variates of the mixture. The general class of parsimonious models is based on a factor decomposition of the component-specific covariance matrices. Parameter estimation is carried out using an EM-type algorithm based on composite likelihood. The proposal is evaluated through a simulation study and an application to real data.

Keywords: Mixture models; Factor analyzers; Composite likelihood; EM algorithm; Mixed-type data

An extended exponential hyper-Poisson distribution: Properties and applications

COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2024, Vol. 53, No. 12, pp. 4311-4333

Authors: Kumar, C. Satheesh; Satheenthar, A. S. Affiliation: Univ Kerala, Dept Stat, Thiruvananthapuram, India

Here we propose a new class of probability distributions as an extended version of the exponential hyper-Poisson distribution and the Weibull Poisson distribution. We investigate several important aspects of the distribution by deriving expressions for its probability density function (pdf), cumulative distribution function, survival function, failure rate function, pdf of the order statistics, r-th raw moments, etc. Maximum likelihood estimation, along with the EM algorithm, is discussed for estimating the parameters of the distribution, and a test procedure is suggested for testing the significance of the additional parameters of the proposed model. The use of the proposed distribution is illustrated through real-life data sets. Further, a brief simulation study is carried out to evaluate the performance of the estimators obtained for the parameters of the distribution.

Keywords: Compounding; EM algorithm; maximum likelihood estimation; positive hyper-Poisson distribution; simulation; Weibull distribution

Recursive Least Squares With Minimax Concave Penalty Regularization for Adaptive System Identification

IEEE ACCESS, 2024, Vol. 12, pp. 66993-67004

Authors: Li, Bowen; Wu, Suya; Tripp, Erin E.; Pezeshki, Ali; Tarokh, Vahid. Affiliations: Colorado State Univ, Dept Elect & Comp Engn, Ft Collins, CO 80523, USA; Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708, USA; USAF Res Lab, Rome, NY 13441, USA

We develop a recursive least squares (RLS) type algorithm with a minimax concave penalty (MCP) for adaptive identification of a sparse tap-weight vector that represents a communication channel. The proposed algorithm recursively yields its estimate of the tap-weight vector, from noisy streaming observations of a received signal, using an expectation-maximization (EM) update. We prove the convergence to a local optimum of the static least squares version of our algorithm and provide bounds for the estimation error. We study the performance of the recursive version numerically. Using simulation studies of a Rayleigh fading channel, a Volterra system, and a multivariate time series model, we demonstrate that our recursive algorithm outperforms, in the mean-squared error (MSE) sense, the standard RLS and the ℓ1-regularized RLS.

Keywords: Adaptive filtering; EM algorithm; minimax concave penalty (MCP); sparse system identification
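
For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of the standard exponentially weighted RLS recursion, written with the Python standard library only. It is the textbook baseline, not the authors' MCP-regularized EM update. The function name `rls_identify`, the forgetting factor, the initial scaling `delta`, and the synthetic sparse channel are assumptions chosen for illustration.

```python
import random


def rls_identify(inputs, desired, n_taps, forgetting=0.99, delta=100.0):
    """Standard exponentially weighted RLS; returns the final tap-weight estimate."""
    w = [0.0] * n_taps
    # P approximates the inverse of the weighted input correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(n_taps)] for i in range(n_taps)]
    for t in range(n_taps - 1, len(inputs)):
        # Regressor: the most recent n_taps input samples, newest first.
        u = [inputs[t - i] for i in range(n_taps)]
        Pu = [sum(P[i][j] * u[j] for j in range(n_taps)) for i in range(n_taps)]
        denom = forgetting + sum(u[i] * Pu[i] for i in range(n_taps))
        gain = [p / denom for p in Pu]
        # A priori error: desired sample minus the prediction from the old weights.
        err = desired[t] - sum(w[i] * u[i] for i in range(n_taps))
        w = [w[i] + gain[i] * err for i in range(n_taps)]
        # Rank-one update of P, using the symmetry of P so that u^T P = (P u)^T.
        P = [[(P[i][j] - gain[i] * Pu[j]) / forgetting for j in range(n_taps)]
             for i in range(n_taps)]
    return w


# Synthetic sparse channel and white input (assumptions for the demo only).
random.seed(1)
true_taps = [0.8, 0.0, 0.0, -0.5, 0.0, 0.0, 0.0, 0.0]
x = [random.gauss(0.0, 1.0) for _ in range(500)]
d = [
    sum(h * x[t - i] for i, h in enumerate(true_taps) if t - i >= 0)
    + random.gauss(0.0, 0.01)
    for t in range(len(x))
]
w_hat = rls_identify(x, d, n_taps=len(true_taps))
print([round(v, 3) for v in w_hat])
```

Plain RLS estimates every tap, including those that are truly zero. A sparsity-promoting penalty such as the ℓ1 norm or the MCP is meant to pull those small estimates toward zero, which is the gap the abstract describes.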

Post-selection inference in regression models for group testing data

BIOMETRICS, 2024, Vol. 80, No. 3, article ujae101

Authors: Shen, Qinyan; Gregory, Karl; Huang, Xianzheng. Affiliation: Univ South Carolina, Dept Stat, 219 LeConte, 1523 Greene St, Columbia, SC 29208, USA

We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. Aiming at selecting important covariates while accounting for missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Subsequent to variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for variable selection.

Keywords: confidence intervals; EM algorithm; individual testing; LASSO; variable selection
