Electronic mail is a primary vehicle for many cyber scams, so identifying the author of an email is essential; it forms significant documentary evidence in digital forensics. This paper presents a model for email author identification (attribution) using deep neural networks and model-based clustering techniques. Stylometric features have gained considerable importance in authorship identification because they improve the accuracy of the attribution task. The experiments were performed on the publicly available Enron benchmark dataset with varying numbers of authors. The proposed deep neural network achieves an accuracy of 94% on five authors, 90% on ten authors, 86% on 25 authors and 75% on the entire dataset, a strong result on highly imbalanced data. The second, cluster-based technique yielded 86% accuracy on the entire dataset, with the number of authors determined by their contribution to the aggregate data.
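This is not the paper's architecture, but a minimal Python sketch of the same idea: character-level features standing in for stylometry, fed to a small feed-forward network. The toy emails and author labels are invented for illustration.

    # Minimal sketch: character n-gram TF-IDF as a rough proxy for stylometric
    # features, classified by a small feed-forward neural network.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    # Hypothetical toy corpus of (email body, author) pairs.
    emails = ["Please see the attached report.", "hey, r u coming tonight??",
              "Kind regards, and thank you for your patience.", "lol ok see ya"]
    authors = ["alice", "bob", "alice", "bob"]

    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # style-like cues
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    )
    model.fit(emails, authors)
    print(model.predict(["thanks, see attached draft."]))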
An evolutionary algorithm (EA) is developed as an alternative to the EM algorithm for parameter estimation in model-based clustering. This EA facilitates a different search of the fitness landscape, i.e., the likelihood surface, utilizing both crossover and mutation. Furthermore, this EA represents an efficient approach to "hard" model-based clustering and so it can be viewed as a sort of generalization of the k-means algorithm, which is itself equivalent to a restricted Gaussian mixture model. The EA is illustrated on several datasets, and its performance is compared with that of other hard clustering approaches and model-based clustering via the EM algorithm.
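A rough sketch of the idea, not the authors' algorithm: an evolutionary search over hard cluster labels whose fitness is the complete-data log-likelihood of a spherical, unit-variance Gaussian mixture, so the search effectively generalizes k-means. The population size, mutation rate and two-component toy data are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    n, K = len(X), 2

    def fitness(labels):
        # Complete-data log-likelihood of a spherical, unit-variance mixture
        # (up to constants); empty clusters are heavily penalized.
        ll = 0.0
        for k in range(K):
            pts = X[labels == k]
            if len(pts) == 0:
                return -np.inf
            ll += -0.5 * np.sum((pts - pts.mean(axis=0)) ** 2)
        return ll

    pop = [rng.integers(0, K, n) for _ in range(20)]
    for gen in range(100):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]
        children = []
        for _ in range(10):
            a, b = rng.choice(10, 2, replace=False)
            cut = rng.integers(1, n)                    # one-point crossover
            child = np.concatenate([parents[a][:cut], parents[b][cut:]])
            flip = rng.random(n) < 0.01                 # mutation
            child[flip] = rng.integers(0, K, flip.sum())
            children.append(child)
        pop = parents + children

    best = max(pop, key=fitness)
    print("best complete-data log-likelihood:", round(fitness(best), 2))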
Functional magnetic resonance imaging (fMRI) data have become increasingly available and are useful for describing functional connectivity (FC), the relatedness of neuronal activity in regions of the brain. This FC of the brain provides insight into certain neurodegenerative diseases and psychiatric disorders, and thus is of clinical importance. To help inform physicians regarding patient diagnoses, unsupervised clustering of subjects based on FC is desired, allowing the data to inform us of groupings of patients based on shared features of connectivity. Since heterogeneity in FC is present even between patients within the same group, it is important to allow subject-level differences in connectivity, while still pooling information across patients within each group to describe group-level FC. To this end, we propose a random covariance clustering model (RCCM) to concurrently cluster subjects based on their FC networks, estimate each subject's unique FC network, and infer shared network features. Although current methods exist for estimating FC or clustering subjects using fMRI data, our novel contribution is to cluster or group subjects based on similar FC of the brain while simultaneously providing group- and subject-level FC network estimates. The competitive performance of RCCM relative to other methods is demonstrated through simulations in various settings, achieving both improved clustering of subjects and estimation of FC networks. Utility of the proposed method is demonstrated with application to a resting-state fMRI data set collected on 43 healthy controls and 61 participants diagnosed with schizophrenia.
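The RCCM itself estimates subject- and group-level networks jointly; the Python below is only a crude two-step baseline on simulated time series: a sparse precision matrix per subject via the graphical lasso, followed by clustering of the vectorized connectivity. The penalty value and data dimensions are arbitrary.

    import numpy as np
    from sklearn.covariance import GraphicalLasso
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    n_subjects, n_time, n_regions = 12, 200, 5
    subjects = [rng.normal(size=(n_time, n_regions)) for _ in range(n_subjects)]

    features = []
    for ts in subjects:
        gl = GraphicalLasso(alpha=0.1).fit(ts)      # sparse precision estimate
        iu = np.triu_indices(n_regions, k=1)
        features.append(gl.precision_[iu])          # off-diagonal entries only

    groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    print("subject group labels:", groups)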
Authors: Dias, Jose G.; Vermunt, Jeroen K.
Affiliations: ISCTE, Department of Quantitative Methods, Higher Institute of Social Sciences & Business Studies, P-1649026 Lisbon, Portugal; ISCTE, UNIDE, P-1649026 Lisbon, Portugal; Tilburg University, Department of Methodology & Statistics, NL-5000 LE Tilburg, Netherlands
In model-based clustering, a situation in which true class labels are unknown and that is therefore also referred to as unsupervised learning, observations are typically classified by the Bayes modal rule. In this study, we assess whether alternative classifiers from the classification (supervised-learning) literature, developed for situations in which class labels are known, can improve the Bayes rule. More specifically, we investigate the performance of bootstrap-based aggregate (bagging) rules after adapting these to the model-based clustering context. It is argued that specific issues, such as the label-switching problem, have to be carefully addressed when using bootstrap methods in model-based clustering. Our two Monte Carlo studies show that classification based on the Bayes rule is rather stable and difficult to improve by bootstrap-based aggregate rules, even for sparse data. An empirical example illustrates the various approaches described in this paper.
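A sketch of what a bagged Bayes modal rule might look like for a Gaussian mixture (not necessarily the rules studied in the paper): each bootstrap fit is label-aligned to a reference fit via maximum-agreement matching, which addresses the label-switching issue, before majority voting.

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
    K, B = 2, 25

    ref = GaussianMixture(K, random_state=0).fit(X)
    ref_labels = ref.predict(X)                       # single-fit Bayes modal rule

    votes = np.zeros((len(X), K))
    for b in range(B):
        idx = rng.integers(0, len(X), len(X))         # bootstrap resample
        lab = GaussianMixture(K, random_state=b).fit(X[idx]).predict(X)
        # Align bootstrap labels to the reference (label-switching fix).
        cost = -np.array([[np.sum((lab == i) & (ref_labels == j))
                           for j in range(K)] for i in range(K)])
        _, perm = linear_sum_assignment(cost)
        votes[np.arange(len(X)), perm[lab]] += 1

    bagged = votes.argmax(axis=1)
    print("agreement with single-fit Bayes rule:", np.mean(bagged == ref_labels))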
Parsimonious finite mixture models often require the a priori selection of desired model dimensionality. For example, projection-based parsimonious models demand the dimension of the subspace for projection. Other models ask for their own structural restrictions on parameters. The subspace clustering framework is a projection-based parsimonious model for various finite mixtures, including the Gaussian variant. The existing dimension selection methods for subspace clustering are ad hoc or potentially computationally prohibitive, creating a need for a principled, yet computationally lightweight, approach. In light of this problem, a hypothesis test-based intrinsic dimension estimation method called the Anderson Relaxation Test (ART) is introduced, and its performance is examined in both simulated and real data settings.
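The snippet below is not the ART; it is just a common eigenvalue-based heuristic for choosing a projection dimension, included to make the dimension-selection problem concrete. The simulated data, noise level and 95% threshold are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    latent = rng.normal(size=(500, 3))                 # true 3-dimensional signal
    X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(500, 10))

    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    explained = np.cumsum(eigvals) / eigvals.sum()
    q = int(np.searchsorted(explained, 0.95) + 1)      # smallest q explaining 95%
    print("selected subspace dimension:", q)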
This paper considers the problem of forecasting high-dimensional time series. It employs a robust clustering approach to perform classification of the component series. Each series within a cluster is assumed to follow the same model and the data are then pooled for estimation. The classification is model-based and robust to outlier contamination. The robustness is achieved by using the intrinsic mode functions of the Hilbert-Huang transform at lower frequencies. These functions are found to be robust to outlier contamination. The paper also compares out-of-sample forecast performance of the proposed method with several methods available in the literature. The other forecasting methods considered include vector autoregressive models with/without LASSO, group LASSO, principal component regression, and partial least squares. The proposed method is found to perform well in out-of-sample forecasting of the monthly unemployment rates of 50 US states. Copyright (c) 2013 John Wiley & Sons, Ltd.
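As a hedged sketch only (it does not use the Hilbert-Huang transform), the Python below clusters simulated series on moving-average summaries, a crude stand-in for the lower-frequency intrinsic mode functions, and then pools lag-1 pairs within each cluster to fit a shared AR(1) coefficient.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)
    T, n_series = 120, 20
    trend = np.sin(np.linspace(0, 4 * np.pi, T))
    series = np.array([(i % 2) * trend + 0.3 * rng.normal(size=T)
                       for i in range(n_series)])

    kernel = np.ones(12) / 12                           # 12-month moving average
    smooth = np.array([np.convolve(s, kernel, mode="valid") for s in series])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(smooth)

    for k in range(2):
        grp = series[labels == k]
        x, y = grp[:, :-1].ravel(), grp[:, 1:].ravel()  # pooled lag-1 pairs
        phi = np.dot(x, y) / np.dot(x, x)               # pooled AR(1) coefficient
        print(f"cluster {k}: {len(grp)} series, pooled AR(1) coefficient {phi:.2f}")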
We report the use of a cluster analysis method based on a multivariate mixture model, known as model-based clustering, for overcoming the limitations of hierarchical clustering and relocation clustering. Unlike traditional clustering methods in which clusters are formed on the basis of intercluster distances, model-based clustering classifies observations on the basis of probability estimated from Gaussian mixture modeling, and its statistical basis allows for inference. Three examples are given in which we demonstrate that model-based clustering gives much better performance for overlapping clusters, a more reliable determination of the number of clusters in data, and better identification of clustering in the presence of outliers than agglomerative hierarchical clustering or iterative relocation clustering using a K-means criterion. We also show that Markov chain Monte Carlo simulation, as implemented via Gibbs sampling coupled with model-based clustering, may be used to assess uncertainty of group memberships. Copyright (c) 2013 John Wiley & Sons, Ltd.
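A minimal Python illustration of the same points, not the original analysis: a Gaussian mixture fits overlapping clusters, BIC suggests the number of components, and posterior membership probabilities give a simple (non-MCMC) measure of assignment uncertainty. The simulated data are arbitrary.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(5)
    X = np.vstack([rng.normal([0, 0], [1.0, 2.5], (150, 2)),
                   rng.normal([2, 1], [0.5, 0.5], (150, 2))])   # overlapping groups

    fits = {k: GaussianMixture(k, random_state=0).fit(X) for k in range(1, 5)}
    best_k = min(fits, key=lambda k: fits[k].bic(X))
    print("BIC-selected number of clusters:", best_k)

    post = fits[best_k].predict_proba(X)                # posterior memberships
    uncertainty = 1 - post.max(axis=1)                  # 0 means fully confident
    print("largest membership uncertainty:", round(uncertainty.max(), 3))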
Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clustering multivariate continuous data. However, the practical usefulness of these models is jeopardized in high-dimensional spaces, where they tend to be over-parameterized. As a consequence, different solutions have been proposed, often relying on matrix decompositions or variable selection strategies. Recently, a methodological link between Gaussian graphical models and finite mixtures has been established, paving the way for penalized model-based clustering in the presence of large precision matrices. Notwithstanding, current methodologies implicitly assume similar levels of sparsity across the classes, not accounting for different degrees of association between the variables across groups. We overcome this limitation by deriving group-wise penalty factors, which automatically enforce under- or over-connectivity in the estimated graphs. The approach is entirely data-driven and does not require additional hyperparameter specification. Analyses on synthetic and real data showcase the validity of our proposal.
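The paper derives its group-wise penalty factors inside the mixture fit; the Python below is only a two-step caricature with hand-picked per-group penalties, meant to show what different sparsity levels across groups look like in practice.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(6)
    X = np.vstack([rng.normal(0, 1, (150, 6)), rng.normal(4, 1, (150, 6))])

    labels = GaussianMixture(2, random_state=0).fit_predict(X)
    alphas = {0: 0.05, 1: 0.3}                   # placeholder per-group penalties
    for k, alpha in alphas.items():
        prec = GraphicalLasso(alpha=alpha).fit(X[labels == k]).precision_
        n_edges = int((np.abs(prec[np.triu_indices(6, k=1)]) > 1e-6).sum())
        print(f"group {k}: penalty {alpha}, {n_edges} nonzero off-diagonal pairs")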
Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between clustering accuracy and the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term is open to criticism. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion that involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g. a stepwise method). Second, the algorithms are often computationally expensive because they need multiple calls of EM algorithms. Here we propose to use a new information criterion based on the integrated complete-data likelihood. It does not require the maximum likelihood estimate, and its maximization is simple and computationally efficient. The original contribution of our approach is to perform the model selection without requiring any parameter estimation; parameter inference is then needed only for the single selected model. This approach is used for variable selection in a Gaussian mixture model with conditional independence assumed. Numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches for variable selection. The proposed approach is implemented in the R package VarSelLCM, available on CRAN.
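For contrast with the estimation-free criterion above, here is a brute-force greedy selection sketch in Python (it refits by EM at every step, exactly what the paper avoids): a candidate variable is kept as a clustering variable only if the BIC of a diagonal-covariance Gaussian mixture including it beats modelling it as an independent single Gaussian. The simulated data with two informative and four noise variables are invented for illustration.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(7)
    informative = np.vstack([rng.normal(-2, 1, (150, 2)), rng.normal(2, 1, (150, 2))])
    X = np.hstack([informative, rng.normal(size=(300, 4))])   # last 4 are noise

    def gmm_bic(cols, k):
        data = X[:, cols]
        gm = GaussianMixture(k, covariance_type="diag", random_state=0)
        return gm.fit(data).bic(data)

    selected, remaining, improved = [], list(range(X.shape[1])), True
    while improved and remaining:
        improved = False
        for j in list(remaining):
            with_j = gmm_bic(selected + [j], 2)
            # Alternative: j is irrelevant, i.e. an independent single Gaussian.
            without_j = gmm_bic([j], 1) + (gmm_bic(selected, 2) if selected else 0.0)
            if with_j < without_j:
                selected.append(j)
                remaining.remove(j)
                improved = True
    print("selected clustering variables:", sorted(selected))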
The expectation-maximization (EM) algorithm is a commonly used method for finding the maximum likelihood estimates of the parameters in a mixture model via coordinate ascent. A serious pitfall with the algorithm is that in the case of multimodal likelihood functions, it can get trapped at a local maximum. This problem often occurs when sub-optimal starting values are used to initialize the algorithm. Bayesian initialization averaging (BIA) is proposed as an ensemble method to generate high quality starting values for the EM algorithm. Competing sets of trial starting values are combined as a weighted average, which is then used as the starting position for a full EM run. The method can also be extended to variational Bayes methods, a class of algorithms similar to EM that is based on an approximation of the model posterior. The BIA method is demonstrated on real continuous, categorical and network data sets, and the convergent log-likelihoods and associated clustering solutions are presented. These compare favorably with the output produced using competing initialization methods such as random starts, hierarchical clustering and deterministic annealing, with the highest available maximum likelihood estimates obtained in a higher percentage of cases, at reasonable computational cost. For the stochastic block model for network data, promising results are demonstrated even when the likelihood is unavailable. The implications of the different clustering solutions obtained from local maxima are also discussed.
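A rough Python sketch of likelihood-weighted averaging of trial starting values before a full EM run, in the spirit of BIA but not the paper's exact procedure: this toy version aligns components by sorting on the first coordinate, a much cruder treatment of label switching than the original method, and the data and number of trials are arbitrary.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(8)
    X = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
    K, n_trials = 2, 10

    trial_means, trial_ll = [], []
    for t in range(n_trials):
        gm = GaussianMixture(K, max_iter=5, init_params="random",
                             random_state=t).fit(X)              # short trial EM run
        trial_means.append(gm.means_[np.argsort(gm.means_[:, 0])])  # crude alignment
        trial_ll.append(gm.score(X))                              # mean log-likelihood

    w = np.exp(np.array(trial_ll) - max(trial_ll))                # likelihood weights
    w /= w.sum()
    start_means = np.tensordot(w, np.array(trial_means), axes=1)  # averaged start

    final = GaussianMixture(K, means_init=start_means, random_state=0).fit(X)
    print("final total log-likelihood:", round(final.score(X) * len(X), 1))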