Search Results - Inner Mongolia University Library

Entropy-Based Anomaly Detection for Gaussian Mixture Modeling

ALGORITHMS, 2023, Vol. 16, No. 4, p. 195

Author: Scrucca, Luca. Affiliation: Univ Perugia, Dept Econ, Via A Pascoli 20, I-06123 Perugia, Italy

Gaussian mixture modeling is a generative probabilistic model that assumes that the observed data are generated from a mixture of multiple Gaussian distributions. This mixture model provides a flexible approach to model complex distributions that may not be easily represented by a single Gaussian distribution. The Gaussian mixture model with a noise component refers to a finite mixture that includes an additional noise component to model the background noise or outliers in the data. This additional noise component helps to take into account the presence of anomalies or outliers in the data. This latter aspect is crucial for anomaly detection in situations where a clear, early warning of an abnormal condition is required. This paper proposes a novel entropy-based procedure for initializing the noise component in Gaussian mixture models. Our approach is shown to be easy to implement and effective for anomaly detection. We successfully identify anomalies in both simulated and real-world datasets, even in the presence of significant levels of noise and outliers. We provide a step-by-step description of the proposed data analysis process, along with the corresponding R code, which is publicly available in a GitHub repository.

Keywords: Gaussian mixture modeling; cluster analysis; noise component; outliers; entropy of Gaussian mixtures; EM algorithm

Matrix-variate data analysis by two-way factor model with replicated observations

STATISTICS & PROBABILITY LETTERS, 2023, Vol. 202, No. 1

Authors: Li, Yan; Gao, Zhigen; Huang, Wei; Guo, Jianhua. Affiliations: Northeast Normal Univ, Sch Math & Stat, Changchun 130024, Jilin, Peoples R China; Northeast Normal Univ, Acad Adv Interdisciplinary Studies, Changchun 130024, Jilin, Peoples R China; Beijing Technol & Business Univ, Sch Math & Stat, Beijing 100048, Peoples R China

Motivated by recent work on matrix-variate data analysis in various scientific domains, we propose a two-way factor model (2wFMs) to capture the separable effects of row and column attributes. This paper studies the identification conditions of 2wFMs and develops a block alternative optimization algorithm for maximum likelihood estimation (MLE). The asymptotic theories for the maximum likelihood estimators are established. Monte Carlo simulations show that the method we propose is effective and robust. © 2023 Elsevier B.V. All rights reserved.

Keywords: EM algorithm; Large sample properties; Matrix-variate data; Maximum likelihood estimation; Two-way factor models

A study on the influence of the spread of Yangming Studies in Japan on the psychology of the Japanese people based on big data analysis

APPLIED MATHEMATICS AND NONLINEAR SCIENCES, 2023, Vol. 9, No. 1

Author: Liu, Hongyan. Affiliation: Suqian Univ, Sch Foreign Studies, Suqian 223800, Jiangsu, Peoples R China

The analysis of the psychological impact of the spread of Yangming studies in Japan on the Japanese people is to enable Yangming studies to be better developed in Japan. Based on big data analysis technology, this paper constructs a hybrid data analysis model using the EM algorithm and proposes performance evaluation indexes for the model. Under the EM data analysis model constructed in this paper, the example indicators of the Japanese people's psychological impact in disseminating Yangming studies by big data analysis are explored, i.e., the psychological acceptability of the dissemination method and the psychological and moral construction impact. Regarding the dissemination method, the Japanese people are more receptive to disseminating Yangming studies in Japan through "learning rules", with an average percentage of 39.37%. Regarding psychological and moral construction, 90.22% of the Japanese people believe that disseminating Yangming studies can promote self-improvement of value standards and correct self-examination. Based on the big data analysis, we can effectively see from the data the impact of Yangming studies on the audience in the process of dissemination, and improve the scope of Yangming studies dissemination according to the data feedback, so that more people can recognize the idea of unity of knowledge and action.

Keywords: Big data analysis; EM algorithm; Yangming studies; Knowledge and action; Japanese people

Efficient estimation for the proportional hazards model with left-truncated and interval-censored data

STAT, 2023, Vol. 12, No. 1

Authors: Lu, Tianyi; Li, Hongxi; Li, Shuwei; Sun, Liuquan. Affiliations: Guangzhou Univ, Sch Econ & Stat, Guangzhou, Peoples R China; Chinese Acad Sci, Inst Appl Math, Acad Math & Syst Sci, Beijing, Peoples R China; Guangzhou Univ, Daxuecheng Rd 230, Guangzhou 510006, Peoples R China

Interval-censored data often arise in prospective studies involving periodical follow-up for monitoring the failure event occurrence. In addition to censoring, left truncation also occurs if only participants who have not experienced the failure event are enrolled in the study, which clearly induces the selection bias and makes the analysis more complicated. This work provides an efficient maximum likelihood estimation approach that appropriately adjusts the biased sampling for the proportional hazards model with left-truncated and interval-censored data. A flexible and stable expectation-maximisation algorithm via a two-stage data augmentation is developed to maximise the intractable likelihood function. The asymptotic properties of the proposed estimators are established with the empirical process theory. The numerical results obtained from extensive simulations suggest that the proposed method performs satisfactorily and has some prominent advantages over the competing methods. An application to a colon cancer dataset also demonstrates the usefulness of the proposed method.

Keywords: Cox model; EM algorithm; interval censoring; left truncation; nonparametric maximum likelihood estimation

Regression Models for Understanding COVID-19 Epidemic Dynamics With Incomplete Data

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2021, Vol. 116, No. 536, pp. 1561-1577

Authors: Quick, Corbin; Dey, Rounak; Lin, Xihong. Affiliations: Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA, USA; Harvard Univ, Fac Arts & Sci, Dept Stat, Cambridge, MA 02138, USA

Modeling infectious disease dynamics has been critical throughout the COVID-19 pandemic. Of particular interest are the incidence, prevalence, and effective reproductive number (R_t). Estimating these quantities is challenging due to under-ascertainment, unreliable reporting, and time lags between infection, onset, and testing. We propose a Multilevel Epidemic Regression Model to Account for Incomplete Data (MERMAID) to jointly estimate R_t, ascertainment rates, incidence, and prevalence over time in one or multiple regions. Specifically, MERMAID allows for a flexible regression model of R_t that can incorporate geographic and time-varying covariates. To account for under-ascertainment, we (a) model the ascertainment probability over time as a function of testing metrics and (b) jointly model data on confirmed infections and population-based serological surveys. To account for delays between infection, onset, and reporting, we model stochastic lag times as missing data, and develop an EM algorithm to estimate the model parameters. We evaluate the performance of MERMAID in simulation studies, and assess its robustness by conducting sensitivity analyses in a range of scenarios of model misspecifications. We apply the proposed method to analyze COVID-19 daily confirmed infection counts, PCR testing data, and serological survey data across the United States. Based on our model, we estimate an overall COVID-19 prevalence of 12.5% (ranging from 2.4% in Maine to 20.2% in New York) and an overall ascertainment rate of 45.5% (ranging from 22.5% in New York to 81.3% in Rhode Island) in the United States from March to December 2020. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

Keywords: COVID-19 transmission; Effective reproductive number; EM algorithm; Epidemic model; Missing data; Prevalence; Serological studies; Under-ascertainment

Multivariate Poisson model adjusting for unidirectional covariate misrepresentation

STATISTICS & PROBABILITY LETTERS, 2023, Vol. 197

Authors: Zhang, Pengcheng; Wu, Xueyuan. Affiliations: Shandong Univ Finance & Econ, Sch Insurance, Jinan 250014, Peoples R China; Univ Melbourne, Ctr Actuarial Studies, Dept Econ, Melbourne, Vic 3010, Australia

This paper considers the misrepresentation problem in a multivariate Poisson model. As for inference, we develop an expectation-maximization (EM) algorithm. A simulation study is carried out to validate our algorithm....

Keywords: Multivariate Poisson model; Misrepresentation; EM algorithm

Estimation Under Mode Effects and Proxy Surveys, Accounting for Non-ignorable Nonresponse

SANKHYA-SERIES A-MATHEMATICAL STATISTICS AND PROBABILITY, 2021, Vol. 83, No. 2, pp. 779-813

Authors: Pfeffermann, Danny; Preminger, Arie. Affiliations: Cent Bur Stat, Jerusalem, Israel; Hebrew Univ Jerusalem, Dept Stat, Jerusalem, Israel; Univ Southampton, Southampton Stat Sci Res Inst, Southampton, Hants, England

We propose a new, model-based methodology to address two major problems in survey sampling. The first problem is known as mode effects, under which responses of sampled units possibly depend on the mode of response, whether by internet, telephone, personal interview, etc. The second problem is that of proxy surveys, whereby sampled units respond not only about themselves but also for other sampled units. For example, in many familiar household surveys, one member of the household provides information for all other members, possibly with measurement errors. Ignoring the existence of mode effects and/or possible measurement errors in proxy surveys could result in bias in point estimators and subsequent inference. Our approach also accounts for nonignorable nonresponse. We illustrate the proposed methodology by use of simulation experiments and real sample data, with known true population values.

Keywords: EM algorithm; measurement effects; NMAR nonresponse; probability and nonprobability sampling; selection effects

A model-based clustering algorithm with covariates adjustment and its application to lung cancer stratification

JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2023, Vol. 21, No. 4, p. 2350019

Authors: Relvas, Carlos E. M.; Nakata, Asuka; Chen, Guoan; Beer, David G.; Gotoh, Noriko; Fujita, Andre. Affiliations: Univ Sao Paulo, Inst Math & Stat, Rua Matao 1010, BR-05508090 Sao Paulo, SP, Brazil; Kanazawa Univ, Canc Res Inst, Kanazawa, Ishikawa 9201164, Japan; Southern Univ Sci & Technol, Sch Med, 1088 Xueyuan Blvd, Shenzhen 518055, Guangdong, Peoples R China; Univ Michigan, Rogel Canc Ctr, 1500 E Med Ctr Dr, Ann Arbor, MI 48109, USA

Usually, the clustering process is the first step in several data analyses. Clustering allows us to identify patterns we did not note before and helps raise new hypotheses. However, one challenge when analyzing empirical data is the presence of covariates, which may mask the obtained clustering structure. For example, suppose we are interested in clustering a set of individuals into controls and cancer patients. A clustering algorithm could instead group subjects into young and elderly in this case. This may happen because the age at diagnosis is associated with cancer. Thus, we developed CEM-Co, a model-based clustering algorithm that removes/minimizes undesirable covariates' effects during the clustering process. We applied CEM-Co on a gene expression dataset composed of 129 stage I non-small cell lung cancer patients. As a result, we identified a subgroup with a poorer prognosis, while standard clustering algorithms failed.

Keywords: Mixture Gaussian models; EM algorithm; clustering; lung cancer

Unsupervised statistical image segmentation using bi-dimensional hidden Markov chains model with application to mammography images

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, Vol. 35, No. 9

Authors: Joumad, Abdelali; El Moutaouakkil, Abdelmajid; Nasroallah, Abdelaziz; Boutkhoum, Omar; Rustam, Furqan; Ashraf, Imran. Affiliations: Chouaib Dokkali Univ, Fac Sci, Dept Informat, BP 29924000 El Jadida, Morocco; Cadi Ayyad Univ, Fac Sci Semlalia, Dept Math, BP 2390, Marrakech, Morocco; Univ Coll Dublin, Sch Comp Sci, Dublin D04 V1W8, Ireland; Yeungnam Univ, Informat & Commun Engn, Gyongsan 38541, South Korea

Hidden Markov chain (HMC) models have been widely used in unsupervised image segmentation. In these models, there is a double process: a hidden one denoted X and an observed one, which is often one-dimensional, denoted Y. The latter is constituted by pixels of a noisy image after transforming its bi-dimensional form into a monodimensional sequence. In this context, these models run into a problem of relationships between pixels which is often solved by applying curves such as the Hilbert-Peano scan when modeling the image under study. We propose enriching the HMC model by introducing a second component to the observed process Y based on the average of two observations which are neighbors in the image but are not in the chain of each considered pixel. This gives a bi-dimensional HMC model which has the same structure as the classical model except for the two-dimensional case of the low modeling noise. The estimation of the parameters of this model is carried out by using a three-algorithm approach: a Bayesian one based mainly on Markov Chain Monte Carlo (MCMC) methods, Expectation-Maximization (EM), and Iterative Conditional Estimation (ICE). We apply the final Bayesian decision criteria Marginal Posterior Mode to come up with a final configuration of the result X. The proposed model is compared to the classical HMC model in combination with the Hilbert-Peano scan numerically through simulated data and visually through synthetic and mammogram images. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).

Keywords: Classical hidden Markov chain; Bi-dimensional hidden Markov chain; Unsupervised segmentation; Hilbert-Peano scan; MCMC methods; EM algorithm; ICE algorithm; Mammography images

Semi-supervised Model-Based Clustering for Ordinal Data

21st Australasian Conference on Data Science and Machine Learning, AusDM 2023

Authors: Cui, Ying; McMillan, Louise; Liu, Ivy. Affiliations: School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand; Centre for Data Science and Artificial Intelligence, Victoria University of Wellington, Wellington, New Zealand

ISBN: (print) 9789819986958

This paper introduces a semi-supervised learning technique for model-based clustering. Our research focus is on applying it to matrices of ordered categorical response data, such as those obtained from surveys with Likert scale responses. We use the proportional odds model, which is popular and widely used for analyzing such data, as the model structure. Our proposed technique is designed for analyzing datasets that contain both labeled and unlabeled observations from multiple clusters. The model fitting is performed using the expectation-maximization (EM) algorithm, incorporating the labeled cluster memberships, to cluster the unlabeled observations. To evaluate the performance of our proposed model, we conducted a simulation study in which we tested the model in eight different scenarios, each with varying combinations and proportions of known and unknown cluster memberships. The fitted models accurately estimate the parameters in most of the designed scenarios, indicating that our technique is effective in clustering partially-labeled data with ordered categorical response variables. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Keywords: clustering; EM algorithm; Likert scale data; ordinal data; proportional odds model; semi-supervised learning
