Gaussian mixture modeling is a generative probabilistic model that assumes that the observed data are generated from a mixture of multiple Gaussian distributions. This mixture model provides a flexible approach to model complex distributions that may not be easily represented by a single Gaussian distribution. The Gaussian mixture model with a noise component refers to a finite mixture that includes an additional noise component to model the background noise or outliers in the data. This additional noise component helps to take into account the presence of anomalies or outliers in the data. This latter aspect is crucial for anomaly detection in situations where a clear, early warning of an abnormal condition is required. This paper proposes a novel entropy-based procedure for initializing the noise component in Gaussian mixture models. Our approach is shown to be easy to implement and effective for anomaly detection. We successfully identify anomalies in both simulated and real-world datasets, even in the presence of significant levels of noise and outliers. We provide a step-by-step description of the proposed data analysis process, along with the corresponding R code, which is publicly available in a GitHub repository.
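The idea of a Gaussian mixture with a noise component can be illustrated with a minimal sketch. It is not the paper's method: the entropy-based initialization is not reproduced here, and the noise component is instead assumed to be a uniform density over the observed data range with a fixed starting weight. The function name `fit_gmm_with_noise` and all its parameters are hypothetical, and the example is in Python rather than the R code the authors mention.

```python
import numpy as np

def fit_gmm_with_noise(x, k=2, noise_weight=0.05, n_iter=200):
    """EM for a 1-D mixture of k Gaussians plus one uniform noise component.

    Assumption: noise is uniform over [min(x), max(x)]; its initial weight
    is a fixed guess, not the entropy-based initialization from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    noise_density = 1.0 / (x.max() - x.min())

    # Simple deterministic start: means at evenly spaced quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, (1.0 - noise_weight) / k)
    w_noise = noise_weight

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        dens = np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        dens /= np.sqrt(2.0 * np.pi * variances)
        num = np.column_stack([weights * dens,
                               np.full(n, w_noise * noise_density)])
        resp = num / num.sum(axis=1, keepdims=True)

        # M-step: update Gaussian parameters and all mixing weights.
        nk = resp[:, :k].sum(axis=0)
        means = (resp[:, :k] * x[:, None]).sum(axis=0) / nk
        sq_dev = (x[:, None] - means) ** 2
        variances = (resp[:, :k] * sq_dev).sum(axis=0) / nk + 1e-6
        weights = nk / n
        w_noise = resp[:, k].mean()

    # The last column of resp is the posterior probability of "noise".
    return means, variances, weights, w_noise, resp
```

A point can then be flagged as a candidate anomaly when its noise responsibility `resp[i, -1]` exceeds a chosen threshold such as 0.5; the threshold is a modeling choice, not something taken from the abstract.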
Motivated by recent work on matrix-variate data analysis in various scientific domains, we propose a two-way factor model (2wFMs) to capture the separable effects of row and column attributes. This paper studies the identification conditions of 2wFMs and develops a block alternative optimization algorithm for maximum likelihood estimation (MLE). The asymptotic theories for the maximum likelihood estimators are established. Monte Carlo simulations show that the method we propose is effective and robust. © 2023 Elsevier B.V. All rights reserved.
This paper analyzes the psychological impact that the spread of Yangming studies in Japan has on the Japanese people, with the aim of helping Yangming studies develop further in Japan. Using big data analysis technology, the paper constructs a hybrid data analysis model based on the EM algorithm and proposes performance evaluation indexes for the model. Under this EM-based model, two example indicators of the psychological impact of disseminating Yangming studies are explored: the psychological acceptability of the dissemination method, and the impact on psychological and moral construction. Regarding the dissemination method, the Japanese people are most receptive to the dissemination of Yangming studies through "learning rules", with an average percentage of 39.37%. Regarding psychological and moral construction, 90.22% of the Japanese people believe that disseminating Yangming studies can promote self-improvement of value standards and correct self-examination. The big data analysis makes visible the impact of Yangming studies on the audience during dissemination, and the data feedback can be used to broaden the scope of dissemination so that more people come to recognize the idea of the unity of knowledge and action.
Interval-censored data often arise in prospective studies involving periodical follow-up for monitoring the failure event occurrence. In addition to censoring, left truncation also occurs if only participants who have not experienced the failure event are enrolled in the study, which clearly induces the selection bias and makes the analysis more complicated. This work provides an efficient maximum likelihood estimation approach that appropriately adjusts the biased sampling for the proportional hazards model with left-truncated and interval-censored data. A flexible and stable expectation-maximisation algorithm via a two-stage data augmentation is developed to maximise the intractable likelihood function. The asymptotic properties of the proposed estimators are established with the empirical process theory. The numerical results obtained from extensive simulations suggest that the proposed method performs satisfactorily and has some prominent advantages over the competing methods. An application to a colon cancer dataset also demonstrates the usefulness of the proposed method.
Modeling infectious disease dynamics has been critical throughout the COVID-19 pandemic. Of particular interest are the incidence, prevalence, and effective reproductive number (R_t). Estimating these quantities is challenging due to under-ascertainment, unreliable reporting, and time lags between infection, onset, and testing. We propose a Multilevel Epidemic Regression Model to Account for Incomplete Data (MERMAID) to jointly estimate R_t, ascertainment rates, incidence, and prevalence over time in one or multiple regions. Specifically, MERMAID allows for a flexible regression model of R_t that can incorporate geographic and time-varying covariates. To account for under-ascertainment, we (a) model the ascertainment probability over time as a function of testing metrics and (b) jointly model data on confirmed infections and population-based serological surveys. To account for delays between infection, onset, and reporting, we model stochastic lag times as missing data, and develop an EM algorithm to estimate the model parameters. We evaluate the performance of MERMAID in simulation studies, and assess its robustness by conducting sensitivity analyses in a range of scenarios of model misspecifications. We apply the proposed method to analyze COVID-19 daily confirmed infection counts, PCR testing data, and serological survey data across the United States. Based on our model, we estimate an overall COVID-19 prevalence of 12.5% (ranging from 2.4% in Maine to 20.2% in New York) and an overall ascertainment rate of 45.5% (ranging from 22.5% in New York to 81.3% in Rhode Island) in the United States from March to December 2020. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
This paper considers the misrepresentation problem in a multivariate Poisson model. For inference, we develop an expectation-maximization (EM) algorithm. A simulation study is carried out to validate our algorithm. Finally, our model is applied to a real data set. © 2023 Elsevier B.V. All rights reserved.
We propose a new, model-based methodology to address two major problems in survey sampling. The first problem is known as mode effects, under which the responses of sampled units may depend on the mode of response, whether by internet, telephone, personal interview, etc. The second problem is that of proxy surveys, whereby sampled units respond not only about themselves but also for other sampled units. For example, in many familiar household surveys, one member of the household provides information for all other members, possibly with measurement errors. Ignoring mode effects and/or possible measurement errors in proxy surveys could bias point estimators and subsequent inference. Our approach also accounts for nonignorable nonresponse. We illustrate the proposed methodology using simulation experiments and real sample data with known true population values.
Clustering is usually the first step in many data analyses. It allows us to identify patterns we had not noticed before and helps raise new hypotheses. However, one challenge when analyzing empirical data is the presence of covariates, which may mask the clustering structure of interest. For example, suppose we are interested in clustering a set of individuals into controls and cancer patients. A clustering algorithm could instead group subjects into young and elderly. This may happen because the age at diagnosis is associated with cancer. Thus, we developed CEM-Co, a model-based clustering algorithm that removes or minimizes the effects of undesirable covariates during the clustering process. We applied CEM-Co to a gene expression dataset composed of 129 stage I non-small cell lung cancer patients. As a result, we identified a subgroup with a poorer prognosis, while standard clustering algorithms failed to do so.
Hidden Markov chain (HMC) models have been widely used in unsupervised image segmentation. In these models, there is a double process: a hidden one denoted X and an observed one, which is often one-dimensional, denoted Y. The latter consists of the pixels of a noisy image after its bi-dimensional form has been transformed into a monodimensional sequence. In this context, these models run into the problem of preserving relationships between pixels, which is often addressed by applying curves such as the Hilbert-Peano scan when modeling the image under study. We propose enriching the HMC model by introducing a second component to the observed process Y, based on the average of two observations that are neighbors of each considered pixel in the image but not in the chain. This gives a bi-dimensional HMC model which has the same structure as the classical model except for the two-dimensional case of the low modeling noise. The parameters of this model are estimated using a three-algorithm approach: a Bayesian one based mainly on Markov chain Monte Carlo (MCMC) methods, Expectation-Maximization (EM), and Iterative Conditional Estimation (ICE). We apply the Bayesian decision criterion Marginal Posterior Mode to obtain a final configuration of the result X. The proposed model is compared to the classical HMC model in combination with the Hilbert-Peano scan, numerically through simulated data and visually through synthetic and mammogram images. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).
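The step of turning a two-dimensional image into a one-dimensional sequence can be sketched as follows. This is only an illustration, assuming a square image whose side is a power of two and using the standard Hilbert curve index-to-coordinate mapping; the exact Hilbert-Peano variant used in the paper is not specified in the abstract, and the function names `hilbert_d2xy` and `hilbert_scan` are hypothetical.

```python
import numpy as np

def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(image):
    """Flatten a square power-of-two image along a Hilbert curve.

    Returns the 1-D pixel sequence and the visited (x, y) coordinates,
    where x indexes columns and y indexes rows.
    """
    image = np.asarray(image)
    n = image.shape[0]
    if image.ndim != 2 or image.shape[1] != n or n < 1 or n & (n - 1):
        raise ValueError("expected a square image with power-of-two side")
    order = n.bit_length() - 1
    coords = [hilbert_d2xy(order, d) for d in range(n * n)]
    rows = np.array([y for _, y in coords])
    cols = np.array([x for x, _ in coords])
    return image[rows, cols], coords
```

Consecutive elements of the returned sequence are always adjacent pixels in the image, which is why such scans preserve more spatial context than a row-by-row flattening. Pixels that are neighbors in the image can still end up far apart in the chain, which is the limitation the abstract's second observed component is meant to address.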
This paper introduces a semi-supervised learning technique for model-based clustering. Our research focus is on applying it to matrices of ordered categorical response data, such as those obtained from the surveys wit...