Search results - Inner Mongolia University Library

A new method for robust mixture regression

CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2017, vol. 45, no. 1, pp. 77-94

Authors: Yu, Chun; Yao, Weixin; Chen, Kun. Affiliations: Jiangxi Univ Finance & Econ, Sch Stat, Nanchang 330013, Jiangxi, Peoples R China; Univ Calif Riverside, Dept Stat, Riverside, CA 92521, USA; Univ Connecticut, Dept Stat, Storrs, CT 06269, USA

Finite mixture regression models have been widely used for modelling mixed regression relationships arising from a clustered and thus heterogeneous population. The classical normal mixture model, despite its simplicity and wide applicability, may fail in the presence of severe outliers. Using a sparse, case-specific, and scale-dependent mean-shift mixture model parameterization, we propose a robust mixture regression approach for simultaneously conducting outlier detection and robust parameter estimation. A penalized likelihood approach is adopted to induce sparsity among the mean-shift parameters so that the outliers are distinguished from the remainder of the data, and a generalized Expectation-Maximization (EM) algorithm is developed to perform stable and efficient computation. The proposed approach is shown to have strong connections with other robust methods including the trimmed likelihood method and M-estimation approaches. In contrast to several existing methods, the proposed methods show outstanding performance in our simulation studies. (C) 2016 Statistical Society of Canada

Keywords: EM algorithm; mixture regression models; outlier detection; penalized likelihood

A Note on "Shaved Dice" Inference

AMERICAN STATISTICIAN, 2018, vol. 72, no. 2, pp. 155-157

Authors: Sundberg, Rolf. Affiliation: Stockholm Univ, Dept Math, S-10691 Stockholm, Sweden

Two dice are rolled repeatedly, only their sum is registered. Have the two dice been "shaved," so two of the six sides appear more frequently? Pavlides and Perlman discussed this somewhat complicated type of situation through curved exponential families. Here, we contrast their approach by regarding data as incomplete data from a simple exponential family. The latter, supplementary approach is in some respects simpler, it provides additional insight about the relationships among the likelihood equation, the Fisher information, and the EM algorithm, and it illustrates the information content in ancillary statistics.

Keywords: Ancillarity; Curved exponential families; EM algorithm; Incomplete data model; ML estimation; Multinomial model
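
As a rough illustration of the incomplete-data setup described in this abstract, the sketch below computes the distribution of the sum of two dice when only the sum is observed. The parameterization is an assumption made for illustration, not taken from the record: faces 1 and 6 each gain probability eps and the other four faces each lose eps/2. The names shaved_die and sum_distribution are hypothetical.

```python
from fractions import Fraction


def shaved_die(eps):
    """Face probabilities of a hypothetical shaved die.

    Assumption for illustration: faces 1 and 6 each gain eps,
    and the remaining four faces each lose eps / 2.
    """
    eps = Fraction(eps)
    base = Fraction(1, 6)
    return {
        face: base + eps if face in (1, 6) else base - eps / 2
        for face in range(1, 7)
    }


def sum_distribution(die_a, die_b):
    """Distribution of the sum of two independent dice.

    The individual faces are the 'complete' data; the sum is the
    'incomplete' data that is actually registered.
    """
    dist = {}
    for a, pa in die_a.items():
        for b, pb in die_b.items():
            dist[a + b] = dist.get(a + b, Fraction(0)) + pa * pb
    return dist


# Fair dice versus dice shaved with eps = 1/30 (an arbitrary example value).
fair = sum_distribution(shaved_die(0), shaved_die(0))
shaved = sum_distribution(shaved_die(Fraction(1, 30)), shaved_die(Fraction(1, 30)))
```

Exact fractions are used so that the sum probabilities can be compared without rounding error; the multinomial likelihood of observed sums would be built from a table like shaved.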

Estimating a network from multiple noisy realizations

ELECTRONIC JOURNAL OF STATISTICS, 2018, vol. 12, no. 2, pp. 4697-4740

Authors: Le, Can M.; Levin, Keith; Levina, Elizaveta. Affiliations: Univ Calif Davis, Dept Stat, Davis, CA 95616, USA; Univ Michigan, Dept Stat, Ann Arbor, MI 48109, USA

Complex interactions between entities are often represented as edges in a network. In practice, the network is often constructed from noisy measurements and inevitably contains some errors. In this paper we consider the problem of estimating a network from multiple noisy observations where edges of the original network are recorded with both false positives and false negatives. This problem is motivated by neuroimaging applications where brain networks of a group of patients with a particular brain condition could be viewed as noisy versions of an unobserved true network corresponding to the disease. The key to optimally leveraging these multiple observations is to take advantage of network structure, and here we focus on the case where the true network contains communities. Communities are common in real networks in general and in particular are believed to be present in brain networks. Under a community structure assumption on the truth, we derive an efficient method to estimate the noise levels and the original network, with theoretical guarantees on the convergence of our estimates. We show on synthetic networks that the performance of our method is close to an oracle method using the true parameter values, and apply our method to fMRI brain data, demonstrating that it constructs stable and plausible estimates of the population network.

Keywords: Noisy networks; stochastic block model; brain networks; EM algorithm

Robust mixture multivariate linear regression by multivariate Laplace distribution

STATISTICS & PROBABILITY LETTERS, 2017, vol. 130, pp. 32-39

Authors: Li, Xiongya; Bai, Xiuqin; Song, Weixing. Affiliations: Kansas State Univ, Dept Stat, Manhattan, KS 66506, USA; Eastern Washington Univ, Dept Math, Cheney, WA 99004, USA

Assuming that the error terms follow a multivariate Laplace distribution, we propose a robust estimation procedure for mixture of multivariate linear regression models in this paper. Using the fact that the multivariate Laplace distribution is a scale mixture of the multivariate standard normal distribution, an efficient EM algorithm is designed to implement the proposed robust estimation procedure. The performance of the proposed algorithm is thoroughly evaluated by some simulation and comparison studies. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: Finite mixtures; Multivariate linear regression; Robust estimation; Multivariate Laplace distribution; EM algorithm

Efficient Semiparametric Inference Under Two-Phase Sampling, With Applications to Genetic Association Studies

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, vol. 112, no. 520, pp. 1468-1476

Authors: Tao, Ran; Zeng, Donglin; Lin, Dan-Yu. Affiliations: Vanderbilt Univ Med Ctr, Dept Biostat, Nashville, TN, USA; Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599, USA

In modern epidemiological and clinical studies, the covariates of interest may involve genome sequencing, biomarker assay, or medical imaging and thus are prohibitively expensive to measure on a large number of subjects. A cost-effective solution is the two-phase design, under which the outcome and inexpensive covariates are observed for all subjects during the first phase and that information is used to select subjects for measurements of expensive covariates during the second phase. For example, subjects with extreme values of quantitative traits were selected for whole-exome sequencing in the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP). Herein, we consider general two-phase designs, where the outcome can be continuous or discrete, and inexpensive covariates can be continuous and correlated with expensive covariates. We propose a semiparametric approach to regression analysis by approximating the conditional density functions of expensive covariates given inexpensive covariates with B-spline sieves. We devise a computationally efficient and numerically stable EM algorithm to maximize the sieve likelihood. In addition, we establish the consistency, asymptotic normality, and asymptotic efficiency of the estimators. Furthermore, we demonstrate the superiority of the proposed methods over existing ones through extensive simulation studies. Finally, we present applications to the aforementioned NHLBI ESP. Supplementary materials for this article are available online.

Keywords: Biased sampling; EM algorithm; Genome sequencing; Response-selective sampling; Semiparametric efficiency; Sieve approximation

A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification

ALGORITHMS FOR MOLECULAR BIOLOGY, 2017, vol. 12, no. 1, pp. 1-12

Authors: Schroeder, Christopher; Rahmann, Sven. Affiliation: Univ Duisburg Essen, Genome Informat, Inst Human Genet, Univ Hosp Essen, Hufelandstr 55, D-45147 Essen, Germany

Background: Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1. Methods: While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Our algorithm combines latent variables with the method of moments instead of maximum likelihood, which has computational advantages over the popular EM algorithm. Results: As an application, we demonstrate that methylation state classification is more accurate when using adaptive thresholds from beta mixtures than non-adaptive thresholds on observed methylation levels. We also demonstrate that we can accurately infer the number of mixture components. Conclusions: The hybrid algorithm between likelihood-based component un-mixing and moment-based parameter estimation is a robust and efficient method for beta mixture estimation. We provide an implementation of the method ("betamix") as open source software under the MIT license.

Keywords: Mixture model; Beta distribution; Maximum likelihood; Method of moments; EM algorithm; Differential methylation; Classification
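
The moment-based step mentioned in this abstract can be illustrated with the standard method-of-moments formulas for a single beta distribution. This is only a sketch under stated assumptions, not the betamix implementation: the function names are hypothetical, and the weights stand in for per-observation component weights that a latent-variable step might supply.

```python
def beta_from_mean_var(mean, var):
    """Method-of-moments beta parameters from a mean and a variance.

    Uses the standard identities
        alpha = mean * (mean * (1 - mean) / var - 1)
        beta  = (1 - mean) * (mean * (1 - mean) / var - 1)
    which require 0 < mean < 1 and 0 < var < mean * (1 - mean).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def weighted_beta_moments(xs, weights):
    """Fit beta parameters to weighted observations on [0, 1].

    The weights are hypothetical component weights; equal weights
    reduce this to the plain sample mean and (population) variance.
    """
    total = sum(weights)
    mean = sum(w * x for x, w in zip(xs, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(xs, weights)) / total
    return beta_from_mean_var(mean, var)
```

Because only the mean and variance are used, observations equal to 0 or 1 cause no difficulty here, whereas the beta log-density is unbounded or undefined at those points.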

Regularized Latent Class Analysis with Application in Cognitive Diagnosis

PSYCHOMETRIKA, 2017, vol. 82, no. 3, pp. 660-692

Authors: Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang. Affiliations: Emory Univ, Atlanta, GA 30322, USA; Univ Minnesota, Minneapolis, MN 55455, USA; Columbia Univ, New York, NY 10027, USA

Diagnostic classification models are confirmatory in the sense that the relationship between the latent attributes and responses to items is specified or parameterized. Such models are readily interpretable with each component of the model usually having a practical meaning. However, parameterized diagnostic classification models are sometimes too simple to capture all the data patterns, resulting in significant model lack of fit. In this paper, we attempt to obtain a compromise between interpretability and goodness of fit by regularizing a latent class model. Our approach starts with minimal assumptions on the data structure, followed by suitable regularization to reduce complexity, so that a readily interpretable yet flexible model is obtained. An expectation-maximization-type algorithm is developed for efficient computation. It is shown that the proposed approach enjoys good theoretical properties. Results from simulation studies and a real application are presented.

Keywords: diagnostic classification models; latent class analysis; regularization; consistency; EM algorithm

Analysis of panel data under hidden mover-stayer models

STATISTICS IN MEDICINE, 2017, vol. 36, no. 20, pp. 3231-3243

Authors: Yi, Grace Y.; He, Wenqing; He, Feng. Affiliations: Univ Waterloo, Dept Stat & Actuarial Sci, 200 Univ Ave West, Waterloo, ON N2L 3G1, Canada; Univ Western Ontario, Dept Stat & Actuarial Sci, 1151 Richmond St North, London, ON N6A 5B7, Canada

Analysis of panel data is often challenged by the presence of heterogeneity and state misclassification. In this paper, we propose a hidden mover-stayer model to facilitate heterogeneity for a population that consists of two subpopulations each of movers or of stayers and to simultaneously account for state misclassification. We develop an inference procedure based on the expectation-maximization algorithm by treating the mover-stayer indicator and underlying true states as latent variables. We evaluate the performance of the proposed method and investigate the impact of ignoring misclassification through simulation studies. The proposed method is applied to analyze the data arising from the Waterloo Smoking Prevention Project. Copyright (C) 2017 John Wiley & Sons, Ltd.

Keywords: EM algorithm; hidden mover-stayer model; misclassification; panel data

Joint regression modeling for missing categorical covariates in generalized linear models

JOURNAL OF APPLIED STATISTICS, 2018, vol. 45, no. 15, pp. 2741-2759

Authors: Carlos Perez-Ruiz, Luis; Escarela, Gabriel. Affiliation: Univ Autonoma Metropolitana Iztapalapa, Dept Matemat, Av San Rafael Atlixco 186, Mexico City 09340, DF, Mexico

Missing covariate data is a common issue in generalized linear models (GLMs). A model-based procedure arising from properly specifying joint models for both the partially observed covariates and the corresponding missing indicator variables represents a sound and flexible methodology, which lends itself to maximum likelihood estimation as the likelihood function is available in computable form. In this paper, a novel model-based methodology is proposed for the regression analysis of GLMs when the partially observed covariates are categorical. Pair-copula constructions are used as graphical tools in order to facilitate the specification of the high-dimensional probability distributions of the underlying missingness components. The model parameters are estimated by maximizing the weighted loglikelihood function by using an EM algorithm. In order to compare the performance of the proposed methodology with other well-established approaches, which include complete-cases and multiple imputation, several simulation experiments of Binomial, Poisson and Normal regressions are carried out under both missing at random and non-missing at random mechanism scenarios. The methods are illustrated by modeling data from a stage III melanoma clinical trial. The results show that the methodology is rather robust and flexible, representing a competitive alternative to traditional techniques.

Keywords: Copula; missing data; vines; pair copula constructions; EM algorithm; multivariate distribution

Model selection for the localized mixture of experts models

JOURNAL OF APPLIED STATISTICS, 2018, vol. 45, no. 11, pp. 1994-2006

Authors: Jiang, Yunlu; Yu, Conglian; Ji, Qinghua. Affiliations: Jinan Univ, Coll Econ, Dept Stat, Guangzhou 510632, Guangdong, Peoples R China; Shanghai Univ Finance & Econ, Sch Stat & Management, Shanghai, Peoples R China

In this paper, we propose a penalized likelihood method to simultaneously select covariates and mixing components and obtain parameter estimates in the localized mixture of experts models. We develop an expectation-maximization algorithm to solve the proposed penalized likelihood procedure, and introduce a data-driven procedure to select the tuning parameters. Extensive numerical studies are carried out to compare the finite sample performances of our proposed method and other existing methods. Finally, we apply the proposed methodology to analyze the Boston housing price data set and the baseball salaries data set.

Keywords: Localized mixture of experts models; EM algorithm; SCAD penalty function
