Search Results - Inner Mongolia University Library

Composite Likelihood Inference in a Discrete Latent Variable Model for Two-Way "Clustering-by-Segmentation" Problems

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2017, Vol. 26, No. 2, pp. 388-402

Authors: Bartolucci, Francesco; Chiaromonte, Francesca; Don, Prabhani Kuruppumullage; Lindsay, Bruce G.

Affiliations: Univ Perugia Dept Econ Via A Pascoli 20 I-06123 Perugia Italy; Penn State Univ Dept Stat State Coll PA USA; Harvard TH Chan Sch Publ Hlth Dana Farber Canc Inst Dept Biostat & Computat Biol Boston MA USA; Harvard TH Chan Sch Publ Hlth Dept Biostat Boston MA USA

We consider a discrete latent variable model for two-way data arrays, which allows one to simultaneously produce clusters along one of the data dimensions (e.g., exchangeable observational units or features) and contiguous groups, or segments, along the other (e.g., consecutively ordered times or locations). The model relies on a hidden Markov structure but, given its complexity, cannot be estimated by full maximum likelihood. Therefore, we introduce a composite likelihood methodology based on considering different subsets of the data. The proposed approach is illustrated by simulation, and with an application to genomic data.

Keywords: Composite likelihood; Cross-validation; Crossed-effects models; EM algorithm; Finite mixture models; Genomics

An improved statistical model for taxonomic assignment of metagenomics

BMC GENETICS, 2018, Vol. 19, No. 1, p. 98

Authors: Yao, Yujing; Jin, Zhezhen; Lee, Joseph H.

Affiliations: Columbia Univ Dept Biostat New York NY USA; Columbia Univ Sergievsky Ctr Taub Inst New York NY 10027 USA; Columbia Univ Dept Epidemiol New York NY 10027 USA; Columbia Univ Dept Neurol New York NY 10027 USA; Columbia Univ Sergievsky Ctr 630 West 168th St P&S Unit 16 New York NY 10032 USA

Background: With the advances in next-generation sequencing technologies, researchers can now rapidly examine the composition of samples from humans and their surroundings. To enhance the accuracy of taxonomy assignments in metagenomic samples, we developed a method that allows multiple mismatch probabilities from different *** extended the algorithm of taxonomic assignment of metagenomic sequence reads (TAMER) by developing an improved method that can set a different mismatch probability for each genome rather than imposing a single parameter for all genomes, thereby obtaining a greater degree of accuracy. This method, which we call TADIP (Taxonomic Assignment of metagenomics based on DIfferent Probabilities), was comprehensively tested in simulated and real datasets. The results support that TADIP improved the performance of TAMER, especially in large sample size datasets with high *** was developed as a statistical model to improve the estimation accuracy of taxonomy assignments. Based on its varying mismatch probability setting and correlated variance matrix setting, its performance was enhanced for high-complexity samples when compared with TAMER.

Keywords: EM algorithm; Metagenomics; Taxonomic assignment

Baum-Welch algorithm on directed acyclic graph for mixtures with latent Bayesian networks

STAT, 2017, Vol. 6, No. 1, pp. 303-314

Authors: Li, Jia; Lin, Lin

Affiliations: Penn State Univ Dept Stat University Pk PA 16802 USA

We consider a mixture model with latent Bayesian network (MLBN) for a set of random vectors $X^{(t)} \in \mathbb{R}^{d_t}$, $t = 1, \dots, T$. Each $X^{(t)}$ is associated with a latent state s(t), given which $X^{(t)}$ is conditionally independent from other variables. The joint distribution of the states is governed by a Bayes net. Although specific types of MLBN have been used in diverse areas such as biomedical research and image analysis, the exact expectation-maximization (EM) algorithm for estimating the models can involve visiting all the combinations of states, yielding exponential complexity in the network size. A prominent exception is the Baum-Welch algorithm for the hidden Markov model, where the underlying graph topology is a chain. We hereby develop a new Baum-Welch algorithm on directed acyclic graph (BW-DAG) for the general MLBN and prove that it is an exact EM algorithm. BW-DAG provides insight on the achievable complexity of EM. For a tree graph, the complexity of BW-DAG is much lower than that of the brute-force EM. Copyright (c) 2017 John Wiley & Sons, Ltd.

Keywords: Baum-Welch algorithm; Bayesian network; directed acyclic graph; EM algorithm; hidden Markov model; maximum likelihood estimation

Bivariate discrete generalized exponential distribution

STATISTICS, 2017, Vol. 51, No. 5, pp. 1143-1158

Authors: Nekoukhou, Vahid; Kundu, Debasis

Affiliations: Khansar Fac Math & Comp Sci Dept Stat Khansar Iran; Indian Inst Technol Kanpur Dept Math & Stat Kanpur India

In this paper, we develop a bivariate discrete generalized exponential distribution, whose marginals are discrete generalized exponential distributions as proposed by Nekoukhou, Alamatsaz and Bidram [Discrete generalized exponential distribution of a second type. Statistics. 2013;47:876-887]. It is observed that the proposed bivariate distribution is very flexible and that the bivariate geometric distribution can be obtained as a special case of it. The proposed distribution can be seen as a natural discrete analogue of the bivariate generalized exponential distribution proposed by Kundu and Gupta [Bivariate generalized exponential distribution. J Multivariate Anal. 2009;100:581-593]. We study different properties of this distribution and explore its dependence structures. We propose a new EM algorithm to compute the maximum-likelihood estimators of the unknown parameters, which can be implemented very efficiently, and also discuss some inferential issues. The analysis of one data set has been performed to show the effectiveness of the proposed model. Finally, we propose some open problems and conclude the paper.

Keywords: Discrete bivariate model; generalized exponential distribution; maximum-likelihood estimators; positive dependence; joint probability mass function; EM algorithm; Primary: 62F10; Secondary: 62H10

A non-negative matrix factorization model based on the zero-inflated Tweedie distribution

COMPUTATIONAL STATISTICS, 2017, Vol. 32, No. 2, pp. 475-499

Authors: Abe, Hiroyasu; Yadohisa, Hiroshi

Affiliations: Doshisha Univ 1-3 Tatara Miyakodani Kyotanabe Kyoto 6100394 Japan

Non-negative matrix factorization (NMF) is a technique of multivariate analysis used to approximate a given matrix containing non-negative data using two non-negative factor matrices that has been applied to a number of fields. However, when a matrix containing non-negative data has many zeroes, NMF encounters an approximation difficulty. This zero-inflated situation occurs often when a data matrix is given as count data, and becomes more challenging with matrices of increasing size. To solve this problem, we propose a new NMF model for zero-inflated non-negative matrices. Our model is based on the zero-inflated Tweedie distribution. The Tweedie distribution is a generalization of the normal, the Poisson, and the gamma distributions, and differs from each of the other distributions in the degree of robustness of its estimated parameters. In this paper, we show through numerical examples that the proposed model is superior to the basic NMF model in terms of approximation of zero-inflated data. Furthermore, we show the differences between the estimated basis vectors found using the basic and the proposed NMF models for divergence by applying it to real purchasing data.

Keywords: β-divergence; EM algorithm; Auxiliary function; Count data

Robust quantile regression using a generalized class of skewed distributions

STAT, 2017, Vol. 6, No. 1, pp. 113-130

Authors: Morales, Christian Galarza; Davila, Victor Lachos; Cabral, Celso Barbosa; Cepero, Luis Castro

Affiliations: Escuela Super Politecn Litoral Dept Matemat ESPOL Guayaquil 090902 Ecuador; Univ Estadual Campinas Dept Estat BR-13083859 Campinas SP Brazil; Univ Fed Amazonas Dept Estat BR-69080000 Manaus Amazonas Brazil; Univ Concepcion Dept Estat Concepcion 4070386 Chile; Univ Concepcion CI2MA Concepcion 4070386 Chile

It is well known that the widely popular mean regression model could be inadequate if the probability distribution of the observed responses does not follow a symmetric distribution. To deal with this situation, quantile regression turns out to be a more robust alternative for accommodating outliers and the misspecification of the error distribution because it characterizes the entire conditional distribution of the outcome variable. This paper presents a likelihood-based approach for the estimation of the regression quantiles based on a new family of skewed distributions. This family includes the skewed versions of the normal, Student-t, Laplace, contaminated normal and slash distributions, all with the zero quantile property for the error term and with a convenient and novel stochastic representation that facilitates the implementation of the expectation-maximization algorithm for maximum likelihood estimation of the pth quantile regression parameters. We evaluate the performance of the proposed expectation-maximization algorithm and the asymptotic properties of the maximum likelihood estimates through empirical experiments and application to a real-life dataset. The algorithm is implemented in the R package lqr, providing full estimation and inference for the parameters as well as simulation envelope plots useful for assessing the goodness of fit. Copyright (C) 2017 John Wiley & Sons, Ltd.

Keywords: EM algorithm; quantile regression model; scale mixtures of normal distributions

From Euclidean distances to APC models

QUALITY & QUANTITY, 2017, Vol. 51, No. 2, pp. 829-846

Authors: De Santis, Gustavo; Mucciardi, Massimo

Affiliations: Univ Florence DiSIA Dept Stat Informat Applicat G Parenti Florence Italy; Univ Messina Dept Econ Messina Italy

In this paper we show that a recently developed method for the study of "cultural" differences, called DBS-EM, or Distance Between Strata estimated with the EM (Expectation Maximization) algorithm, can also be used to circumvent the difficulties posed by APC (or Age, Period, Cohort) models. The DBS-EM method produces an original measure of the distance (dependent variable) between any two subsets of observations (strata) within a sample, where the stratification variables can be interpreted as regressors. When these stratification variables are age, period, and cohort, what results is an APC model which, however, proves immune to the "intrinsic collinearity problem" (C = P-A), albeit with a few limitations, which are discussed in the article. In our application to Italian data over the years 1993-2013, age and cohort strongly shape cultural consumption, while cohort and period impact, but only up to a point, on political participation.

Keywords: APC (age period cohort) models; Clusters; Cultural distance; EM algorithm; Euclidean distance; Quantile regression

Analyzing semi-competing risks data with missing cause of informative terminal event

STATISTICS IN MEDICINE, 2017, Vol. 36, No. 5, pp. 738-753

Authors: Zhou, Renke; Zhu, Hong; Bondy, Melissa; Ning, Jing

Affiliations: Baylor Coll Med Duncan Canc Ctr Houston TX 77030 USA; Univ Texas Southwestern Med Ctr Dallas Dept Clin Sci Div Biostat Dallas TX 75390 USA; Univ Texas MD Anderson Canc Ctr Dept Biostat Houston TX 77030 USA

Cancer studies frequently yield multiple event times that correspond to landmarks in disease progression, including non-terminal events (i.e., cancer recurrence) and an informative terminal event (i.e., cancer-related death). Hence, we often observe semi-competing risks data. Work on such data has focused on scenarios in which the cause of the terminal event is known. However, in some circumstances, the information on cause for patients who experience the terminal event is missing; consequently, we are not able to differentiate an informative terminal event from a non-informative terminal event. In this article, we propose a method to handle missing data regarding the cause of an informative terminal event when analyzing semi-competing risks data. We first consider the nonparametric estimation of the survival function for the terminal event time given missing cause-of-failure data via the expectation-maximization algorithm. We then develop an estimation method for semi-competing risks data with missing cause of the terminal event, under a pre-specified semiparametric copula model. We conduct simulation studies to investigate the performance of the proposed method. We illustrate our methodology using data from a study of early-stage breast cancer. Copyright (C) 2016 John Wiley & Sons, Ltd.

Keywords: copula model; EM algorithm; informative censoring; missing cause of failure; semi-competing risks

Movers and Stayers in The Farming Sector: Accounting for Unobserved Heterogeneity in Structural Change

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2017, Vol. 66, No. 4, pp. 777-795

Authors: Saint-Cyr, Legrand D. F.; Piet, Laurent

Affiliations: AGROCAMPUS OUEST Rennes France; AGROCAMPUS OUEST Inst Natl Rech Agron Unite Mixte Rech 1302 Struct & Marches Agr Ressources & Terr 4 Allee Adolphe Bobierre CS 61103 F-35011 Rennes France

The paper investigates whether accounting for unobserved heterogeneity in farms' size transition processes improves the representation of structural change in agriculture. Considering a mixture of two types of farm, the mover-stayer model is applied for the first time in an agricultural economics context. The maximum likelihood method and the expectation-maximization algorithm are used to estimate the model's parameters. An empirical application to a panel of French farms from 2000 to 2013 shows that the mover-stayer model outperforms the homogeneous Markov chain model in recovering the transition process and predicting the future distribution of farm sizes.

Keywords: EM algorithm; Farms; Markov chain; Mover-stayer model; Structural change; Unobserved heterogeneity

A clustering cure rate model with application to a sealant study

JOURNAL OF APPLIED STATISTICS, 2017, Vol. 44, No. 16, pp. 2949-2962

Authors: Gallardo, Diego I.; Bolfarine, Heleno; Pedroso-de-Lima, Atonio Carlos

Affiliations: Univ Antofagasta Fac Ciencias Basicas Dept Matemat Antofagasta Chile; Univ Sao Paulo Inst Matemat & Estat Sao Paulo Brazil

In this paper, the destructive negative binomial (DNB) cure rate model with a latent activation scheme [V. Cancho, D. Bandyopadhyay, F. Louzada, and B. Yiqi, The DNB cure rate model with a latent activation scheme, Statistical Methodology 13 (2013b), pp. 48-68] is extended to the case where the observations are grouped into clusters. Parameter estimation is performed based on the restricted maximum likelihood approach and on a Bayesian approach based on Dirichlet process priors. An application to a real data set related to a sealant study in a dentistry experiment is considered to illustrate the performance of the proposed model.

Keywords: Bivariate random effects; competing risks; Dirichlet processes; EM algorithm; latent activation scheme; restricted maximum likelihood
