Search results - Inner Mongolia University Library

Capturing between-tasks covariance and similarities using multivariate linear mixed models

ELECTRONIC JOURNAL OF STATISTICS, 2020, Vol. 14, No. 2, pp. 3821-3844

Authors: Navon, Aviv; Rosset, Saharon. Affiliations: Bar Ilan Univ, Ramat Gan, Israel; Tel Aviv Univ, Tel Aviv, Israel

We consider the problem of predicting several response variables using the same set of explanatory variables. This setting naturally induces a group structure over the coefficient matrix, in which every explanatory variable corresponds to a set of related coefficients. Most of the existing methods that utilize this group formation assume that the similarities between related coefficients arise solely through a joint sparsity structure. In this paper, we propose a procedure for constructing multivariate regression models that directly capture and model the within-group similarities by employing a multivariate linear mixed model formulation, with a joint estimation of covariance matrices for coefficients and errors via penalized likelihood. Our approach, which we term MrRCE for Multivariate random Regression with Covariance Estimation, encourages structured similarity in parameters, in which coefficients for the same variable in related tasks share the same sign and similar magnitude. We illustrate the benefits of our approach in synthetic and real examples, and show that the proposed method outperforms natural competitors and alternative estimators under several model settings.

Keywords: Covariance selection; EM algorithm; multivariate regression; penalized likelihood; regularization methods; sparse precision matrix


Incomplete-data Fisher scoring method with steplength adjustment


STATISTICS AND COMPUTING, 2020, Vol. 30, No. 4, pp. 871-886

Author: Takai, Keiji. Affiliation: Kansai Univ, 3-3-35 Yamate Cho, Suita, Osaka, Japan

An incomplete-data Fisher scoring method is proposed for parameter estimation in models where data are missing and in latent-variable models that can be formulated as a missing data problem. The convergence properties of the proposed method and an accelerated variant of this method are provided. The main features of this method are its ability to accelerate the rate of convergence by adjusting the steplength, its ability to provide a second derivative of the observed-data log-likelihood function using only the functions used in the proposed method, and its ability to avoid having to explicitly solve the first derivative of the objective function. Four examples are presented to demonstrate how the proposed method converges compared with the EM algorithm and its variants. The computing time is also compared.

Keywords: Incomplete data; Acceleration; Convergence analysis; EM algorithm; Standard error; Fisher scoring


Multivariate finite-support phase-type distributions


JOURNAL OF APPLIED PROBABILITY, 2020, Vol. 57, No. 4, pp. 1260-1275

Authors: Pavithra, Celeste R.; Deepak, T. G. Affiliations: IIST Thiruvananthapuram, Thiruvananthapuram, Kerala, India; Indian Inst Space Sci & Technol, Dept Math, Thiruvananthapuram, Kerala, India

We introduce a multivariate class of distributions with support I, a k-orthotope in [0, ∞)^k, which is dense in the set of all k-dimensional distributions with support I. We call this new class 'multivariate finite-support phase-type distributions' (MFSPH). Though we generally define MFSPH distributions on any finite k-orthotope in [0, ∞)^k, here we mainly deal with MFSPH distributions with support [0, 1)^k. The distribution function of an MFSPH variate is computed by using that of a variate in the MPH* class, the multivariate class of distributions introduced by Kulkarni (1989). The marginal distributions of MFSPH variates are found as FSPH distributions, the class studied by Ramaswami and Viswanath (2014). Some properties, including the mixture property, of MFSPH distributions are established. Estimates of the parameters of a particular class of bivariate finite-support phase-type distributions are found by using the expectation-maximization algorithm. Simulated samples are used to demonstrate how this class could be used as approximations for bivariate finite-support distributions.

Keywords: Multivariate PH distribution; denseness; EM algorithm



ANNALS OF APPLIED STATISTICS, 2020, Vol. 14, No. 3, pp. 1304-1325

Authors: Jung, Hohyun; Lee, Jae-Gil; Lee, Namgil; Kim, Sung-Ho. Affiliations: Korea Adv Inst Sci & Technol, Dept Math Sci, Daejeon, South Korea; Korea Adv Inst Sci & Technol, Grad Sch Knowledge Serv Engn, Daejeon, South Korea; Kangwon Natl Univ, Dept Informat Stat, Chunchon, South Korea

Community Question Answering (CQA) websites are widely used for sharing knowledge: users can ask questions, post answers, and evaluate answers. So far, the evaluation of answers has been explained by the contents of answers through the investigation of users' topics of interest and expertise levels. In this paper we focus on modeling the user's evaluation behavior, in that users can see the answerer's profile as well as the answer content before evaluating the quality of the answer. We propose a model called the Popularity-based Topical Expertise Model (PTEM), a generative model to analyze the rich-get-richer phenomenon in which popular users' answers are recommended more often. We can simultaneously estimate the topical expertise of each user and the strength of the rich-get-richer effect through the EM algorithm combined with collapsed Gibbs sampling. Experiments are performed on the StackExchange data, and the results demonstrate a rich-get-richer phenomenon in the community. We further discuss the superiority and usefulness of the proposed model through analysis in the discipline of philosophy.

Keywords: Topic analysis; EM algorithm; community question answering; rich get richer; user behavior


Testing and Estimation of Social Network Dependence With Time to Event Data


JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2020, Vol. 115, No. 530, pp. 570-582

Authors: Su, Lin; Lu, Wenbin; Song, Rui; Huang, Danyang. Affiliations: North Carolina State Univ, Dept Stat, Raleigh, NC 27695, USA; Renmin Univ, Sch Stat, Beijing, Peoples R China

Nowadays, events spread rapidly along social networks. We are interested in whether people's responses to an event are affected by their friends' characteristics. For example, how soon will a person start playing a game given that his/her friends like it? Studying social network dependence is an emerging research area. In this work, we propose a novel latent spatial autocorrelation Cox model to study social network dependence with time-to-event data. The proposed model introduces a latent indicator to characterize whether a person's survival time might be affected by his or her friends' features. We first propose a score-type test for detecting the existence of social network dependence. If it exists, we further develop an EM-type algorithm to estimate the model parameters. The performance of the proposed test and estimators is illustrated by simulation studies and an application to a time-to-event dataset about playing a popular mobile game from one of the largest online social network platforms. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

Keywords: Cox model; EM algorithm; Social network dependence; Time-to-event data


Regression multiple imputation for missing data analysis


STATISTICAL METHODS IN MEDICAL RESEARCH, 2020, Vol. 29, No. 9, pp. 2647-2664

Authors: Yu, Lili; Liu, Liang; Peace, Karl E. Affiliations: Georgia Southern Univ, Jiann Ping Hsu Coll Publ Hlth, Dept Biostat, Statesboro, GA 30460, USA; Univ Georgia, Dept Stat, Athens, GA 30602, USA

Iterative multiple imputation is a popular technique for missing data analysis. It updates the parameter estimators iteratively using the multiple imputation method. This technique is convenient and flexible. However, the parameter estimators do not converge point-wise and are not efficient for finite imputation size m. In this paper, we propose a regression multiple imputation method. It uses the parameter estimators obtained from the multiple imputation method to estimate the parameter estimators based on the expectation-maximization algorithm. We show that the resulting estimators are asymptotically efficient and converge point-wise for small m values, when the iteration k of the iterative multiple imputation goes to infinity. We evaluate the performance of the newly proposed method through simulation studies. A real data analysis is also conducted to illustrate the new method.

Keywords: Convergence; EM algorithm; imputation size; missing at random; Rubin's variance estimator


A new method for regression analysis of interval-censored data with the additive hazards model


JOURNAL OF THE KOREAN STATISTICAL SOCIETY, 2020, Vol. 49, No. 4, pp. 1131-1147

Authors: Wang, Peijie; Zhou, Yong; Sun, Jianguo. Affiliations: Shanghai Univ Finance & Econ, Sch Stat & Management, Shanghai 200433, Peoples R China; Jilin Univ, Ctr Appl Stat Res, Sch Math, Changchun 130012, Peoples R China; East China Normal Univ, Key Lab Adv Theory & Applicat Stat & Data Sci MOE, Shanghai 200062, Peoples R China; East China Normal Univ, Acad Stat & Interdisciplinary Sci, Shanghai 200062, Peoples R China; Univ Missouri, Dept Stat, Columbia, MO 65211, USA

The additive hazards model is one of the most popular regression models for analyzing failure time data, especially when one is interested in the excess risk or risk difference. Although a couple of methods have been developed in the literature for regression analysis of interval-censored data, a general type of failure time data, they may be complicated or inefficient. To address this, we present a new maximum likelihood estimation procedure based on the sieve approach and, in particular, develop an EM algorithm that involves a two-stage data augmentation with the use of Poisson latent variables. The method can be easily implemented, and the asymptotic properties of the proposed estimators are established. A simulation study is conducted to assess the performance of the proposed method and indicates that it works well in practical situations. The method is also applied to a set of interval-censored data from an AIDS cohort study.

Keywords: Additive hazards model; EM algorithm; Interval-censored data; Latent Poisson random variable


A support vector machine based semiparametric mixture cure model


COMPUTATIONAL STATISTICS, 2020, Vol. 35, No. 3, pp. 931-945

Authors: Li, Peizhi; Peng, Yingwei; Jiang, Ping; Dong, Qingli. Affiliations: Dongbei Univ Finance & Econ, Sch Finance, Dalian, Peoples R China; Queens Univ, Dept Publ Hlth Sci, Kingston, ON, Canada; Queens Univ, Dept Math & Stat, Kingston, ON, Canada; Dongbei Univ Finance & Econ, Sch Stat, Dalian, Peoples R China; Dalian Univ Technol, Fac Management & Econ, Dalian, Peoples R China

The mixture cure model is an extension of standard survival models to analyze survival data with a cured fraction. Many developments in recent years focus on the latency part of the model to allow more flexible modeling strategies for the distribution of uncured subjects, and fewer studies focus on the incidence part to model the probability of being uncured/cured. We propose a new mixture cure model that employs the support vector machine (SVM) to model the covariate effects in the incidence part of the cure model. The new model inherits the features of the SVM to provide a flexible model to assess the effects of covariates on the incidence. Unlike the existing nonparametric approaches for the incidence part, the SVM method also allows for potentially high-dimensional covariates in the incidence part. Semiparametric models are also allowed in the latency part of the proposed model. We develop an estimation method to estimate the cure model and conduct a simulation study to show that the proposed model outperforms existing cure models, particularly in incidence estimation. An illustrative example using data from leukemia patients is given.

Keywords: Censored survival time; Cure model; Support vector machine; EM algorithm; Multiple imputation


Generalized finite mixture of multivariate regressions with applications to therapeutic biomarker identification


STATISTICS IN MEDICINE, 2020, Vol. 39, No. 28, pp. 4301-4324

Authors: Liu, Hongmei; Rao, J. Sunil. Affiliation: Univ Miami, Div Biostat, Coral Gables, FL 33124, USA

Finite mixtures of regressions have been used to analyze data that come from a heterogeneous population. When more than one response is observed, accommodating a multivariate response can be useful. In this article, we go a step further and introduce a multivariate extension that includes a latent overlapping cluster indicator variable that allows for potential overdispersion. A generalized mixture of multivariate regressions in connection with the proposed model and a new EM algorithm for fitting are provided. In addition, we allow for high-dimensional predictors via shrinkage estimation. This model proves particularly useful in the analysis of complex data, such as the search for cancer therapeutic biomarkers. We demonstrate this using the Genomics of Drug Sensitivity in Cancer resource.

Keywords: cancer biomarkers; EM algorithm; Lasso; mixture of multivariate regression models; overlapping clustering


Statistical inference for missing data mechanisms


STATISTICS IN MEDICINE, 2020, Vol. 39, No. 28, pp. 4325-4333

Author: Zhao, Yang. Affiliation: Univ Regina, Dept Math & Stat, CollegeWest 307-14, Regina, SK S4S 0A2, Canada

In the literature on statistical analysis with missing data, there is a significant gap in statistical inference for missing data mechanisms, especially for nonmonotone missing data, which has essentially restricted the use of estimation methods that require estimating the missing data mechanisms. For example, the inverse probability weighting methods (Horvitz & Thompson, 1952; Little & Rubin, 2002), including the popular augmented inverse probability weighting (Robins et al, 1994), depend on sufficient models for the missing data mechanisms to reduce estimation bias while improving estimation efficiency. This research proposes a semiparametric likelihood method for estimating missing data mechanisms, where an EM algorithm with closed-form expressions for both the E-step and the M-step is used in evaluating the estimate (Zhao et al, 2009; Zhao, 2020). The asymptotic variance of the proposed estimator is estimated from the profile score function. The methods are general and robust. Simulation studies in various missing data settings are performed to examine the finite sample performance of the proposed method. Finally, we analyze the missing data mechanism of the Duke cardiac catheterization coronary artery disease diagnostic data to illustrate the method.

Keywords: EM algorithm; missing data mechanism; nonmonotone missing data pattern; pseudo-likelihood
