We consider the problem of predicting several response variables using the same set of explanatory variables. This setting naturally induces a group structure over the coefficient matrix, in which every explanatory variable corresponds to a set of related coefficients. Most of the existing methods that utilize this group formation assume that the similarities between related coefficients arise solely through a joint sparsity structure. In this paper, we propose a procedure for constructing multivariate regression models that directly capture and model the within-group similarities by employing a multivariate linear mixed model formulation, with joint estimation of covariance matrices for coefficients and errors via penalized likelihood. Our approach, which we term MrRCE for Multivariate random Regression with Covariance Estimation, encourages structured similarity in parameters, in which coefficients for the same variable in related tasks share the same sign and similar magnitude. We illustrate the benefits of our approach in synthetic and real examples, and show that the proposed method outperforms natural competitors and alternative estimators under several model settings.
An incomplete-data Fisher scoring method is proposed for parameter estimation in models where data are missing and in latent-variable models that can be formulated as a missing data problem. The convergence properties of the proposed method and an accelerated variant of this method are provided. The main features of this method are its ability to accelerate the rate of convergence by adjusting the step length, to provide a second derivative of the observed-data log-likelihood function using only the functions used in the proposed method, and to avoid having to explicitly solve the first derivative of the objective function. Four examples are presented to demonstrate how the proposed method converges compared with the EM algorithm and its variants. The computing time is also compared.
We introduce a multivariate class of distributions with support I, a k-orthotope in [0, ∞)^k, which is dense in the set of all k-dimensional distributions with support I. We call this new class 'multivariate finite-support phase-type distributions' (MFSPH). Though we generally define MFSPH distributions on any finite k-orthotope in [0, ∞)^k, here we mainly deal with MFSPH distributions with support [0, 1)^k. The distribution function of an MFSPH variate is computed by using that of a variate in the MPH* class, the multivariate class of distributions introduced by Kulkarni (1989). The marginal distributions of MFSPH variates are found as FSPH distributions, the class studied by Ramaswami and Viswanath (2014). Some properties, including the mixture property, of MFSPH distributions are established. Estimates of the parameters of a particular class of bivariate finite-support phase-type distributions are found by using the expectation-maximization algorithm. Simulated samples are used to demonstrate how this class could be used as approximations for bivariate finite-support distributions.
Community Question Answering (CQA) websites are widely used for sharing knowledge: users can ask questions, post answers, and evaluate answers. So far, the evaluation of answers has been explained by the contents of answers through the investigation of users' topics of interest and expertise levels. In this paper we focus on modeling the user's evaluation behavior, in that users can see the answerer's profile as well as the answer content before evaluating the quality of the answer. We propose a model called Popularity-based Topical Expertise Model (PTem), a generative model to analyze the rich-get-richer phenomenon in which popular users' answers receive more recommendations. We can simultaneously estimate the topical expertise of each user and the strength of the rich-get-richer effect through the EM algorithm combined with collapsed Gibbs sampling. Experiments are performed on the StackExchange data, and the results demonstrate a rich-get-richer phenomenon in the community. We further discuss the superiority and usefulness of the proposed model through analysis in the discipline of philosophy.
Nowadays, events spread rapidly along social networks. We are interested in whether people's responses to an event are affected by their friends' characteristics. For example, how soon will a person start playing a game given that his/her friends like it? Studying social network dependence is an emerging research area. In this work, we propose a novel latent spatial autocorrelation Cox model to study social network dependence with time-to-event data. The proposed model introduces a latent indicator to characterize whether a person's survival time might be affected by his or her friends' features. We first propose a score-type test for detecting the existence of social network dependence. If it exists, we further develop an EM-type algorithm to estimate the model parameters. The performance of the proposed test and estimators is illustrated by simulation studies and an application to a time-to-event dataset about playing a popular mobile game from one of the largest online social network platforms. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Iterative multiple imputation is a popular technique for missing data analysis. It updates the parameter estimators iteratively using the multiple imputation method. This technique is convenient and flexible. However, the parameter estimators do not converge point-wise and are not efficient for finite imputation size m. In this paper, we propose a regression multiple imputation method. It uses the parameter estimators obtained from the multiple imputation method to estimate the parameter estimators based on the expectation-maximization (EM) algorithm. We show that the resulting estimators are asymptotically efficient and converge point-wise for small m values when the iteration k of the iterative multiple imputation goes to infinity. We evaluate the performance of the newly proposed method through simulation studies. A real data analysis is also conducted to illustrate the new method.
The additive hazards model is one of the most popular regression models for analyzing failure time data, especially when one is interested in the excess risk or risk difference. Although a couple of methods have been developed in the literature for regression analysis of interval-censored data, a general type of failure time data, they may be complicated or inefficient. To address this, we present a new maximum likelihood estimation procedure based on the sieve approach and, in particular, develop an EM algorithm that involves a two-stage data augmentation with the use of Poisson latent variables. The method can be easily implemented, and the asymptotic properties of the proposed estimators are established. A simulation study is conducted to assess the performance of the proposed method and indicates that it works well for practical situations. The method is also applied to a set of interval-censored data from an AIDS cohort study.
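For orientation, the additive hazards model referred to above is usually written as follows. This is a sketch in standard textbook notation, not the paper's own; the symbols $\lambda_0$, $\Lambda_0$, $\beta$, $Z$, and the censoring interval $(L, R]$ are assumptions introduced here, and the covariates are taken to be time-independent.

```latex
% Additive hazards: covariates shift the baseline hazard additively,
% so \beta^\top Z is read as an excess risk (risk difference).
\lambda(t \mid Z) = \lambda_0(t) + \beta^\top Z,
\qquad
S(t \mid Z) = \exp\{-\Lambda_0(t) - \beta^\top Z\, t\},
\qquad
\Lambda_0(t) = \int_0^t \lambda_0(s)\, ds .

% Interval censoring: the failure time T is only known to lie in (L, R],
% so one subject contributes the probability of that interval.
\mathcal{L}_i(\beta, \Lambda_0) = S(L_i \mid Z_i) - S(R_i \mid Z_i).
```

Under a sieve approach, the unknown function $\Lambda_0$ would be replaced by a finite-dimensional approximation; the specific sieve and the Poisson data augmentation used by the authors are not given in the abstract and are not reproduced here.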
The mixture cure model is an extension of standard survival models to analyze survival data with a cured fraction. Many developments in recent years focus on the latency part of the model to allow more flexible modeling strategies for the distribution of uncured subjects, and fewer studies focus on the incidence part to model the probability of being uncured/cured. We propose a new mixture cure model that employs the support vector machine (SVM) to model the covariate effects in the incidence part of the cure model. The new model inherits the features of the SVM to provide a flexible model to assess the effects of covariates on the incidence. Unlike the existing nonparametric approaches for the incidence part, the SVM method also allows for potentially high-dimensional covariates in the incidence part. Semiparametric models are also allowed in the latency part of the proposed model. We develop an estimation method to estimate the cure model and conduct a simulation study to show that the proposed model outperforms existing cure models, particularly in incidence estimation. An illustrative example using data from leukemia patients is given.
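The incidence and latency parts mentioned above can be made concrete with the conventional mixture cure decomposition. This is a sketch in standard notation, not taken from the paper; the symbols $\pi$, $S_u$, $x$, $z$, and $\gamma$ are assumptions introduced for illustration.

```latex
% Population survival = cured fraction (never fails) + uncured fraction.
S_{\mathrm{pop}}(t \mid x, z) = 1 - \pi(z) + \pi(z)\, S_u(t \mid x),
\qquad
\pi(z) = P(\text{uncured} \mid z).

% Incidence part: \pi(z). A conventional choice is the logistic model
\pi(z) = \frac{\exp(\gamma^\top z)}{1 + \exp(\gamma^\top z)} .

% Latency part: S_u(t \mid x), the survival function of uncured subjects.
```

The abstract describes replacing the parametric form of $\pi(z)$ with a support vector machine; how the SVM output is mapped to a probability is not stated there and is left unspecified here.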
Finite mixtures of regressions have been used to analyze data that come from a heterogeneous population. When more than one response is observed, accommodating a multivariate response can be useful. In this article, we go a step further and introduce a multivariate extension that includes a latent overlapping cluster indicator variable that allows for potential overdispersion. A generalized mixture of multivariate regressions in connection with the proposed model and a new EM algorithm for fitting are provided. In addition, we allow for high-dimensional predictors via shrinkage estimation. This model proves particularly useful in the analysis of complex data, such as the search for cancer therapeutic biomarkers. We demonstrate this using the Genomics of Drug Sensitivity in Cancer resource.
Author: Zhao, Yang (Univ Regina, Dept Math & Stat, College West 307-14, Regina SK S4S 0A2, Canada)
In the literature of statistical analysis with missing data there is a significant gap in statistical inference for missing data mechanisms, especially for nonmonotone missing data, which has essentially restricted the use of estimation methods that require estimating the missing data mechanisms. For example, the inverse probability weighting methods (Horvitz & Thompson, 1952; Little & Rubin, 2002), including the popular augmented inverse probability weighting (Robins et al., 1994), depend on sufficient models for the missing data mechanisms to reduce estimation bias while improving estimation efficiency. This research proposes a semiparametric likelihood method for estimating missing data mechanisms where an EM algorithm with closed-form expressions for both the E-step and the M-step is used in evaluating the estimate (Zhao et al., 2009; Zhao, 2020). The asymptotic variance of the proposed estimator is estimated from the profile score function. The methods are general and robust. Simulation studies in various missing data settings are performed to examine the finite sample performance of the proposed method. Finally, we analyze the missing data mechanism of the Duke cardiac catheterization coronary artery disease diagnostic data to illustrate the method.
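To illustrate why inverse probability weighting depends on a model of the missing data mechanism, here is a minimal sketch of a Horvitz-Thompson style mean estimate. It is not the paper's method: the function name `ipw_mean` and the toy data are hypothetical, and the observation probabilities are assumed known, whereas in practice they would have to be estimated from a model of the mechanism.

```python
def ipw_mean(y, observed, prob_observed):
    """Inverse-probability-weighted estimate of the mean of y.

    y             -- outcomes; entries for unobserved units are ignored
    observed      -- 1 if the outcome was observed, 0 if it is missing
    prob_observed -- assumed probability that each unit is observed
    """
    if not (len(y) == len(observed) == len(prob_observed)):
        raise ValueError("all inputs must have the same length")
    if len(y) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for y_i, r_i, p_i in zip(y, observed, prob_observed):
        if r_i:
            if not 0.0 < p_i <= 1.0:
                raise ValueError("observation probabilities must lie in (0, 1]")
            # Each observed unit stands in for 1 / p_i units of the full sample.
            total += y_i / p_i
    return total / len(y)


# Toy data (hypothetical): the third outcome is missing.
y = [1.0, 2.0, None, 4.0]
observed = [1, 1, 0, 1]
prob_observed = [0.5, 1.0, 0.5, 0.5]

estimate = ipw_mean(y, observed, prob_observed)
print(estimate)
```

If the assumed probabilities are wrong, the weights are wrong and the estimate is biased, which is the dependence on the missing data mechanism that the abstract points to.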