We consider the task of multiple-output regression where both input and output are high-dimensional. Given the limited number of training samples relative to the data dimensions, properly imposing loose statistical dependency when learning a regression model is crucial for reliable prediction accuracy. Sparse inverse covariance learning of conditional Gaussian random fields has recently emerged to achieve this goal and has been shown to outperform non-sparse approaches. However, one of its main drawbacks is the strong assumption of linear Gaussianity in modeling the input-output relationship. For certain application domains, this assumption may be too restrictive and lack representational power; consequently, prediction based on a misspecified model can result in suboptimal performance. In this paper, we extend the idea of sparse learning to a non-Gaussian model, specifically the powerful conditional Gaussian mixture. For this latent-variable model, we propose a novel sparse inverse covariance learning algorithm based on the expectation-maximization lower-bound optimization technique. We show that each M-step reduces to solving the regular sparse inverse covariance estimation of linear Gaussian models, in conjunction with estimating a sparse logistic regression. We demonstrate the improved prediction performance of the proposed algorithm over existing methods on several datasets.
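To make the described M-step structure concrete, here is a minimal EM sketch in Python, assuming scikit-learn's graphical lasso for the per-component precision update and an L1-penalized logistic regression for the gating network. All names and the hard-assignment gating update are illustrative simplifications, not the paper's exact algorithm.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso
from sklearn.linear_model import LogisticRegression

def em_sparse_cgm(X, Y, n_components=2, alpha=0.1, n_iter=20, seed=0):
    """EM for p(y|x) = sum_k pi_k(x) N(y; B_k' x, Sigma_k) with sparse
    inverse covariances Sigma_k^{-1} and an L1-sparse gating network."""
    n, p = X.shape
    q = Y.shape[1]
    rng = np.random.default_rng(seed)
    R = rng.dirichlet(np.ones(n_components), size=n)   # soft responsibilities
    B = [np.zeros((p, q)) for _ in range(n_components)]
    Cov = [np.eye(q) for _ in range(n_components)]
    gate = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=500)
    for _ in range(n_iter):
        # ---- M-step: per component, weighted regression + graphical lasso.
        for k in range(n_components):
            w = R[:, k]
            Xw = X * w[:, None]
            B[k] = np.linalg.solve(X.T @ Xw + 1e-6 * np.eye(p), Xw.T @ Y)
            resid = Y - X @ B[k]
            emp_cov = (resid * w[:, None]).T @ resid / w.sum()
            Cov[k], _ = graphical_lasso(emp_cov, alpha=alpha)  # sparse precision
        # Gating update: sparse logistic regression on hard assignments,
        # weighted by their responsibility (a simplification of the exact
        # weighted multinomial M-step; assumes each component keeps points).
        z = R.argmax(axis=1)
        gate.fit(X, z, sample_weight=R.max(axis=1))
        # ---- E-step: recompute responsibilities from gate and components.
        log_pi = gate.predict_log_proba(X)
        log_lik = np.stack(
            [multivariate_normal.logpdf(Y - X @ B[k], mean=np.zeros(q), cov=Cov[k])
             for k in range(n_components)], axis=1)
        log_post = log_pi + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        R = np.exp(log_post)
        R /= R.sum(axis=1, keepdims=True)
    return B, Cov, gate
```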
In high-dimensional data, structured noise caused by observed and unobserved factors that affect multiple target variables simultaneously poses a serious challenge for modeling by masking the often weak signal. Therefore, (1) explaining away the structured noise in multiple-output regression is of paramount importance. Additionally, (2) assumptions about the correlation structure of the regression weights are needed. We note that both can be formulated naturally in a latent variable model in which both the interesting signal and the noise are mediated through the same latent factors. Under this assumption, the signal model borrows strength from the noise model by encouraging similar effects on correlated targets. We introduce a hyperparameter for the latent signal-to-noise ratio, which turns out to be important for modeling weak signals, and an ordered infinite-dimensional shrinkage prior that resolves the rotational unidentifiability in reduced-rank regression models. Simulations and prediction experiments with metabolite, gene expression, fMRI measurement, and macroeconomic time series data show that our model equals or exceeds state-of-the-art performance and, in particular, outperforms the standard approach of assuming independent noise and signal models.
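A schematic of the shared-latent-factor idea, in assumed notation (not necessarily the paper's): signal and structured noise are mediated through the same loadings Λ, and the scale of the signal term relative to the latent noise term plays the role of the latent signal-to-noise ratio.

```latex
\begin{aligned}
\mathbf{y}_i &= \Lambda\left(\Psi^\top \mathbf{x}_i + \boldsymbol{\eta}_i\right) + \boldsymbol{\varepsilon}_i,\\
\boldsymbol{\eta}_i &\sim \mathcal{N}(\mathbf{0},\, \sigma_\eta^2 I),\qquad
\boldsymbol{\varepsilon}_i \sim \mathcal{N}(\mathbf{0},\, \sigma_\varepsilon^2 I).
\end{aligned}
```

In this schematic, an ordered shrinkage prior on the columns of Λ would serve the role described above of pinning down the rank and resolving rotational unidentifiability.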
This article extends linear quantile regression to an elliptical multiple-output regression setup. The definition of the proposed concept leads to a convex optimization problem. Its elementary properties, and the consistency of its sample counterpart, are investigated. An empirical application is provided. (C) 2015 Elsevier B.V. All rights reserved.
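For orientation, the convex building block underlying such directional extensions of quantile regression is the familiar check-loss problem for the projected response; the notation here is standard, not necessarily the paper's.

```latex
\min_{a,\,\mathbf{b}} \;\sum_{i=1}^{n} \rho_\tau\!\left(\mathbf{u}^\top \mathbf{y}_i - a - \mathbf{b}^\top \mathbf{x}_i\right),
\qquad
\rho_\tau(t) = t\left(\tau - \mathbb{1}\{t < 0\}\right),
```

where u is a direction on the unit sphere and τ the quantile level.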
ISBN (print): 9781479952083
In this paper, we propose a new method, inspired by multi-task learning, to learn the regression coefficient matrix for multiple-output regression. We incorporate high-order structure information among the regression coefficients into the estimation process of the coefficient matrix, which is of great importance for multiple-output regression. Meanwhile, we describe the output structure with a noise covariance matrix to assist in learning the model parameters. Since real-world data are often corrupted by noise, we place a norm-minimization constraint on the regression coefficient matrix to make it robust to noise. Experiments conducted on three publicly available datasets demonstrate the advantage of the proposed method over state-of-the-art methods.
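A generic form of such objectives, written here only for orientation (the paper's exact penalties may differ): a precision-weighted fit, a structure-coupling term S(W) among the coefficients, and a norm penalty on W for robustness to noise.

```latex
\min_{W,\;\Omega \succ 0}\;
\operatorname{tr}\!\left[(Y - XW)\,\Omega\,(Y - XW)^\top\right]
\;-\; n \log\det\Omega
\;+\; \lambda_1\, \mathcal{S}(W)
\;+\; \lambda_2\, \lVert W \rVert,
```

where Ω is the noise precision matrix and λ₁, λ₂ are regularization weights.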
In this work, we present a new approach for jointly performing eQTL mapping and gene network inference while encouraging a transfer of information between the two tasks. We address this problem by formulating it as a multiple-output regression task in which we aim to learn the regression coefficients while simultaneously estimating the conditional independence relationships among the set of response variables. The approach we develop uses structured sparsity penalties to encourage the sharing of information between the regression coefficients and the output network in a mutually beneficial way. Our model, inverse-covariance-fused lasso, is formulated as a biconvex optimization problem that we solve via alternating minimization. We derive new, efficient optimization routines, based on extensions of state-of-the-art methods, to solve each convex sub-problem. Experiments on both simulated data and a yeast eQTL dataset demonstrate that our approach outperforms a large number of existing methods on the recovery of the true sparse structure of both the eQTL associations and the gene network. We also apply our method to a human Alzheimer's disease dataset and highlight some results that support previous discoveries about the disease.
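The alternating-minimization scheme can be sketched as follows. This is a simplified stand-in, not the paper's exact inverse-covariance-fused lasso objective: a plain l1 penalty replaces the fused term, an ISTA update of the coefficients B alternates with a graphical-lasso update of the output precision Ω.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def soft_threshold(A, t):
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def alternating_fit(X, Y, lam_b=0.1, lam_omega=0.1, n_outer=10, n_inner=50):
    """Alternate (a) an ISTA update of B under tr((Y-XB) Omega (Y-XB)')/n
    + lam_b*||B||_1 with (b) a graphical-lasso update of Omega."""
    n, p = X.shape
    q = Y.shape[1]
    B = np.zeros((p, q))
    Omega = np.eye(q)
    for _ in range(n_outer):
        # (a) B-update: proximal gradient with a conservative step size
        #     1/L, where L bounds the Lipschitz constant of the gradient.
        L = 2.0 / n * np.linalg.eigvalsh(X.T @ X)[-1] * np.linalg.eigvalsh(Omega)[-1]
        step = 1.0 / (L + 1e-12)
        for _ in range(n_inner):
            grad = -2.0 / n * X.T @ (Y - X @ B) @ Omega
            B = soft_threshold(B - step * grad, step * lam_b)
        # (b) Omega-update: graphical lasso on the empirical residual covariance.
        resid = Y - X @ B
        _, Omega = graphical_lasso(resid.T @ resid / n, alpha=lam_omega)
    return B, Omega
```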
A procedure relying on linear programming techniques is developed to compute the recently defined (regression) quantile regions. In the location case, this procedure allows for computing halfspace depth regions even beyond dimension two. The corresponding algorithm is described in detail, and illustrations are provided for both simulated and real data. The efficiency of a MATLAB implementation of the algorithm is also investigated through extensive simulations. (C) 2010 Elsevier B.V. All rights reserved.
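The LP building block behind such computations can be illustrated with a single directional (location) quantile, here solved with SciPy rather than the MATLAB routines the paper benchmarks; the variable names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def directional_quantile(Y, u, tau):
    """tau-quantile of the projections u'y_i as a linear program:
    min  tau*1'r + (1-tau)*1's   s.t.  a + r_i - s_i = u'y_i,  r, s >= 0."""
    z = Y @ u                                   # projected responses
    n = z.shape[0]
    # Decision variables: [a, r_1..r_n, s_1..s_n]; the intercept a is free.
    c = np.concatenate(([0.0], tau * np.ones(n), (1.0 - tau) * np.ones(n)))
    A_eq = np.hstack([np.ones((n, 1)), np.eye(n), -np.eye(n)])
    bounds = [(None, None)] + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=z, bounds=bounds, method="highs")
    return res.x[0]                             # directional quantile a_tau(u)
```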
In the multiple-output regression context, Hallin et al. (Ann Statist 38:635-669, 2010) introduced a powerful data-analytical tool based on regression quantile regions. However, the computation of these regions, which are obtained by considering in all directions an original concept of directional regression quantiles, is a very challenging problem. Paindaveine and Šiman (Comput Stat Data Anal 2011b) described a first elegant solution relying on linear programming techniques. The present paper provides another solution based on the fact that the quantile regions can also be computed from a competing concept of projection regression quantiles, elaborated in Kong and Mizera (Quantile tomography: using quantiles with multivariate data 2008) and Paindaveine and Šiman (J Multivar Anal 2011a). As a by-product, this alternative solution further provides various characteristics useful for statistical inference. We describe in detail the algorithm solving the parametric programming problem involved, and illustrate the resulting procedure on simulated data. We show through simulations that the MATLAB implementation of the algorithm proposed in this paper is faster than that of Paindaveine and Šiman (Comput Stat Data Anal 2011b) in various cases.
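As a discretized stand-in for the direction sweep (the exact method tracks the solution continuously via parametric programming), the quantile region can be approximated by intersecting directional halfspaces over a grid of directions, reusing the `directional_quantile` LP sketch given earlier.

```python
import numpy as np

def region_halfspaces(Y, tau, n_dir=360):
    """Collect halfspaces {y : u'y >= a_tau(u)} over a grid of directions;
    their intersection approximates the tau-quantile (depth) region."""
    halfspaces = []
    for t in np.linspace(0.0, 2.0 * np.pi, n_dir, endpoint=False):
        u = np.array([np.cos(t), np.sin(t)])    # direction on the unit circle
        halfspaces.append((u, directional_quantile(Y, u, tau)))
    return halfspaces
```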
The article deals with certain quantile regression methods for vector responses. In particular, it describes weighted and locally polynomial extensions to projectional quantile regression, discusses their properties, and addresses their computational aspects. It compares their outcomes with recent analogous generalizations of the competing multiple-output directional quantile regression, demonstrates a link between the two competing methodologies, and complements results already available in the literature. The concepts are illustrated with a few insightful simulated examples and applied to a real financial dataset, namely Forex 1M exchange rates. The real-data example strongly indicates that the presented methods might have a substantial impact on the analysis of multivariate time series consisting of two- to four-dimensional observations.
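A minimal sketch of the weighted idea, assuming a Gaussian kernel to localize the projectional quantile around a covariate value; the kernel choice and all names here are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def weighted_quantile(z, w, tau):
    """tau-quantile of z under weights w (minimiser of the weighted check loss)."""
    order = np.argsort(z)
    z, w = z[order], w[order]
    cum = np.cumsum(w) / w.sum()
    idx = np.searchsorted(cum, tau)
    return z[min(idx, len(z) - 1)]

def local_projection_quantile(X, Y, u, x0, tau, bandwidth=1.0):
    # Gaussian kernel weights centred at x0 localise the directional fit.
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2.0 * bandwidth ** 2))
    return weighted_quantile(Y @ u, w, tau)
```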
This paper sheds some new light on projection quantiles. Contrary to the sophisticated set analysis used in Kong and Mizera (2008) [13], we adopt a more parametric approach and study the subgradient conditions associated with these quantiles. In this setup, we introduce Lagrange multipliers which can be interpreted in various interesting ways, in particular in a portfolio optimization context. The corresponding projection quantile regions were already shown to coincide with the halfspace depth ones in Kong and Mizera (2008) [13], but we provide here an alternative proof (completely based on projection quantiles) that has the advantage of leading to an exact computation of halfspace depth regions from projection quantiles. Above all, we systematically consider the regression case, which was barely touched in Kong and Mizera (2008) [13]. We show in particular that the regression quantile regions introduced in Hallin, Paindaveine and Šiman (2010) [6,7] can also be obtained from projection (regression) quantiles, which may lead to a faster computation of those regions in some particular cases. (C) 2010 Elsevier Inc. All rights reserved.
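The standard definitions behind these statements, in assumed notation: the projection τ-quantile in a direction u, and the quantile (depth) region as an intersection of directional halfspaces.

```latex
a_\tau(\mathbf{u}) \;=\; \arg\min_{a \in \mathbb{R}} \; \mathbb{E}\,\rho_\tau\!\left(\mathbf{u}^\top \mathbf{Y} - a\right),
\qquad
R(\tau) \;=\; \bigcap_{\lVert \mathbf{u} \rVert = 1} \left\{\, \mathbf{y} : \mathbf{u}^\top \mathbf{y} \ge a_\tau(\mathbf{u}) \,\right\},
```

with the coincidence of R(τ) and the halfspace depth regions being the result referenced above.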