The extreme learning machine (ELM), a neural network algorithm, offers attractive properties such as fast training speed and a simple structure, but weak robustness is an unavoidable defect of the original ELM on blended data. We present a new machine learning framework called "larsEN-ELM" to overcome this problem. larsEN-ELM consists of two key steps. In the first step, preprocessing, we select the input variables most strongly related to the output using least angle regression (lars). In the second step, training, we combine a genetic algorithm (GA) based selective ensemble with the original ELM. In the experiments, we use a sum of two sines and four datasets from the UCI repository to verify the robustness of our approach. The experimental results show that, compared with the original ELM and other methods such as OP-ELM, GASEN-ELM and LSBoost, larsEN-ELM significantly improves robustness while maintaining a relatively high speed. (C) 2014 Elsevier B.V. All rights reserved.
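A minimal sketch of the two steps described above, assuming scikit-learn's Lars for the preprocessing step and a hand-rolled random-hidden-layer ELM; the GA-based selective ensemble is simplified to a plain average of ELM members, and the toy "sum of two sines" data is my own stand-in for the paper's setup.

```python
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(0)

# Toy "sum of two sines" regression problem with irrelevant extra inputs.
n = 400
X = rng.uniform(-3, 3, size=(n, 6))   # only the first two columns matter
y = np.sin(X[:, 0]) + np.sin(2 * X[:, 1]) + 0.1 * rng.normal(size=n)

# Step 1 (preprocessing): pick the inputs most related to the output via lars.
lars = Lars(n_nonzero_coefs=2).fit(X, y)
selected = np.flatnonzero(lars.coef_)
X_sel = X[:, selected]

# Step 2 (training): a basic ELM -- random hidden layer, least-squares output weights.
def train_elm(X, y, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)              # random feature map
    beta = np.linalg.pinv(H) @ y        # output weights by least squares
    return W, b, beta

def predict_elm(X, model):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Plain average of ELM members (a GA would select a subset of these).
ensemble = [train_elm(X_sel, y) for _ in range(10)]
pred = np.mean([predict_elm(X_sel, m) for m in ensemble], axis=0)
print("training RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```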
We study how correlations in the design matrix influence Lasso prediction. First, we argue that the higher the correlations, the smaller the optimal tuning parameter. This implies in particular that the standard tuning parameters, which do not depend on the design matrix, are not favorable. Furthermore, we argue that Lasso prediction works well for any degree of correlation if suitable tuning parameters are chosen. We study these two subjects theoretically as well as with simulations.
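A small simulation in the spirit of the first claim, assuming an equicorrelated Gaussian design and cross-validation (scikit-learn's LassoCV) as a proxy for the prediction-optimal tuning parameter; the correlation levels, dimensions, and sparsity below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p, s = 200, 50, 5                      # samples, predictors, sparsity

for rho in (0.0, 0.5, 0.9):               # equicorrelated design
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    beta[:s] = 1.0
    y = X @ beta + rng.normal(size=n)
    lam = LassoCV(cv=5, random_state=0).fit(X, y).alpha_
    print(f"rho = {rho:.1f}  ->  CV-selected alpha = {lam:.3f}")
```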
We propose an empirical Bayes method for variable selection and coefficient estimation in linear regression models. The method is based on a particular hierarchical Bayes formulation, and the empirical Bayes estimator is shown to be closely related to the LASSO estimator. Such a connection allows us to take advantage of the recently developed quick LASSO algorithm to compute the empirical Bayes estimate, and provides a new way to select the tuning parameter in the LASSO method. Unlike previous empirical Bayes variable selection methods, which in most practical situations can be implemented only through a greedy stepwise algorithm, our method gives a global solution efficiently. Simulations and real examples show that the proposed method is very competitive in terms of variable selection, estimation accuracy, and computation speed compared with other variable selection and estimation methods.
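The paper's empirical Bayes criterion is not reproduced here; the sketch below only illustrates the general workflow it connects to: compute the LASSO solution with a lars-type solver and choose the tuning parameter by a criterion, using scikit-learn's LassoLarsIC with BIC as a stand-in selector. The simulated data are my own.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(2)
n, p = 150, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:4] = [3.0, -2.0, 1.5, 1.0]
y = X @ beta + rng.normal(size=n)

# Fit the lasso path with a lars-type solver and pick alpha by BIC.
model = LassoLarsIC(criterion="bic").fit(X, y)
print("selected alpha:", model.alpha_)
print("selected variables:", np.flatnonzero(model.coef_))
```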
We study the effective degrees of freedom of the lasso in the framework of Stein's unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso, a conclusion that requires no special assumption on the predictors. In addition, the unbiased estimator is shown to be asymptotically consistent. With these results on hand, various model selection criteria (C_p, AIC and BIC) are available, which, along with the lars algorithm, provide a principled and efficient approach to obtaining the optimal lasso fit with the computational effort of a single ordinary least-squares fit.
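A sketch of the C_p-style selection this result enables: along the lasso path computed by lars, the number of nonzero coefficients serves as the degrees-of-freedom estimate. The simulated data and the OLS-based noise-variance estimate are my own choices for illustration.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(3)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=n)

# Noise variance estimated from the full least-squares fit.
ols_resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
sigma2 = np.sum(ols_resid ** 2) / (n - p)

# Lasso path via lars; df-hat at each knot = number of nonzero coefficients.
alphas, _, coefs = lars_path(X, y, method="lasso")
df = np.count_nonzero(coefs, axis=0)
rss = np.sum((y[:, None] - X @ coefs) ** 2, axis=0)
cp = rss / sigma2 - n + 2 * df

best = np.argmin(cp)
print("C_p-optimal alpha:", alphas[best], "with", df[best], "nonzero coefficients")
```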
The implied volatility surface (IVS) is a mapping of the implied volatilities of options as a function of moneyness and time-to-maturity. Capturing and forecasting the dynamics of this surface can contribute to trading and hedging strategies, as it contains information about the expected market volatility. I propose both a two-step approach based on principal components analysis (PCA) and a one-step approach based on a state-space model. Exogenous variables are included at a later stage to improve the predictions. The most relevant factors and exogenous variables are selected by the least angle regression (lars) algorithm. There are three main findings. First, both approaches capture the predictability of the IVS of equity options in the information technology (IT) sector. Averaged over time, the one-step approach performs best for each individual equity option in terms of the performance evaluation measures. Second, the inclusion of exogenous variables improves the predictions of both approaches in terms of statistical performance. Expected inflation and interest rates are found to contribute the most to the prediction of the IVS of equity options. Third, when disregarding transaction costs, the one-step-ahead forecasts of both approaches are able to produce positive returns for an ATM straddle trading strategy. However, when accounting for transaction costs, these potential profits disappear.
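A minimal sketch of the two-step idea, assuming a simulated IVS panel, PCA factors, and a lars-type regression (scikit-learn's LassoLars) of next-period factors on lagged factors plus dummy exogenous series; the forecast equation and all data below are my own stand-ins, not the paper's specification.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(4)
T, G = 300, 40                      # time periods, grid points (moneyness x maturity)
f = np.cumsum(rng.normal(size=(T, 3)), axis=0) * 0.1   # latent factors
loadings = rng.normal(size=(3, G))
ivs = 0.2 + f @ loadings + 0.01 * rng.normal(size=(T, G))   # simulated surface panel

# Step 1: PCA on the surface panel.
pca = PCA(n_components=3)
factors = pca.fit_transform(ivs)

# Step 2: lars-type regression of the next-period factor on lagged factors
# plus (dummy) exogenous variables; lars picks the relevant predictors.
exog = rng.normal(size=(T, 5))                       # stand-in exogenous series
Z = np.hstack([factors[:-1], exog[:-1]])
target = factors[1:, 0]                              # forecast the first factor
fit = LassoLars(alpha=0.001).fit(Z, target)
print("selected predictors:", np.flatnonzero(fit.coef_))
```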
Integrating multiple databases that are distributed among different data owners can be beneficial in numerous contexts of statistical analysis. Unfortunately, the actual sharing of data is often impeded by concerns about data confidentiality. A situation like this requires tools that can produce correct results while minimizing the risk of disclosure. Over the past ten years a number of "secure" protocols have been proposed to solve specific statistical problems such as linear regression and classification in a distributed setting. In this thesis, we first explore the disclosure risks associated with several existing protocols designed for the vertically partitioned database setting. We focus on the specific case where two parties are trying to perform logistic regression without actually combining their data. Although the protocols can be considered secure in the sense that there is no danger of either party's data being fully exposed, there is information leakage from the intermediate computations and also from the estimated coefficients. We provide a detailed analysis of such cases. Secondly, we show how these previously proposed secure computation protocols can be applied to penalized regression methods, with a focus on the lars algorithm used for Lasso regression. A protocol for the vertically partitioned database setting is described, along with a thorough discussion of possible disclosure risks and computation. We also provide a detailed description of how to perform model selection and possible ways to extend our protocol to lars-type algorithms for generalized linear models, such as logistic regression.
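A plain, non-secure illustration of the vertically partitioned layout the protocol addresses: a lars-type solver can run from the cross-products X^T X and X^T y, and the off-diagonal block X_A^T X_B is the quantity that involves both parties' columns. The toy data and the use of scikit-learn's lars_path_gram are my own choices; no secure computation is performed here.

```python
import numpy as np
from sklearn.linear_model import lars_path_gram

rng = np.random.default_rng(5)
n = 100
X_A = rng.normal(size=(n, 3))      # columns held by party A
X_B = rng.normal(size=(n, 2))      # columns held by party B
y = X_A[:, 0] - 2 * X_B[:, 1] + 0.1 * rng.normal(size=n)

# Blocks of X^T X and X^T y for the combined design X = [X_A, X_B].
G_AA = X_A.T @ X_A                 # computable by A alone
G_BB = X_B.T @ X_B                 # computable by B alone
G_AB = X_A.T @ X_B                 # involves both parties (target of a secure protocol)
Xty = np.concatenate([X_A.T @ y, X_B.T @ y])
G = np.block([[G_AA, G_AB], [G_AB.T, G_BB]])

# Once these cross-products are available, the lasso path follows from lars.
alphas, _, coefs = lars_path_gram(Xy=Xty, Gram=G, n_samples=n, method="lasso")
print("nonzero coefficients at the end of the path:", np.flatnonzero(coefs[:, -1]))
```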
We propose the elastic net, a new regularization and variable selection method. Real-world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p > n case. An algorithm called lars-EN is proposed for computing elastic net regularization paths efficiently, much like the lars algorithm does for the lasso.
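A toy p > n comparison in the spirit of the grouping-effect claim, using scikit-learn's coordinate-descent ElasticNet and Lasso rather than the lars-EN path algorithm itself; the grouped-predictor design and penalty values are my own assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(6)
n, p = 50, 120                                    # p much larger than n
z = rng.normal(size=(n, 1))
group = z + 0.05 * rng.normal(size=(n, 3))        # three strongly correlated predictors
X = np.hstack([group, rng.normal(size=(n, p - 3))])
y = 3 * z[:, 0] + 0.5 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("lasso keeps", np.count_nonzero(lasso.coef_[:3]), "of the 3 grouped predictors")
print("elastic net keeps", np.count_nonzero(enet.coef_[:3]), "of the 3 grouped predictors")
```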