ISBN (print): 9781538650905
The random forest model is a popular framework used in classification and regression. In cases where high correlations exist within the data, it may be beneficial to capture these dependencies through latent variables, for an enhanced use of the random forest framework. In this paper, we present Sylva, the second proposal of a random forest with latent variables after T-Trees, derived from the seminal works of Botta and co-workers (Botta et al., 2008). Sylva is an innovative hybrid approach in which the dynamic generation of the latent variables used to learn the random forest is driven by an additional forest model, this time a forest of latent tree models. The latter forest model, a class of Bayesian networks devised in (Mourad et al., 2011), allows flexible modeling of the dependencies existing within the data. In the comprehensive study reported here, three variants of Sylva, instantiated by different clustering methods (CAST, DBSCAN, Louvain method), are compared to T-Trees using high-dimensional real-world datasets (161 datasets, each describing around 5,000 observations and between 5,700 and 39,000 variables) in the context of genetic association studies. We show that T-Trees and Sylva have comparably high predictive powers (areas under the ROC curves), which lie in the range [0.887, 0.961] for T-Trees and in the range [0.885, 0.979] over the three Sylva instantiations. Interestingly, T-Trees and Sylva are shown to differ significantly in their importance measure distributions: in Sylva, the importance measure distribution corresponding to top-ranked variables is significantly skewed towards higher values than in T-Trees, which meets the feature selection enhancement objective. This property holds true for the three instantiations of Sylva. In addition, the thorough analysis of the number of top-ranked variables jointly identified by T-Trees and Sylva highlights the possibility of cross-validating the findings, in order to constitute a prioritized list of features (e.g.,
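The abstract does not give implementation details, so the following is only a minimal sketch of the general "random forest over latent variables" idea, not the authors' Sylva algorithm: the forest of latent tree models is replaced here by a simple stand-in (correlation-based hierarchical clustering plus one principal component per cluster), and all names, sizes, and thresholds are illustrative.

```python
# Sketch only: clustering + per-cluster PCA stand in for the latent tree
# forest; a random forest is then trained on the latent variables and its
# AUC and importance measures are inspected, as in the study above.
import numpy as np
from sklearn.cluster import AgglomerativeClustering    # sklearn >= 1.2 parameter names
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 500, 200                      # toy data: 500 observations, 200 correlated variables
blocks = rng.normal(size=(n, 20))    # 20 hidden factors
X = np.repeat(blocks, 10, axis=1) + 0.3 * rng.normal(size=(n, p))
y = (blocks[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# 1) Group correlated variables (stand-in for the latent tree forest).
corr_dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
clusters = AgglomerativeClustering(
    n_clusters=20, metric="precomputed", linkage="average"
).fit_predict(corr_dist)

# 2) One latent variable per cluster: the cluster's first principal component.
def latent_features(X, clusters):
    cols = []
    for c in np.unique(clusters):
        members = X[:, clusters == c]
        cols.append(PCA(n_components=1).fit_transform(members).ravel())
    return np.column_stack(cols)

Z = latent_features(X, clusters)

# 3) Random forest on the latent variables; AUC and importances as evaluation.
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(Z_tr, y_tr)
print("AUC:", roc_auc_score(y_te, rf.predict_proba(Z_te)[:, 1]))
print("top latent variables:", np.argsort(rf.feature_importances_)[::-1][:5])
```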
Real-time fault detection and diagnosis of high-speed trains is essential for operation safety. Traditional methods mainly employ rule-based alarms that detect a fault when a single measured variable deviates too far from its expected range, ignoring the correlations among multivariate data. In this paper, a Map-Reduce decentralized PCA algorithm and its dynamic extension are proposed to deal with the large amount of data collected from high-speed trains. In addition, the Map-Reduce algorithm is implemented on a Hadoop-based big data platform. Experimental results using real high-speed train operation data demonstrate the advantages and effectiveness of the proposed methods for five faulty cases. (C) 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
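As a rough illustration of how PCA-based monitoring can be split into map and reduce steps, here is a minimal sketch, assuming each mapper emits a per-partition count, sum, and scatter matrix, a single reducer aggregates them into a covariance matrix, and Hotelling's T^2 is used as the monitoring statistic. It mirrors the map-reduce pattern only, not the paper's Hadoop implementation or its dynamic extension; all names and thresholds are illustrative.

```python
# Map-reduce-style PCA fitting plus a T^2 fault indicator (sketch).
import numpy as np
from functools import reduce

def map_stats(partition):
    """Map: local count, sum, and raw scatter matrix of one data block."""
    X = np.asarray(partition)
    return X.shape[0], X.sum(axis=0), X.T @ X

def reduce_stats(a, b):
    """Reduce: element-wise aggregation of the mapped statistics."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def fit_pca(partitions, n_components):
    n, s, scatter = reduce(reduce_stats, map(map_stats, partitions))
    mean = s / n
    cov = (scatter - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]

def t2_statistic(X, mean, P, lam):
    """Hotelling T^2 in the retained principal subspace."""
    scores = (X - mean) @ P
    return np.sum(scores**2 / lam, axis=1)

# Toy usage: normal training data split across "mapper" partitions,
# then a test block with a sensor bias injected halfway through.
rng = np.random.default_rng(1)
train = rng.normal(size=(3000, 10))
partitions = np.array_split(train, 6)          # pretend these live on 6 nodes
mean, P, lam = fit_pca(partitions, n_components=3)

test = rng.normal(size=(200, 10))
test[100:, 4] += 4.0                           # simulated faulty sensor
t2 = t2_statistic(test, mean, P, lam)
limit = np.quantile(t2_statistic(train, mean, P, lam), 0.99)
print("fault flagged at samples:", np.where(t2 > limit)[0][:5])
```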
A Broad Learning System (BLS) that aims to offer an alternative way of learning to deep structures is proposed in this paper. Deep structures and their learning suffer from a time-consuming training process because of the large number of connecting parameters in filters and layers. Moreover, they require a complete retraining process if the structure is not sufficient to model the system. The BLS is established in the form of a flat network, where the original inputs are transferred and placed as "mapped features" in feature nodes and the structure is expanded in the wide sense through "enhancement nodes." Incremental learning algorithms are developed for fast remodeling in broad expansion, without a retraining process, when the network needs to be expanded. Two incremental learning algorithms are given, one for the increment of the feature nodes (or filters in a deep structure) and one for the increment of the enhancement nodes. The designed model and algorithms are very versatile for selecting a model rapidly. In addition, another incremental learning algorithm is developed for the case in which a system that has already been modeled encounters a new incoming input; specifically, the system can be remodeled in an incremental way without retraining entirely from the beginning. Satisfactory model reduction using singular value decomposition is also performed to simplify the final structure. Compared with existing deep neural networks, experimental results on the Modified National Institute of Standards and Technology (MNIST) database and the NYU NORB object recognition benchmark dataset demonstrate the effectiveness of the proposed BLS.
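To make the flat structure concrete, here is a minimal sketch of a BLS-style network, assuming random linear maps for the feature nodes, a tanh nonlinearity for the enhancement nodes, and a ridge-regularized pseudoinverse for the output weights. It illustrates the structure only; the paper's incremental update rules and SVD-based model reduction are not reproduced, and the class name and hyperparameters are illustrative.

```python
# BLS-style flat network (sketch): feature nodes -> enhancement nodes -> output weights.
import numpy as np

class TinyBLS:
    def __init__(self, n_feature_nodes=40, n_enhance_nodes=200, reg=1e-3, seed=0):
        self.nf, self.ne, self.reg = n_feature_nodes, n_enhance_nodes, reg
        self.rng = np.random.default_rng(seed)

    def _nodes(self, X):
        # Mapped feature nodes: random linear projections of the input.
        Z = X @ self.Wf + self.bf
        # Enhancement nodes: nonlinear expansion of the mapped features.
        H = np.tanh(Z @ self.We + self.be)
        return np.hstack([Z, H])

    def fit(self, X, Y):
        d = X.shape[1]
        self.Wf = self.rng.normal(size=(d, self.nf))
        self.bf = self.rng.normal(size=self.nf)
        self.We = self.rng.normal(size=(self.nf, self.ne))
        self.be = self.rng.normal(size=self.ne)
        A = self._nodes(X)
        # Output weights from the ridge-regularized pseudoinverse of A.
        self.W = np.linalg.solve(A.T @ A + self.reg * np.eye(A.shape[1]), A.T @ Y)
        return self

    def predict(self, X):
        return self._nodes(X) @ self.W

# Toy usage: one-hot regression on synthetic two-class data.
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[y]
model = TinyBLS().fit(X[:500], Y[:500])
pred = model.predict(X[500:]).argmax(axis=1)
print("test accuracy:", (pred == y[500:]).mean())
```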