ISBN (print): 9781538650905
The random forest model is a popular framework used in classification and regression. In cases where high correlations exist within the data, it may be beneficial to capture these dependencies through latent variables, for an enhanced use of the random forest framework. In this paper, we present Sylva, the second proposal of a random forest with latent variables after T-Trees, derived from the seminal works of Botta and co-workers (Botta et al., 2008). Sylva is an innovative hybrid approach in which the dynamic generation of the latent variables used to learn the random forest is driven by an additional forest model, this time a forest of latent tree models. The latter forest model, a class of Bayesian networks devised in (Mourad et al., 2011), allows flexible modeling of the dependencies existing within the data. In the comprehensive study reported here, three variants of Sylva, instantiated by different clustering methods (CAST, DBSCAN, Louvain method), are compared to T-Trees using high-dimensional real-world datasets (161 datasets, each describing around 5,000 observations and between 5,700 and 39,000 variables) in the context of genetic association studies. We show that T-Trees and Sylva have comparably high predictive powers (areas under the ROC curves), which lie in the range [0.887, 0.961] for T-Trees and in the range [0.885, 0.979] over the three Sylva instantiations. Interestingly, T-Trees and Sylva are shown to differ significantly in their importance measure distributions: in Sylva, the importance measure distribution corresponding to top-ranked variables is significantly skewed towards higher values than in T-Trees, which meets the feature selection enhancement objective. This property holds true for the three instantiations of Sylva. In addition, the thorough analysis of the number of top-ranked variables jointly identified by T-Trees and Sylva highlights the possibility of cross-validating the findings, in order to constitute a prioritized list of features (e.g.,
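The abstract does not give implementation details, so the following is only a minimal sketch of the general "random forest over latent variables" idea, not the authors' Sylva algorithm: the forest of latent tree models is replaced here by a simple stand-in (correlation-based hierarchical clustering plus one principal component per cluster), and all names, sizes, and thresholds are illustrative.

```python
# Sketch only: clustering + per-cluster PCA stand in for the latent tree
# forest; a random forest is then trained on the latent variables and its
# AUC and importance measures are inspected, as in the study above.
import numpy as np
from sklearn.cluster import AgglomerativeClustering    # sklearn >= 1.2 parameter names
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 500, 200                      # toy data: 500 observations, 200 correlated variables
blocks = rng.normal(size=(n, 20))    # 20 hidden factors
X = np.repeat(blocks, 10, axis=1) + 0.3 * rng.normal(size=(n, p))
y = (blocks[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# 1) Group correlated variables (stand-in for the latent tree forest).
corr_dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
clusters = AgglomerativeClustering(
    n_clusters=20, metric="precomputed", linkage="average"
).fit_predict(corr_dist)

# 2) One latent variable per cluster: the cluster's first principal component.
def latent_features(X, clusters):
    cols = []
    for c in np.unique(clusters):
        members = X[:, clusters == c]
        cols.append(PCA(n_components=1).fit_transform(members).ravel())
    return np.column_stack(cols)

Z = latent_features(X, clusters)

# 3) Random forest on the latent variables; AUC and importances as evaluation.
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(Z_tr, y_tr)
print("AUC:", roc_auc_score(y_te, rf.predict_proba(Z_te)[:, 1]))
print("top latent variables:", np.argsort(rf.feature_importances_)[::-1][:5])
```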
Real-time fault detection and diagnosis of high-speed trains is essential for operation safety. Traditional methods mainly employ rule-based alarms that detect a fault when a single measured variable deviates too far from its expected range, ignoring the correlations among multivariate data. In this paper, a Map-Reduce decentralized PCA algorithm and its dynamic extension are proposed to deal with the large amount of data collected from high-speed trains. In addition, the Map-Reduce algorithm is implemented on a Hadoop-based big data platform. Experimental results using real high-speed train operation data demonstrate the advantages and effectiveness of the proposed methods for five faulty cases. (C) 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
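As a rough illustration of how PCA-based monitoring can be split into map and reduce steps, here is a minimal sketch, assuming each mapper emits a per-partition count, sum, and scatter matrix, a single reducer aggregates them into a covariance matrix, and Hotelling's T^2 is used as the monitoring statistic. It mirrors the map-reduce pattern only, not the paper's Hadoop implementation or its dynamic extension; all names and thresholds are illustrative.

```python
# Map-reduce-style PCA fitting plus a T^2 fault indicator (sketch).
import numpy as np
from functools import reduce

def map_stats(partition):
    """Map: local count, sum, and raw scatter matrix of one data block."""
    X = np.asarray(partition)
    return X.shape[0], X.sum(axis=0), X.T @ X

def reduce_stats(a, b):
    """Reduce: element-wise aggregation of the mapped statistics."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def fit_pca(partitions, n_components):
    n, s, scatter = reduce(reduce_stats, map(map_stats, partitions))
    mean = s / n
    cov = (scatter - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]

def t2_statistic(X, mean, P, lam):
    """Hotelling T^2 in the retained principal subspace."""
    scores = (X - mean) @ P
    return np.sum(scores**2 / lam, axis=1)

# Toy usage: normal training data split across "mapper" partitions,
# then a test block with a sensor bias injected halfway through.
rng = np.random.default_rng(1)
train = rng.normal(size=(3000, 10))
partitions = np.array_split(train, 6)          # pretend these live on 6 nodes
mean, P, lam = fit_pca(partitions, n_components=3)

test = rng.normal(size=(200, 10))
test[100:, 4] += 4.0                           # simulated faulty sensor
t2 = t2_statistic(test, mean, P, lam)
limit = np.quantile(t2_statistic(train, mean, P, lam), 0.99)
print("fault flagged at samples:", np.where(t2 > limit)[0][:5])
```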
A Broad Learning System (BLS) that aims to offer an alternative way of learning to deep structures is proposed in this paper. Deep structures and their learning suffer from a time-consuming training process because of the large number of connecting parameters in filters and layers. Moreover, they require a complete retraining process if the structure is not sufficient to model the system. The BLS is established in the form of a flat network, where the original inputs are transferred and placed as "mapped features" in feature nodes and the structure is expanded in the wide sense through "enhancement nodes." Incremental learning algorithms are developed for fast remodeling in broad expansion, without a retraining process, when the network needs to be expanded. Two incremental learning algorithms are given, one for the increment of the feature nodes (or filters in a deep structure) and one for the increment of the enhancement nodes. The designed model and algorithms are very versatile for selecting a model rapidly. In addition, another incremental learning algorithm is developed for the case in which a system that has already been modeled encounters a new incoming input; specifically, the system can be remodeled in an incremental way without retraining entirely from the beginning. Satisfactory model reduction using singular value decomposition is also performed to simplify the final structure. Compared with existing deep neural networks, experimental results on the Modified National Institute of Standards and Technology (MNIST) database and the NYU NORB object recognition benchmark dataset demonstrate the effectiveness of the proposed BLS.
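To make the flat structure concrete, here is a minimal sketch of a BLS-style network, assuming random linear maps for the feature nodes, a tanh nonlinearity for the enhancement nodes, and a ridge-regularized pseudoinverse for the output weights. It illustrates the structure only; the paper's incremental update rules and SVD-based model reduction are not reproduced, and the class name and hyperparameters are illustrative.

```python
# BLS-style flat network (sketch): feature nodes -> enhancement nodes -> output weights.
import numpy as np

class TinyBLS:
    def __init__(self, n_feature_nodes=40, n_enhance_nodes=200, reg=1e-3, seed=0):
        self.nf, self.ne, self.reg = n_feature_nodes, n_enhance_nodes, reg
        self.rng = np.random.default_rng(seed)

    def _nodes(self, X):
        # Mapped feature nodes: random linear projections of the input.
        Z = X @ self.Wf + self.bf
        # Enhancement nodes: nonlinear expansion of the mapped features.
        H = np.tanh(Z @ self.We + self.be)
        return np.hstack([Z, H])

    def fit(self, X, Y):
        d = X.shape[1]
        self.Wf = self.rng.normal(size=(d, self.nf))
        self.bf = self.rng.normal(size=self.nf)
        self.We = self.rng.normal(size=(self.nf, self.ne))
        self.be = self.rng.normal(size=self.ne)
        A = self._nodes(X)
        # Output weights from the ridge-regularized pseudoinverse of A.
        self.W = np.linalg.solve(A.T @ A + self.reg * np.eye(A.shape[1]), A.T @ Y)
        return self

    def predict(self, X):
        return self._nodes(X) @ self.W

# Toy usage: one-hot regression on synthetic two-class data.
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[y]
model = TinyBLS().fit(X[:500], Y[:500])
pred = model.predict(X[500:]).argmax(axis=1)
print("test accuracy:", (pred == y[500:]).mean())
```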