In the last decade, ensemble learning has become a prolific discipline in pattern recognition, based on the assumption that combining the outputs of several models obtains better results than the output of any individual model. On the basis that the same principle can be applied to feature selection, we describe two approaches: (i) homogeneous, i.e., using the same feature selection method with different training data and distributing the dataset over several nodes; and (ii) heterogeneous, i.e., using different feature selection methods with the same training data. Both approaches are based on combining rankings of features that contain all the ordered features. The results of the base selectors are combined using different combination methods, also called aggregators, and a practical subset is selected according to several different threshold values (traditional values based on fixed percentages, and more novel automatic methods based on data complexity measures). In testing using a Support Vector Machine as a classifier, ensemble results for seven datasets demonstrate performance that is at least comparable to, and often better than, the performance of individual feature selection methods. (C) 2016 Elsevier B.V. All rights reserved.
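The rank-combination step described in this abstract can be sketched with a Borda count, one common aggregator (the paper evaluates several; the base rankings below are hypothetical):

```python
import numpy as np

def borda_aggregate(rankings):
    """Combine several feature rankings into one by Borda count.

    rankings: list of 1-D arrays, each a permutation of feature indices
    ordered from most to least relevant. Returns the aggregated ranking.
    """
    n_features = len(rankings[0])
    scores = np.zeros(n_features)
    for rank in rankings:
        for position, feature in enumerate(rank):
            # A feature earns more points the earlier it appears.
            scores[feature] += n_features - position
    return np.argsort(-scores)  # highest total score first

# Three hypothetical base selectors ranking 5 features:
rankings = [np.array([0, 2, 1, 3, 4]),
            np.array([2, 0, 3, 1, 4]),
            np.array([0, 2, 3, 4, 1])]
aggregated = borda_aggregate(rankings)
```

A thresholding step (e.g., keeping the top fixed percentage of `aggregated`) would then yield the practical subset the abstract mentions.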
Classification problems with more than two classes can be handled in different ways. The most widely used approach transforms the original multi-class problem into a series of binary subproblems which are solved individually. In this approach, should the same base classifier be used on all binary subproblems? Or should these subproblems be tuned independently? Trying to answer this question, in this paper we propose a method to select a different base classifier for each subproblem, following the one-versus-one strategy, making use of data complexity measures. The experimental results on 17 real-world datasets corroborate the adequacy of the method.
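As an illustration of the idea, the sketch below picks a base classifier independently for each one-versus-one subproblem. For simplicity it selects by cross-validated accuracy rather than by the paper's data complexity measures, and the candidate classifiers and dataset are arbitrary choices:

```python
from itertools import combinations
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def fit_ovo_per_pair(X, y, candidates):
    """One-versus-one where each pairwise subproblem gets its own
    independently selected base classifier (here: best CV accuracy)."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        Xp, yp = X[mask], y[mask]
        # Pick the candidate factory whose model scores best on this pair.
        best = max(candidates,
                   key=lambda make: cross_val_score(make(), Xp, yp, cv=3).mean())
        models[(a, b)] = best().fit(Xp, yp)
    return models

def predict_ovo(models, X):
    votes = np.array([m.predict(X) for m in models.values()])
    # Majority vote across all pairwise classifiers.
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

X, y = load_iris(return_X_y=True)
candidates = [lambda: LogisticRegression(max_iter=500),
              lambda: DecisionTreeClassifier(max_depth=3)]
models = fit_ovo_per_pair(X, y, candidates)
acc = (predict_ovo(models, X) == y).mean()
```

Replacing the CV-accuracy criterion with a data complexity measure computed on each binary subset would bring the sketch closer to the method the abstract describes.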
ISBN (Print): 9783642408465
Real data are often corrupted by noise, which can originate from errors in data collection, storage and processing. The presence of noise hampers the induction of Machine Learning models from data, which can have their predictive or descriptive performance impaired, while also making the training time longer. Moreover, these models can become overly complex in order to accommodate such errors. Thus, the identification and reduction of noise in a data set may benefit the learning process. In this paper, we therefore investigate the use of data complexity measures to identify the presence of noise in a data set. This identification can support the decision on whether noise reduction techniques need to be applied.
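One complexity measure well suited to detecting label noise is the leave-one-out 1-NN error (akin to the N3 measure). The sketch below is a minimal illustration on synthetic two-class data; the function name and data are assumptions, not the paper's exact measure set:

```python
import numpy as np

def nn_disagreement(X, y):
    """Leave-one-out 1-NN error rate (akin to the N3 complexity measure).

    A high value suggests heavy class overlap or label noise, since many
    points sit closer to an instance of another class than to their own."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)      # exclude each point itself
    nearest = D.argmin(axis=1)
    return float((y[nearest] != y).mean())

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.repeat([0, 1], 50)
clean = nn_disagreement(X, y)        # well-separated classes -> low value
y_noisy = y.copy()
flip = rng.choice(100, 20, replace=False)
y_noisy[flip] = 1 - y_noisy[flip]    # inject 20% label noise
noisy = nn_disagreement(X, y_noisy)  # injected noise raises the measure
```

The jump in the measure after flipping labels is the kind of signal the paper proposes for deciding whether noise reduction is needed.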
ISBN (Print): 9783030474362; 9783030474355
Although over 90 oversampling approaches have been developed in the imbalanced learning domain, most empirical studies and application work are still based on the "classical" resampling techniques. In this paper, several experiments on 19 benchmark datasets are set up to study the efficiency of six powerful oversampling approaches, including both "classical" and newer ones. According to our experimental results, oversampling techniques that consider the minority class distribution (the newer ones) perform better in most cases, and RACOG gives the best performance among the six reviewed approaches. We further validate our conclusions on our real-world inspired vehicle datasets and also find that applying oversampling techniques can improve performance by around 10%. In addition, seven data complexity measures are considered with the initial purpose of investigating the relationship between data complexity measures and the choice of resampling technique. Although no obvious relationship could be extracted in our experiments, we find that the F1v value, a measure of class overlap that most researchers ignore, has a strong negative correlation with the potential AUC value (after resampling).
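F1v itself requires computing a directional Fisher discriminant; as a simpler illustration of the same family of overlap-based complexity measures, the sketch below implements an F2-style range-overlap measure for a binary problem (the function name and synthetic data are assumptions):

```python
import numpy as np

def overlap_volume(X, y):
    """F2-style class-overlap measure for a binary problem: the width of
    the region where both classes' feature ranges overlap, normalised by
    the full range and multiplied across features."""
    X0, X1 = X[y == 0], X[y == 1]
    lo = np.maximum(X0.min(axis=0), X1.min(axis=0))
    hi = np.minimum(X0.max(axis=0), X1.max(axis=0))
    width = np.clip(hi - lo, 0, None)          # per-feature overlap width
    full = X.max(axis=0) - X.min(axis=0)
    return float(np.prod(width / full))

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X_sep = np.vstack([rng.uniform(0, 1, (30, 2)), rng.uniform(2, 3, (30, 2))])
X_mix = np.vstack([rng.uniform(0, 1, (30, 2)), rng.uniform(0.5, 1.5, (30, 2))])
f2_sep = overlap_volume(X_sep, y)   # disjoint ranges -> 0
f2_mix = overlap_volume(X_mix, y)   # overlapping ranges -> positive
```

A measure like this, computed before resampling, is the kind of quantity the paper correlates with the AUC attainable after oversampling.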
Hydrologic and geomorphic classifications have gained traction in response to the increasing need for basin-wide water resources management. Regardless of the selected classification scheme, an open scientific challenge is how to extend information from limited field sites to classify tens of thousands to millions of channel reaches across a basin. To address this spatial scaling challenge, this study leverages machine learning to predict reach-scale geomorphic channel types using publicly available geospatial data. A bottom-up machine learning approach selects the most accurate and stable model among approximately 20,000 combinations of 287 coarse geospatial predictors, preprocessing methods, and algorithms in a three-tiered framework to (i) define a tractable problem and reduce predictor noise, (ii) assess model performance in statistical learning, and (iii) assess model performance in prediction. This study also addresses key issues related to the design, interpretation, and diagnosis of machine learning models in hydrologic sciences. In an application to the Sacramento River basin (California, USA), the developed framework selects a Random Forest model to predict 10 channel types, previously determined from 290 field surveys, over 108,943 two-hundred-meter reaches. Performance in statistical learning is reasonable with a 61% median cross-validation accuracy, a sixfold increase over the 10% accuracy of the baseline random model, and the predictions coherently capture the large-scale geomorphic organization of the landscape. Interestingly, in the study area, the persistent roughness of the topography partially controls channel types, and the variation in the entropy-based predictive performance is explained by imperfect training information and scale mismatch between labels and predictors.
Background: Prediction of software vulnerabilities is a major concern in the field of software security. Many researchers have worked to construct various software vulnerability prediction (SVP) models. The emerging machine learning domain aids in building effective SVP models. The employment of data balancing/resampling techniques and optimal hyperparameters can upgrade their performance. Previous research studies have shown the impact of hyperparameter optimization (HPO) on machine learning algorithms and data balancing techniques. Objective: The current study aims to analyze the impact of dual hyperparameter optimization on metrics-based SVP models. Method: This paper proposes a methodology using the Python framework Optuna that optimizes the hyperparameters of both the machine learners and the data balancing techniques. For the experimentation, we compared six combinations of five machine learners and five resampling techniques considering both default and optimized hyperparameters. Results: The Wilcoxon signed-rank test with the Bonferroni correction was applied, and it was observed that dual HPO performs better than HPO on the learners alone or on the data balancers alone. Furthermore, the paper assesses the impact of data complexity measures and concludes that HPO does not improve the performance on datasets that exhibit high complexity. Conclusion: The experimental analysis reveals that dual HPO is 64% effective in enhancing the productivity of SVP models.
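The dual-optimization idea (tuning the learner and the data balancer jointly) can be sketched without Optuna, using plain random search as a stand-in; the oversampler, parameter ranges, and dataset below are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def random_oversample(X, y, ratio, rng):
    """Duplicate random minority samples until minority/majority ~= ratio."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    n_needed = int(counts.max() * ratio) - counts.min()
    if n_needed <= 0:
        return X, y
    idx = rng.choice(np.where(y == minority)[0], n_needed, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

best = (-np.inf, None)
for _ in range(20):  # random search over BOTH kinds of hyperparameters
    params = {"ratio": rng.uniform(0.5, 1.0),        # balancer knob
              "max_depth": int(rng.integers(2, 10))}  # learner knob
    # NOTE: for brevity the oversampling happens before the CV split;
    # in practice it must happen inside each fold to avoid leakage.
    Xr, yr = random_oversample(X, y, params["ratio"], rng)
    clf = DecisionTreeClassifier(max_depth=params["max_depth"], random_state=0)
    score = cross_val_score(clf, Xr, yr, cv=3, scoring="roc_auc").mean()
    if score > best[0]:
        best = (score, params)
```

Optuna replaces the random draws with a `trial.suggest_*` sampler and pruning, but the joint search space over learner and balancer parameters is the same idea.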