Most existing clustering-based oversampling algorithms do not consider the spatial distribution of the majority class, so they tend to produce class overlap and to ignore important informative points when synthesizing new samples. To address this problem, this paper analyzes the influence of spatial distribution on the oversampling process and proposes an oversampling algorithm based on adaptive density-difference peak clustering and spatial distribution entropy. First, the spatial distribution of the two classes is introduced into the clustering process, and the minority class is clustered by density peaks computed from the local density difference, so that sub-cluster centers are selected in a principled way and class overlap is reduced. At the same time, the practice of setting the truncation distance from prior experience is replaced: the spatial distribution of the two classes is characterized by a spatial distribution entropy, on the basis of which the truncation distance is selected and optimized automatically. Then, boundary points and sparse points are screened according to the absolute value of the local density difference, and a sampling probability is assigned to each minority-class sample so that these informative points receive more attention. Finally, the spatial distribution entropy is used to evaluate the synthetic sample set and ensure that it balances the distribution of the two classes in the dataset. To test the effectiveness of the algorithm, comparative experiments against five oversampling algorithms are performed on four classifiers and 16 common datasets. The results show that, compared with SMOTE, K-means-SMOTE, BS-SMOTE, ADASYN, and DPC-SMOTE, the proposed algorithm improves significantly on all evaluation indexes.
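To make the density-difference idea concrete, the following is a minimal sketch, not the authors' code: the Gaussian kernel, the entropy defined over d_c-balls, the weighting rule, and all function names are our assumptions. It computes a local density difference for each minority sample, a simple two-class spatial distribution entropy, and sampling probabilities that favor points whose minority and majority densities nearly cancel (boundary or sparse points).

import numpy as np

def local_density(points, refs, d_c):
    # Gaussian-kernel local density of each point w.r.t. a reference set.
    d = np.linalg.norm(points[:, None, :] - refs[None, :, :], axis=2)
    return np.exp(-(d / d_c) ** 2).sum(axis=1)

def density_difference(X_min, X_maj, d_c):
    # rho_min - rho_maj for every minority sample; values near zero flag
    # boundary/sparse points that the paper gives a higher sampling probability.
    rho_min = local_density(X_min, X_min, d_c) - 1.0   # exclude the self-count
    rho_maj = local_density(X_min, X_maj, d_c)
    return rho_min - rho_maj

def spatial_distribution_entropy(X_min, X_maj, d_c):
    # Shannon entropy of the minority/majority mix inside each d_c-ball,
    # averaged over minority samples (one plausible reading of the paper).
    ent = []
    for x in X_min:
        n_min = np.sum(np.linalg.norm(X_min - x, axis=1) < d_c)
        n_maj = np.sum(np.linalg.norm(X_maj - x, axis=1) < d_c)
        p = np.array([n_min, n_maj], dtype=float)
        p = p[p > 0] / p.sum()
        ent.append(-(p * np.log2(p)).sum())
    return float(np.mean(ent))

rng = np.random.default_rng(0)
X_min = rng.normal(0.0, 1.0, (30, 2))
X_maj = rng.normal(2.0, 1.0, (300, 2))
diff = density_difference(X_min, X_maj, d_c=0.8)
prob = np.abs(diff).max() - np.abs(diff)     # small |diff| -> larger weight
prob = prob / prob.sum()                     # per-sample sampling probabilities
print(spatial_distribution_entropy(X_min, X_maj, d_c=0.8), prob[:5])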
The imbalanced-data problem is a major challenge for judicial data analysis, since it often leads to low classification accuracy. Synthesizing new samples by oversampling is a useful way to handle this problem. However, most oversampling algorithms ignore noise samples and do not fully take the data distribution into consideration. For this purpose, an improved cluster-based synthetic oversampling algorithm, the distributed fuzzy-based adaptive synthetic oversampling (DFBASO) algorithm, is proposed; it simultaneously considers the inter-class distribution, the intra-cluster distribution, and the characteristics of noise samples. The proposed DFBASO algorithm is equipped with: 1) fuzzy c-means (FCM) clustering applied to the samples of the minority and majority classes; 2) a weighted distribution based on two factors, the inter-class distance and the cluster capacity; and 3) a mixed synthesis method for the different intra-cluster distribution cases. Finally, the judicial dataset and eight public datasets are used to show the effectiveness and general applicability of the proposed DFBASO algorithm for imbalanced data classification. (c) 2021 Elsevier Inc. All rights reserved. With the arrival of the big-data era and the rapid improvement of data acquisition systems, judicial data analysis has become a hot research topic and gained much attention in both academic and applied fields. At the judicial research frontier, big-data analysis is usually combined with artificial-intelligence algorithms to help organizations uncover blind spots, improve trial efficiency and judicial justice, and accelerate the establishment of intelligent trial systems. So far, a great effort has been made on the classification of judicial data and some remarkable results have been reported in the literature; see [29,19,34,2] and references therein.
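The cluster-weighting idea in points 1) and 2) can be sketched roughly as follows. This is not the authors' DFBASO code: the tiny fuzzy c-means implementation, the choice of the majority-class centroid as the inter-class reference, and the product form of the weight are all our assumptions.

import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    # Tiny fuzzy c-means; returns cluster centers and the membership matrix U (n x c).
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))          # random fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U = U / U.sum(axis=1, keepdims=True)
    return centers, U

def cluster_weights(centers, U, X_maj):
    # Per-cluster sampling weights from inter-class distance and cluster capacity.
    maj_centroid = X_maj.mean(axis=0)
    dist = np.linalg.norm(centers - maj_centroid, axis=1)   # inter-class distance
    capacity = U.sum(axis=0)                                 # soft cluster sizes
    w = dist * capacity
    return w / w.sum()

rng = np.random.default_rng(1)
X_min = rng.normal(0, 1, (40, 2))
X_maj = rng.normal(3, 1, (400, 2))
centers, U = fuzzy_cmeans(X_min, c=3)
print(cluster_weights(centers, U, X_maj))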
Learning from class-imbalanced data is a challenging problem, as standard classification algorithms are designed for balanced class distributions. Scholars address this problem either by modifying classifiers or by generating artificial data through oversampling. The former designs classifiers adapted to imbalanced data, while the latter relies on sampling algorithms, which are data preprocessing steps independent of the classifier. In this paper, we propose a novel synergistic oversampling algorithm that combines oversampling and classification into one procedure without training the classifier repeatedly: it generates new, pertinent samples according to the classification performance of the classifier, without repeated training or deep knowledge of the classifier, so the generated samples can guarantee an improvement in classifier performance. Moreover, the proposed framework wraps the oversampling method without the parameters traditionally required by oversampling methods. Experimental results on several real-life imbalanced datasets demonstrate the effectiveness and efficiency of the proposed algorithm for binary classification problems.
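One possible reading of "generating samples according to the classification performance" is sketched below; this is our loose illustration, not the paper's synergistic framework, and the classifier, data, and interpolation rule are placeholders. A fitted classifier's mistakes on the minority class decide where new points are synthesized, instead of sampling uniformly as plain SMOTE does.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_maj = rng.normal(0, 1, (300, 2)); X_min = rng.normal(1.5, 1, (30, 2))
X = np.vstack([X_maj, X_min]); y = np.r_[np.zeros(300), np.ones(30)]

clf = LogisticRegression().fit(X, y)
miss = X_min[clf.predict(X_min) == 0]      # minority points the model gets wrong

# SMOTE-style interpolation anchored at the misclassified minority points.
nn = NearestNeighbors(n_neighbors=min(5, len(X_min))).fit(X_min)
new = []
for x in miss:
    idx = nn.kneighbors([x], return_distance=False)[0][1:]
    mate = X_min[rng.choice(idx)]
    new.append(x + rng.random() * (mate - x))
X_new = np.array(new) if new else np.empty((0, 2))
print(f"synthesized {len(X_new)} samples near the decision boundary")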
Oversampling algorithms are methods employed in machine learning to address constraints on data quantity. This study aimed to explore how reliability varies as the data volume is progressively increased through oversampling. For this purpose, the synthetic minority oversampling technique (SMOTE) and the borderline synthetic minority oversampling technique (BSMOTE) were chosen. The data inputs, which included air temperature, humidity, and wind speed, are the parameters used in the Fosberg Fire-Weather Index (FFWI). Starting from a base of 52 entries, new datasets were generated by incrementally increasing the data volume in 10% steps up to a total increase of 100%. The augmented data were then used to predict the FFWI with a deep neural network, and the coefficient of determination (R²) was calculated for predictions made with both the original and the augmented datasets. The results suggest that increasing the data volume by more than 50% of the original dataset yields more reliable outcomes. This study introduces a methodology to alleviate the challenge of establishing a standard for data augmentation when employing oversampling algorithms, as well as a means to assess reliability.
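The evaluation protocol can be sketched as follows, under explicit assumptions: random placeholder weather data and a linear placeholder target stand in for the real FFWI table, a small scikit-learn MLPRegressor stands in for the deep neural network, and a hand-rolled SMOTE-style interpolation is used instead of the exact SMOTE/BSMOTE variants. The dataset is grown in 10% steps and R² is tracked on a held-out split.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((52, 3)) * [40, 100, 30]             # temperature, humidity, wind speed
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 2.0 * X[:, 2]   # placeholder target, not the real FFWI

def smote_like(X, y, n_new, k=5, seed=0):
    # Interpolate n_new synthetic rows between a random row and one of its neighbours.
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    rows, targets = [], []
    for _ in range(n_new):
        i = rng.integers(len(X))
        j = rng.choice(nn.kneighbors(X[i:i + 1], return_distance=False)[0][1:])
        lam = rng.random()
        rows.append(X[i] + lam * (X[j] - X[i]))
        targets.append(y[i] + lam * (y[j] - y[i]))
    if not rows:
        return X.copy(), y.copy()
    return np.vstack([X, np.array(rows)]), np.concatenate([y, targets])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for pct in range(0, 101, 10):                        # +0% ... +100% of the original size
    Xa, ya = smote_like(X_tr, y_tr, n_new=int(len(X_tr) * pct / 100))
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0).fit(Xa, ya)
    print(pct, round(r2_score(y_te, model.predict(X_te)), 3))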
This paper presents a novel, sequentially executed, supervised machine-learning-based electricity theft detection framework using a Jaya-optimized combined Kernel and Tree Boosting (KTBoost) classifier. It uses the XGBoost algorithm to estimate missing values in the acquired dataset during the data pre-processing phase. An oversampling algorithm based on the Robust-SMOTE technique is used to avoid the imbalanced class distribution issue. Afterward, with the aid of a few highly significant statistical, temporal, and spectral features extracted from the acquired kWh dataset, the complex underlying data patterns are captured to enhance the accuracy and detection rate of the classifier. To classify consumers into "Honest" and "Fraudster", the ensemble machine-learning classifier KTBoost, with hyperparameters optimized by the Jaya algorithm, is used. Finally, the developed model is re-trained using a reduced set of highly important features to minimize computational resources without compromising performance. The results show that the proposed theft detection method achieves the highest accuracy (93.38%), precision (95%), and recall (93.18%) among all the studied methods, signifying its value in this area of research.
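A simplified stand-in for this pipeline is sketched below; it is not the paper's code. Plain SMOTE replaces Robust-SMOTE, scikit-learn's GradientBoostingClassifier replaces the Jaya-tuned KTBoost model, the XGBoost-based imputation step is omitted, and the data are synthetic placeholders for the kWh-derived features.

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                      # placeholder kWh-derived features
y = (rng.random(1000) < 0.1).astype(int)             # ~10% "Fraudster" class

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # balance the classes

clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)

# Re-train on the top-k most important features to cut computation.
top = np.argsort(clf.feature_importances_)[::-1][:8]
clf_small = GradientBoostingClassifier(random_state=0).fit(X_bal[:, top], y_bal)
pred = clf_small.predict(X_te[:, top])
print(accuracy_score(y_te, pred),
      precision_score(y_te, pred, zero_division=0),
      recall_score(y_te, pred, zero_division=0))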
ISBN (print): 9781509063529
Machine-learning techniques play a crucial part in intrusion detection and have greatly changed the original intrusion detection methods, so how to use them to achieve better detection results is an important question. However, owing to defects in the machine-learning algorithms and the data imbalance between attack behaviors and normal behaviors in the network, the detection rate of low-frequency attack behaviors cannot be effectively improved. To solve this issue at the data level, a novel Region Adaptive Synthetic Minority Oversampling Technique (RA-SMOTE) is proposed. Three different types of classifiers, support vector machines (SVM), BP neural networks (BPNN), and random forests (RF), are used to test the effectiveness of the algorithm. Empirical results on the NSL-KDD dataset show that the proposed algorithm can effectively solve the class imbalance problem and improve the detection rate of low-frequency attacks.
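The evaluation setup can be illustrated with the small harness below. Plain SMOTE stands in for RA-SMOTE and a synthetic two-class set stands in for NSL-KDD; it simply compares the detection rate (recall) of the rare attack class for the three classifier families with and without oversampling.

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (2000, 10)), rng.normal(2, 1, (60, 10))])
y = np.r_[np.zeros(2000), np.ones(60)].astype(int)    # 1 = rare attack class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

for name, clf in [("SVM", SVC()), ("BPNN", MLPClassifier(max_iter=1000)),
                  ("RF", RandomForestClassifier(random_state=0))]:
    plain = clf.fit(X_tr, y_tr).predict(X_te)
    over = clf.fit(X_bal, y_bal).predict(X_te)
    print(name, "attack recall w/o oversampling:", round(recall_score(y_te, plain), 2),
          "with:", round(recall_score(y_te, over), 2))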
ISBN (print): 9781665464512
The integration of industrialization and informatization has exposed industrial control systems (ICSs) to increasingly serious security challenges. Currently, the mainstream method of protecting ICSs is the deep-learning-based intrusion detection system (IDS). However, such methods depend on a massive amount of high-quality data. Owing to protocol limitations and the characteristics of ICSs, their data usually suffer from low quality and class imbalance, which significantly affects IDS accuracy. In this study, an IDS for ICSs that combines a data expansion algorithm with a CNN is proposed. A novel normalized neighborhood weighted convex combined random sample (NNW-CCRS) oversampling algorithm is designed, which automatically attenuates the effect of noise and expands the imbalanced data to produce balanced ICS datasets. By reducing the impact of imbalanced ICS data on the IDS, the system effectively protects ICS security. The Secure Water Treatment (SWaT) dataset was used for experimental validation, and the results confirm that the accuracy of the proposed system improved by approximately 20% compared to an IDS without data expansion.
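The core operation named by NNW-CCRS might look roughly like the sketch below; this is our reading of the name, not the authors' algorithm, and the neighbourhood size, weighting, and data are placeholders. Each synthetic point is a convex combination of a minority sample's k nearest minority neighbours, with random weights normalized to sum to one.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def convex_combined_oversample(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k).fit(X_min)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        idx = nn.kneighbors(X_min[i:i + 1], return_distance=False)[0]
        w = rng.random(k)
        w /= w.sum()                                      # normalized neighbourhood weights
        out.append((w[:, None] * X_min[idx]).sum(axis=0)) # convex combination
    return np.array(out)

X_min = np.random.default_rng(1).normal(size=(25, 4))
print(convex_combined_oversample(X_min, n_new=10).shape)  # (10, 4)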
Class imbalance is a common phenomenon in rockburst data, and the prediction of rockburst intensity with intelligent methods requires a balanced dataset. This presents challenges for standard classification algorithms, which are designed for well-balanced class distributions. This paper develops a modified synthetic minority oversampling technique based on K-means clustering (KM-SMOTE) to reduce the imbalance in the rockburst dataset. First, the study collects 226 rockburst cases worldwide as the original supporting dataset and selects four indexes to predict rockburst intensity: the maximum tangential stress of the surrounding rock σθ, the uniaxial compressive strength of the rock σc, the tensile strength of the rock σt, and the elastic energy index Wet. Second, KM-SMOTE uses K-means to cluster the minority-class samples and then performs SMOTE oversampling on each cluster, yielding 388 data points. To establish a nonlinear relationship between rockburst intensity and its predictors, six machine-learning classifiers are used. The dataset is randomly divided into training and test sets, with 80% of the data used for training. In the training and testing phases, the original dataset, the SMOTE-processed dataset, and the KM-SMOTE-processed dataset were fed into the machine-learning models for predicting rockburst intensity, where the KM-SMOTE-processed dataset was 3.3% and 10.5% more accurate than the SMOTE-processed dataset, respectively. In the Jiangbian Hydropower Station engineering application, the KM-SMOTE algorithm achieves up to a 25% improvement in accuracy compared with the data processed by SMOTE. Overall, the proposed modified oversampling algorithm effectively overcomes class imbalance in the rockburst dataset and significantly contributes to the intelligent prediction of rockburst by machine learning in engineering.
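The two-step procedure (K-means on the minority class, then SMOTE-style interpolation inside each cluster) can be sketched as below. This is our stand-in, not the paper's implementation: the number of clusters, the per-cluster quota, and the synthetic four-column data replacing σθ, σc, σt, and Wet are assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def km_smote(X_min, n_new, n_clusters=3, k=3, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_min)
    synth = []
    for c in range(n_clusters):
        Xc = X_min[labels == c]
        if len(Xc) < 2:
            continue                                      # cannot interpolate in a singleton cluster
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(Xc))).fit(Xc)
        for _ in range(n_new // n_clusters):
            i = rng.integers(len(Xc))
            j = rng.choice(nn.kneighbors(Xc[i:i + 1], return_distance=False)[0][1:])
            synth.append(Xc[i] + rng.random() * (Xc[j] - Xc[i]))
    return np.array(synth)

X_min = np.random.default_rng(2).normal(size=(40, 4))     # e.g. four rockburst indexes
print(km_smote(X_min, n_new=60).shape)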
The increase in mining depth necessitates higher strength requirements for hard-rock pillars, making pillar stability analysis crucial for pillar design and safe underground operations. To improve the accuracy of predicting the stability state of mine pillars, a prediction model is proposed in which the subtraction-average-based optimizer (SABO) tunes the hyperparameters of a least-squares support vector machine (LSSVM). First, by analyzing feature redundancy in the mine pillar dataset and performing feature selection, five parameter combinations were constructed to examine their effects on the performance of different models. Second, the SABO-LSSVM model was compared vertically with classic models and horizontally with other optimized models to ensure a comprehensive and objective evaluation. Finally, two data sampling methods and a combined sampling method were used to correct the model's bias toward different categories of mine pillars. The results demonstrate that the SABO-LSSVM model achieves good accuracy and overall performance, providing valuable insights for mine pillar stability prediction.
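The model being tuned can be sketched compactly; the block below solves the standard LS-SVM linear system with an RBF kernel and searches over the two usual hyperparameters (regularization gamma and kernel width sigma). A plain random search is used here as a stand-in for SABO, which we do not reproduce, and the data, ranges, and names are placeholders.

import numpy as np

def rbf(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    # Solve the LS-SVM system  [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y].
    n = len(X)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0; A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.r_[0.0, y])
    return sol[0], sol[1:]                      # bias b, dual coefficients a

def lssvm_predict(X_tr, a, b, sigma, X_te):
    return np.sign(rbf(X_te, X_tr, sigma) @ a + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5)); y = np.sign(X[:, 0] + 0.5 * X[:, 1])
X_tr, y_tr, X_te, y_te = X[:90], y[:90], X[90:], y[90:]

best_acc, best_params = -1.0, None
for _ in range(30):                             # random search as the SABO stand-in
    gamma, sigma = 10 ** rng.uniform(-1, 3), 10 ** rng.uniform(-1, 1)
    b, a = lssvm_fit(X_tr, y_tr, gamma, sigma)
    acc = (lssvm_predict(X_tr, a, b, sigma, X_te) == y_te).mean()
    if acc > best_acc:
        best_acc, best_params = acc, (gamma, sigma)
print(best_acc, best_params)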
This paper aims to build an employee attrition classification model based on the Stacking algorithm. An oversampling algorithm is applied to address the issue of data imbalance, and the Random Forest feature importance ranking method is used to resolve the overfitting problem after data cleaning and preprocessing. Different algorithms are first used to establish classification models as control experiments, with R-squared indicators used for evaluation, and the Stacking algorithm is then used to establish the final classification model. The model has practical and significant implications for both human resource management and employee attrition analysis.
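A brief sketch of this modelling setup follows, using scikit-learn's StackingClassifier. The synthetic imbalanced data, the choice of SMOTE as the oversampler, and the particular base learners are our assumptions, not details from the paper.

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=1500, n_features=20, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # rebalance the classes

# Control experiments would fit each base learner on its own; here we go straight to stacking.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)), ("svc", SVC(probability=True))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_bal, y_bal)
print(classification_report(y_te, stack.predict(X_te)))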