Ensemble learning (EL) is a machine learning paradigm in which multiple learning algorithms (base learners) are trained to solve the same problem. This study provides a comprehensive evaluation of widely used EL algorithms, including bagging, boosting, and stacking, highlighting their significant advantages in terms of accuracy and generalization in mineral prospectivity mapping (MPM). This study tested mapping of prospectivity for gold deposits in the Qingchengzi Pb-Zn-Ag-Au polymetallic district using single machine learning algorithms and EL algorithms. According to the critical and favorable geological factors for the magmatic-related medium-temperature hydrothermal lode system for gold deposits, five targeting criteria were extracted from multi-source geoscience datasets (i.e., geological map, gravity and magnetic datasets, stream sediment geochemical datasets) for mineral prospectivity mapping. The receiver operating characteristic curve, the area under the curve, and learning curves were used to evaluate the performance of the tested single and ensemble machine learning algorithms. The results demonstrate that the stacking model, which combines multiple base models for hierarchical feature extraction, achieves the best predictive performance. The concentration-area fractal model was used to outline the prospective areas predicted by the EL algorithms, clarifying areas with very high prospectivity for gold mineralization in the study area.
Turkey's Artvin province is prone to landslides due to its geological structure, rugged topography, and climatic characteristics with intense rainfall. In this study, landslide susceptibility maps (LSMs) of Murgul district in Artvin province were produced. The study employed tree-based ensemble learning algorithms, namely Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), and eXtreme Gradient Boosting (XGBoost). Landslide susceptibility mapping was performed using 13 factors: altitude, aspect, distance to drainage, distance to faults, distance to roads, land cover, lithology, plan curvature, profile curvature, slope, slope length, topographic position index (TPI), and topographic wetness index (TWI). The study utilized a landslide inventory consisting of 54 landslide polygons. The landslide inventory dataset contained 92,446 pixels with a spatial resolution of 10 m. Consistent with the literature, the majority of landslide pixels (70%, 64,712 pixels) were used for model training, and the remaining portion (30%, 27,734 pixels) was used for model validation. Overall accuracy, precision, recall, F1-score, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC-ROC) were considered as validation metrics. LightGBM and XGBoost were found to perform better than the other algorithms on all validation metrics. Additionally, SHapley Additive exPlanations (SHAP) were utilized to explain and interpret the model outputs. According to the LightGBM algorithm, the most influential factors in the occurrence of landslides in the study area were altitude, lithology, distance to faults, and aspect, whereas TWI, plan curvature, and profile curvature were identified as the least influential factors. Finally, it was concluded that the produced LSMs would provide significant contributions to decision makers in reducing the damage caused by landslides in the study area.
Soil organic carbon (SOC), as the largest carbon pool on the land surface, plays an important role in soil quality, ecological security and the global carbon cycle. Multisource remote sensing data-driven modeling strategies are not well understood for accurately mapping soil organic carbon. Here, we hypothesized that the Sentinel-2 Multispectral Sensor Instrument (MSI) data-driven modeling strategy would produce superior outcomes compared to modeling based on Landsat 8 Operational Land Imager (OLI) data due to the finer spatial and spectral resolutions of the Sentinel-2A MSI data. To test this hypothesis, the Ebinur Lake wetland in Xinjiang was selected as the study area. In this study, SOC estimation was carried out using Sentinel-2A and Landsat 8 data, combining climatic variables, topographic factors, index variables and Sentinel-1A data to construct a common variable model for Sentinel-2A data and Landsat 8 data, and a full variable model for Sentinel-2A data, respectively. We utilized ensemble learning algorithms to assess the prediction performance of modeling strategies, including random forest (RF), gradient boosted decision tree (GBDT) and extreme gradient boosting (XGBoost) algorithms. The results show that: (1) The Sentinel-2A model outperformed the Landsat 8 model in the prediction of SOC contents, and the Sentinel-2A full variable model under the XGBoost algorithm achieved the best results (R² = 0.804, RMSE = 1.771, RPIQ = 2.687). (2) The full variable model of Sentinel-2A with the addition of the red-edge band and red-edge index improved R² by 6% and 3.2% over the common variable Landsat 8 and Sentinel-2A models, respectively. (3) In the SOC mapping of the Ebinur Lake wetland, the areas with higher SOC content were mainly concentrated in the oasis, while the mountainous and lakeside areas had lower SOC contents. Our results provide a program to monitor the sustainability of terrestrial ecosystems through a satellite perspective.
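The three accuracy measures reported in the abstract above can be illustrated with a minimal sketch. It assumes the usual definitions: the coefficient of determination as 1 minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and RPIQ as the interquartile range of the observed values divided by RMSE. The function name, the toy values, and the quartile convention (the default of Python's `statistics.quantiles`) are assumptions for illustration and are not taken from the study.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPIQ) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(y_true))
    # Quartiles of the observed values; the quartile convention is an assumption.
    q1, _, q3 = statistics.quantiles(y_true, n=4)
    rpiq = (q3 - q1) / rmse
    return r2, rmse, rpiq


# Hypothetical toy values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rmse, rpiq = regression_metrics(observed, predicted)
```

Different software uses different quartile conventions, so RPIQ values computed this way may differ slightly from those of other tools.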
This study aims to establish monitoring models for surface heavy metals in mining areas by utilizing multi-source remote sensing data and ensemble learning algorithms. By collecting heavy metal content data from soil and crop leaves within the study area, and combining it with data obtained from the Google Earth Engine platform, including Landsat 8, Sentinel-2 spectral data, vegetation indices, and VV and VH polarization information from Sentinel-1, along with terrain factors derived from the Digital Elevation Model such as elevation, hillshade, slope, and aspect, a total of 43 feature indicators were consolidated. Feature importance ranking (FI) and the successive projections algorithm (SPA) feature selection method were employed to filter feature factors, selecting different features for each type of heavy metal. In the soil, the optimal model for predicting Cr and Cd content is AdaBoost-MT, while the optimal model for inverting Zn, As, Hg, and Pb content is FISPA-AdaBoost-MT. In the crops, the optimal model for predicting the content of all six heavy metals is FISPA-AdaBoost-MT. This indicates that the combination of FI and SPA features effectively evaluates the heavy metal content in both soil and crops. Utilizing these multidimensional features, this study combines ensemble learning algorithms with multi-target regression techniques to construct inversion models for six types of heavy metals (Cr, Zn, As, Cd, Hg, and Pb) simultaneously. Based on the optimal prediction models, distribution maps of heavy metals in soil and crops within the study area were generated, achieving comprehensive, multidimensional monitoring of surface heavy metals in mining areas through overlay display.
Cheating detection in large-scale assessment has received considerable attention in the extant literature. However, none of the previous studies in this line of research investigated the stacking ensemble machine learning algorithm for cheating detection. Furthermore, no study addressed the issue of class imbalance using resampling. This study explored the application of the stacking ensemble machine learning algorithm to analyze the item response, response time, and augmented data of test-takers to detect cheating behaviors. The performance of the stacking method was compared with that of two other ensemble methods (bagging and boosting) as well as six base non-ensemble machine learning algorithms. Issues related to class imbalance and input features were addressed. The study results indicated that stacking, resampling, and feature sets including augmented summary data generally performed better than their counterparts in cheating detection. Compared with the other competing machine learning algorithms investigated in this study, the meta-model from stacking using discriminant analysis based on the top two base models (Gradient Boosting and Random Forest) generally performed the best when item responses and the augmented summary statistics were used as the input features with an under-sampling ratio of 10:1 among all the study conditions.
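Random under-sampling, the resampling step mentioned in the abstract above, can be sketched as follows. The sketch assumes that the 10:1 ratio means at most ten majority-class cases are kept for every minority-class case; the abstract does not spell this out. The function name, the label coding (1 for the minority class), and the toy data are hypothetical.

```python
import random


def undersample(labels, ratio=10, minority_label=1, seed=0):
    """Return sorted indices that keep every minority case and at most
    `ratio` randomly chosen majority cases per minority case."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n_keep = min(len(majority), ratio * len(minority))
    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


# Hypothetical toy data: 5 flagged test-takers among 505.
labels = [1] * 5 + [0] * 500
idx = undersample(labels, ratio=10)
```

Only the training portion of the data would normally be resampled in this way, so that evaluation still reflects the original class balance.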
The current research aims to develop effective accounting fraud detection models using imbalanced ensemble learning algorithms for China A-Share listed firms. Based on a sample of 33,544 Chinese firm-year instances from 1998 to 2017, this research established one logistic regression classifier and four ensemble learning classifiers (AdaBoost, XGBoost, CUSBoost, and RUSBoost) using 12 financial ratios and 28 raw financial data items. Additionally, we divided the sample into training and test observations to evaluate the classifiers' out-of-sample performance. Specifically, we applied two metrics, namely, the area under the ROC (receiver operating characteristic) curve (AUC) and the area under the Precision-Recall curve (AUPR), to evaluate the classifiers' discriminability. In the supplementary test, this study put forward an algebraic fused model on the basis of the four ensemble learning classifiers and introduced the sliding window technique. The empirical results showed that the ensemble learning classifiers can detect accounting fraud for the imbalanced China A-listed firms far more effectively than the logistic regression model. Moreover, the imbalanced ensemble learning classifiers (CUSBoost and RUSBoost) performed better on average than the common ensemble learning models (AdaBoost and XGBoost). The algebraic fused model in the supplementary test also obtained the highest average AUC and AUPR among all the employed algorithms. Our results offer firm support for the potential role of machine learning (ML)-based artificial intelligence (AI) approaches in reliably predicting accounting fraud with high accuracy. Similarly, in the Chinese setting, our ML-based AI approach offers a strong advantage in forecasting accounting fraud. Finally, this paper fills the research gap on the applications of imbalanced ensemble learning in accounting fraud detection for Chinese listed firms.
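As an illustration of the first metric named in the abstract above, AUC can be computed as the fraction of (fraud, non-fraud) pairs in which the fraud case receives the higher score, with ties counted as half. This is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name and the toy scores are hypothetical, and AUPR is not covered here.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores (equivalent to the normalized Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four firm-years (1 = fraud, 0 = no fraud).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic in the sample size, so a rank-based implementation would be preferable on a sample of tens of thousands of firm-years.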
Human falls may be due to a violent act, a heart attack, or physical illness. Every year, many elderly people are treated for injuries caused by falling, and some die in hospital. There is therefore a long-standing need for a timely and inexpensive system that automatically identifies a falling person and raises an alert, which reduces the death rate and increases the likelihood of survival. Because the systems released in past years suffer from low accuracy and many recognition faults, this paper presents human fall detection using neuro-fuzzy models and ensemble learning algorithms to address these problems. The results show the influence of ensemble learning on the performance of neuro-fuzzy models. However, it should be noted that the feature selection and extraction methods used in processing the dataset have their own impact. In this case, five kinds of feature selection/extraction algorithms are used. Five neuro-fuzzy models are used in this research: normalized radial basis function (NRBF) network, radial basis function (RBF) network, adaptive neuro-fuzzy inference system (ANFIS), local linear model trees (LOLIMOT) and generalized regression neural networks (GRNN). The LOLIMOT model, using a correlation-based feature selection algorithm, reached the highest accuracy among these models, 0.796. The outputs of these models are fed into two ensemble learning algorithms, single majority vote and weighted majority vote; the weighted majority vote algorithm reached an accuracy of 0.87917 using the principal component analysis algorithm, which was the highest result among all the models used in this paper.
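The two voting schemes mentioned in the abstract above can be sketched in a few lines. The abstract does not say how the weights are chosen, so the sketch simply takes them as given; the function name, the class labels, and the weight values are hypothetical. With no weights supplied, the function reduces to an unweighted majority vote.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights=None):
    """Combine one predicted label per model into a single label.

    Each model adds its weight to the label it predicts; the label with
    the largest total wins. Equal weights give a plain majority vote.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical outputs of three models for one sample.
votes = ["fall", "no_fall", "no_fall"]
plain = weighted_majority_vote(votes)                      # majority of labels
weighted = weighted_majority_vote(votes, [0.9, 0.3, 0.4])  # weights favor model 1
```

In this toy case the unweighted vote returns "no_fall" while the weighted vote returns "fall", because the first model's weight (0.9) exceeds the combined weight of the other two (0.7).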
In the relentless battle against mesothelioma, a rare and aggressive form of cancer with a dire prognosis, this research explores the transformative potential of ensemble learning techniques, namely Bagging Tree, Rand...
To address the "explosive" propagation phenomenon in social networks, we propose a novel hypergraph propagation model that captures the higher-order interaction process between rumor and anti-rumor. By calculating the propagation threshold and analyzing the global stability of equilibrium points, our research indicates that the higher-order structure is a component of the propagation threshold, directly affecting the final state of the propagation dynamics. Additionally, we design data-driven control algorithms, which integrate deep neural networks and ensemble learning algorithms, to autonomously seek suboptimal control strategies for rumor within the framework of optimal control theory. This approach enhances the efficiency and adaptability of traditional control methods. Simulation experiments demonstrate that the control algorithm effectively regulates rumor propagation, achieving a control cost deviation of only 0.0016 from the optimal control theory, while substantially improving the control speed compared to conventional methods.
The classification of very high-resolution satellite imagery remains a focal point in remote sensing, attracting increased attention across diverse scientific disciplines. Various classification methods, including pixel- and object-based techniques, have been proposed, and their performances and limitations have been discussed in the literature. This paper presents a hybrid method that combines the strengths of pixel- and object-based methods in image classification to minimize errors associated with the segmentation process, particularly under-segmentation errors in object-based image analysis. The core concept behind the method lies in categorizing segmented image objects as either homogeneous or heterogeneous based on their class probability. In this process, the estimated probabilities from the object-based classification model are considered, and segments are designated as homogeneous or heterogeneous using a user-defined threshold. The object-based classification model determines the class labels for homogeneous image objects, while the heterogeneous ones, containing pixels representing different land cover classes, are classified using the pixel-based model. The performance of hybrid classification models, created by varying thresholds, is analysed using high-resolution WorldView-3 and WorldView-2 imagery and compared with pixel- and object-based classification results. For the implementation of image classification methods, Canonical Correlation Forest (CCF), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost) were employed. The findings indicated that employing the suggested hybrid strategy with a threshold value selected within a specific range (e.g. between 60% and 80%) and employing a robust classification algorithm that provides class probabilities (e.g. CCF) results in a statistically significant improvement in overall accuracy compared to pixel- and object-based methods, with gains of 5% and 4%, respectively. Visual analysis of the
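The threshold rule described in the abstract above can be sketched as follows, under the assumption that a segment counts as homogeneous when the highest class probability from the object-based model reaches the threshold. The data layout (a dictionary of class probabilities plus a list of per-pixel labels from the pixel-based model), the function name, and the 0.7 threshold are hypothetical choices for illustration.

```python
def hybrid_classify(segments, threshold=0.7):
    """Return one list of pixel labels per segment.

    A segment whose top object-based class probability is at least
    `threshold` is treated as homogeneous and all of its pixels receive
    that class; otherwise the pixel-based labels are kept.
    """
    result = []
    for seg in segments:
        label, prob = max(seg["object_probs"].items(), key=lambda kv: kv[1])
        if prob >= threshold:
            result.append([label] * len(seg["pixel_labels"]))
        else:
            result.append(list(seg["pixel_labels"]))
    return result


# Hypothetical segments: the first is confidently "road", the second is mixed.
segments = [
    {"object_probs": {"road": 0.9, "grass": 0.1},
     "pixel_labels": ["road", "grass", "road"]},
    {"object_probs": {"road": 0.55, "grass": 0.45},
     "pixel_labels": ["road", "grass"]},
]
labels = hybrid_classify(segments, threshold=0.7)
```

With the threshold set to 0.7, the first segment is labeled "road" throughout, while the second keeps its per-pixel labels.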