Authors:
Nguyen, Duc-Viet; Park, Jihae; Lee, Hojun; Han, Taejun; Wu, Di
Univ Ghent, Ctr Environm & Energy Res, Global Campus, Incheon 21985, South Korea
Univ Ghent, Ctr Adv Proc Technol Urban Resource Recovery (CAPTU), Dept Green Chem & Technol, B-9000 Ghent, Belgium
Univ Ghent, Dept Anim Sci & Aquat Ecol, B-9000 Ghent, Belgium
Univ Ghent, Bio Environm Sci & Technol (BEST) Lab, Global Campus, 119-5 Songdomunhwa Ro, Incheon 21985, South Korea
Trace heavy metals tend to persist in the effluent of industrial wastewater treatment facilities, causing toxic effects in downstream water bodies. Traditional assessment methods relied on animal testing, but ethical concerns have made them unacceptable. An alternative is to evaluate wastewater toxicity with bioassays using aquatic organisms from different trophic levels. However, these bioassay methods require costly and time-consuming chemical and biological analyses. In this study, an artificial intelligence-powered water quality assessment (AiWA) approach is proposed for predicting the ecotoxicity of industrial effluents, making the ecotoxicity assessment process faster and more cost-effective. First, 99 samples were collected from industrial wastewater treatment plants representing 21 different industries in the Republic of Korea. Fourteen parameters were measured, covering both physicochemical and ecotoxicological aspects. Boosting algorithms, in particular extreme gradient boosting (XGBoost) and adaptive boosting (AdaBoost), were employed for model development, and XGBoost outperformed AdaBoost. Feature selection analysis revealed that conductivity, pH, and the concentrations of copper, lead, selenium, and zinc were the most suitable inputs for training the boosting model. The novel XGBoost-based AiWA model performed significantly better than conventional models (by up to 80%), achieving an R² exceeding 0.94 and a root mean square error of 3.5 toxicity units in predicting the integrated toxicity unit (ITU). Additionally, pH and conductivity emerged as key indicators of ecotoxicity levels. Specifically, this case study indicated that non-toxic, directly dischargeable levels (TU ≤ 1) were achieved when the pH ranged from 6.8 to 8.4 and the conductivity remained below 1651 µS/cm. These findings are expected to facilitate rapid and cost-effective detection of heavy metal ecotoxici...
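The pH and conductivity ranges reported above can be restated as a simple screening check. This is a minimal Python sketch, not the AiWA model: the function name `within_reported_nontoxic_range` is hypothetical, and treating the pH bounds as inclusive and the conductivity bound as strict is an assumption read from the wording "ranged from 6.8 to 8.4" and "remained below 1651 µS/cm". Because the ranges come from a single case study, such a check could at most flag samples for follow-up bioassays, not certify them as non-toxic.

```python
# Minimal sketch restating the case-study ranges; NOT the AiWA model.
# Assumption: pH bounds are inclusive, the conductivity bound is strict.

PH_MIN, PH_MAX = 6.8, 8.4           # pH range reported for TU <= 1
CONDUCTIVITY_MAX_US_CM = 1651.0     # upper conductivity bound, in µS/cm


def within_reported_nontoxic_range(ph: float, conductivity_us_cm: float) -> bool:
    """Return True if a sample lies inside the ranges reported in the case study."""
    return PH_MIN <= ph <= PH_MAX and conductivity_us_cm < CONDUCTIVITY_MAX_US_CM


if __name__ == "__main__":
    for ph, ec in [(7.2, 950.0), (6.5, 900.0), (7.9, 1800.0)]:
        print(ph, ec, within_reported_nontoxic_range(ph, ec))
```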
Heart disease remains a leading cause of mortality worldwide, necessitating accurate and reliable predictive models to aid early diagnosis and treatment. Traditional machine learning methods like LR and DT Classifiers...
This study introduces a sophisticated supervised machine learning method for electricity theft detection based on a customized histogram gradient boosting (HGB) algorithm. Comprehensive preprocessing, including imputation, normalization, outlier management, and resampling, ensures that the time-series data are properly prepared for analysis. The synthetic minority oversampling technique with edited nearest neighbors (SMOTE-ENN) corrects class imbalance, preparing the data for the feature optimization stage, in which key features are selected and extracted. The HGB algorithm, tuned with Bayesian optimization, is central to the training process and yields a model that precisely classifies electricity consumption patterns as genuine or fraudulent. The model's robustness is evaluated against other established boosting methods, such as adaptive boosting (ADB), gradient boosting decision tree (GBDT), and LightGBM, as well as various ensemble and traditional machine learning models. Validated with key performance metrics such as accuracy, F1-score, and area under the curve (AUC), the proposed model achieves 93% accuracy, a 95% F1-score, and a 98% AUC, outperforming the comparison group under similar dataset and hyperparameter conditions. This underscores the model's potential as a highly accurate tool for combating electricity theft within an advanced metering infrastructure (AMI).
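The abstract gives no implementation details for SMOTE-ENN. As a rough illustration, the oversampling half (SMOTE) creates each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbours. The pure-Python sketch below shows only that step; the edited-nearest-neighbour cleaning, the Bayesian optimization, and the HGB model are omitted, and the function names `k_nearest` and `smote_oversample` are hypothetical. In practice, imbalanced-learn's `SMOTEENN` and scikit-learn's `HistGradientBoostingClassifier` would more likely be used.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    ranked = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in ranked[:k]]


def smote_oversample(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))      # base minority sample
        j = rng.choice(neighbours[i])         # one of its k nearest minority neighbours
        gap = rng.random()                    # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic
```

Each synthetic point lies on the segment between two real minority samples, so oversampling stays inside the minority region rather than duplicating points exactly.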
Machine learning (ML) has been applied in civil engineering to predict the compressive strength of concrete with high accuracy. In this paper, five boosting ensemble algorithms, i.e., XGBoost, AdaBoost, GBDT, LightGBM, and CatBoost, were used to predict the compressive strength of high-performance concrete (HPC). The models were evaluated using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). CatBoost achieved the highest accuracy, with an R² of 0.970 and an RMSE of 2.916. Hyperparameter optimization further improved its accuracy, raising R² to 0.975 and lowering RMSE to 2.863. The SHapley Additive exPlanations (SHAP) method was then used to explain the outputs of the optimal model (CatBoost), yielding interpretable insights into the complex relationships among the prediction model's parameters. AGE, W/B, and W/C (curing age, water-to-binder ratio, and water-to-cement ratio) had the greatest impact on high-performance concrete compressive strength (HPCCS) prediction, consistent with the results of a sensitivity analysis. This study provides a theoretical basis and technical guidance for the mix design of new HPC systems. In future work, the interpretable model outputs should be iteratively checked and validated in the laboratory to provide guidance for engineering practice.
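The three indicators used above follow directly from their definitions: R² = 1 - SS_res / SS_tot, RMSE = sqrt(SS_res / n), and MAE = mean(|y - ŷ|). A minimal stdlib-only Python sketch follows; the function name `regression_metrics` is hypothetical, and the toy numbers below are illustrative, not data from the study.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return R^2, RMSE and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)      # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }
```

For example, `regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])` gives R² = 0.8, RMSE = 0.5, and MAE = 0.25. RMSE squares residuals before averaging, so it penalizes a few large errors more heavily than MAE does.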
Accurately predicting personal default risk is crucial for financial institutions to manage credit risk effectively. This study compares the performance of boosting algorithms, including AdaBoost, XGBoost, LightGBM, and CatBoost, in predicting personal defaults. The dataset comprises 7,542 individual customers from Vietnamese commercial banks and financial institutions, collected between 2014 and 2022, with 12 features describing the borrowers' financial and demographic characteristics. All customer information was fully anonymized and encrypted during data collection to comply with research ethics. The models are evaluated on six criteria: accuracy, precision, sensitivity, specificity, F1 score, and AUC. LightGBM performs best, demonstrating its ability to handle large and complex datasets efficiently. The study also identifies the five most significant factors influencing personal default risk: Monthly Liability, Credit Balance, Credit History Length, Max Credit Limit, and Yearly Income. However, the limited size and scope of the dataset may reduce the generalizability of the results to other regions. These findings can help financial institutions strengthen their credit risk management strategies.
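Five of the six evaluation criteria listed above follow from a binary confusion matrix (true and false positives and negatives); AUC is ranking-based. The sketch below is a minimal stdlib-only Python illustration with hypothetical function names. It computes AUC as the probability that a randomly chosen positive (default) receives a higher score than a randomly chosen negative, counting ties as one half. This equals the area under the ROC curve, but the pairwise loop is quadratic in the number of samples.

```python
from typing import Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, sensitivity (recall), specificity and F1 from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With the illustrative counts tp=40, fp=10, tn=45, fn=5, accuracy is 0.85 and precision is 0.8. Sensitivity and specificity matter more than accuracy when defaults are rare.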
ISBN: (print) 9789819713288; 9789819713295
In today's rapidly evolving world, with ubiquitous access to technology, massive amounts of data are being generated. This data contains key insights that support better decision-making, so tools that help us extract such insights are of the utmost importance. Sentiment analysis is one such tool: it determines the emotions behind a piece of text. Although there are many resources for sentiment analysis in English, resources for Hindi are limited. We aim to remedy this with our work, in which we scrape, annotate, and preprocess our own corpus of Hindi cinema reviews. We propose a novel methodology for Hindi sentiment analysis using various boosting algorithms and lay a foundation for better model and framework selection in vernacular natural language processing tasks.
Innovation is taking on new roles in every field, and it continues to play a crucial role in Internet technology. Because the Internet is available everywhere and accessible from any device, it has become the target of a slew of cyber-attacks. A prevalent attack, both before and during the pandemic, is phishing, in which a URL is cleverly altered to look legitimate and the user is then redirected to other sites where personal information is extracted. The benchmark URL datasets used in this study contain an equal balance of phishing/malicious and benign/legitimate URLs. The URLs are parsed to extract features that help identify phishing. Our research applied several machine learning boosting algorithms, namely extreme gradient boosting, light gradient boosting, adaptive boosting, and gradient boosting, and achieved an accuracy above 98% for most of them.
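The abstract says URLs are parsed to extract features but does not list them. The sketch below shows what such parsing might look like with Python's standard `urllib.parse` and `ipaddress` modules. The feature set (length, dots and hyphens in the host, an `@` symbol, a raw IP host, HTTPS, path depth) is an assumption chosen for illustration, not the feature set used in the study, and the function name `url_features` is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Extract a few illustrative lexical features from a URL (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""          # lower-cased host, without port or user info
    try:
        ipaddress.ip_address(host)        # raises ValueError for ordinary domain names
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,      # "user@host" tricks can hide the real host
        "hyphen_in_host": "-" in host,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }
```

For `http://paypal.com@evil-site.net/login`, `urlparse` reports the host as `evil-site.net`. Features like `has_at_symbol` target exactly this kind of disguise. Each resulting dictionary would become one row of the tabular input to a boosting classifier.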
Solar energy has recently become a viable option for desalinating seawater, primarily in arid regions. However, efforts to increase the productivity of solar stills by combining experimental and modelling methods remain subject to prediction errors. The main objective of this research is therefore to propose and test boosting algorithms for predicting the efficiency and productivity of the system. Five boosting regressors were deployed and evaluated: categorical boosting (CatBoost), adaptive boosting, extreme gradient boosting, gradient boosting machine, and light gradient boosting machine (LightGBM). The regressors were built on the system's actual recorded dataset of 720 observations. The input variables are wind speed (V), cloud cover, humidity, ambient temperature (T), solar radiation (SR), T-io, T-w, T-v, and T-t, and the output variable is the productivity of the system. The dataset was split into training (70%) and testing (30%) sets, and hyperparameter optimization was used to reduce the regressors' errors. The gradient boosting approach provided the best prediction, with an R² of 0.95 and a root mean square error (RMSE) of 39.57. LightGBM achieved an R² of 0.94 and an RMSE of 40.07 on the testing dataset. The results show that gradient boosting outperforms the cascaded forward neural network (CFNN) in predicting system productivity.
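A 70/30 split of the 720 recorded observations yields 504 training rows and 216 testing rows. The minimal sketch below shows a reproducible shuffled split in plain Python; the function name `shuffle_split` and the seed value are hypothetical. The abstract does not say whether the split was random or chronological. For time-ordered weather and temperature recordings, a chronological split may give a more realistic estimate of test error.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(rows: Sequence[T], train_frac: float = 0.7, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle rows reproducibly and split them into training and testing sets."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    # round() rather than int(): 720 * 0.7 is slightly below 504 in floating point
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]
```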
ISBN: (print) 9798350395600; 9798350395594
This research investigates the effects of several dimension reduction methods, namely Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), on the accuracy of predicting stress states from EEG signals recorded while subjects performed different mental tasks. The study uses the SAM-40 dataset, in which 40 subjects performed three mental tasks, i.e., an Arithmetic Problem Solving Task, the Stroop Color Word Test, and a Mirror Image Recognition Task, each in three trials. The results of applying PCA, ICA, and LDA with different boosting algorithms, mainly the AdaBoostM1 and RUSBoost ensemble techniques, were analyzed and compared in detail. Across all experiments, the LDA-based boosted classification methods achieved the best prediction accuracy, an improvement of around 30%.
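AdaBoostM1, one of the two ensembles named above, reweights the training samples after each round so that the next weak learner concentrates on the samples that earlier learners misclassified. The sketch below is a minimal pure-Python version of binary AdaBoost.M1 with decision stumps as weak learners, for illustration only. The function names are hypothetical, and the dimension reduction step (PCA, ICA, or LDA) is assumed to have already been applied to the feature rows. RUSBoost, which additionally undersamples the majority class in each round, is not shown.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict +polarity above the threshold and -polarity otherwise."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def best_stump(X: Sequence[Sequence[float]], y: Sequence[int], w: Sequence[float]) -> Tuple[Stump, float]:
    """Exhaustively search for the stump with the lowest weighted error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # one threshold below every value (a constant stump) plus all midpoints
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                stump = (f, t, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_m1(X, y, n_rounds: int = 10) -> List[Tuple[Stump, float]]:
    """Binary AdaBoost.M1 with labels in {-1, +1}; returns (stump, vote weight) pairs."""
    w = [1.0 / len(X)] * len(X)
    ensemble: List[Tuple[Stump, float]] = []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:
            break  # no better than chance: AdaBoost.M1 stops here
        perfect = err == 0.0
        beta = max(err, 1e-10) / (1.0 - err)   # clip to avoid log(1/0)
        ensemble.append((stump, math.log(1.0 / beta)))
        if perfect:
            break  # nothing left to reweight
        # shrink the weights of correctly classified samples, then renormalise
        w = [wi * beta if stump_predict(stump, xi) == yi else wi for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[Stump, float]], x: Sequence[float]) -> int:
    """Weighted vote of the stumps; ties go to +1."""
    score = sum(alpha * stump_predict(stump, x) for stump, alpha in ensemble)
    return 1 if score >= 0 else -1
```

Consider the one-dimensional toy data X = [[1.0], [2.0], [3.0], [4.0]] with labels [-1, 1, 1, -1]. No single stump is correct on all four points, but three boosting rounds classify all of them correctly.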
From a modeling point of view, our work provides a novel approach to using XGBoost more effectively for bank failure prediction, identifying the essential technical aspects that improve predictive accuracy. Of these, the two most crucial are assigning correct values to the target variable and selecting predictors carefully (through ANOVA, correlation, information value tests, and weight of evidence). We also show that bank failure can be predicted four to five quarters in advance when all predictive signals appear simultaneously. Hence, we strongly recommend using quarterly rather than yearly data. Beyond its practical implications, our work also contributes to the existing literature. We confirm the findings of existing studies emphasizing that XGBoost has strong predictive power (Carmona, Climent, and Momparler, 2018). Moreover, an extensive comparison of predictive power shows that XGBoost outperforms other models in the same boosting family, including gradient boosting and AdaBoost. These contributions may facilitate future work on bank failure prediction.
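Information value (IV) and weight of evidence (WoE), two of the predictor-screening tools mentioned above, can be computed for each bin of a predictor from the counts of failed and surviving banks. The sketch below is a minimal stdlib-only Python illustration with a hypothetical function name, `woe_iv`. It assumes the predictor has already been binned and uses the convention WoE = ln(share of non-events / share of events). Some texts use the reciprocal, which flips the sign of WoE but leaves IV unchanged, because each IV term (a - b) ln(a / b) is non-negative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def woe_iv(bins: Sequence[Hashable], target: Sequence[int]) -> Tuple[Dict[Hashable, float], float]:
    """Weight of evidence per bin and total information value for a binary target.

    target: 1 = event (e.g. a failed bank), 0 = non-event.
    Assumed convention: WoE = ln(share of non-events / share of events).
    """
    events = Counter(b for b, t in zip(bins, target) if t == 1)
    non_events = Counter(b for b, t in zip(bins, target) if t == 0)
    total_events = sum(events.values())
    total_non = sum(non_events.values())
    woe: Dict[Hashable, float] = {}
    iv = 0.0
    for b in sorted(set(bins), key=str):
        if events[b] == 0 or non_events[b] == 0:
            # log(0) is undefined; sparse bins are usually merged before this step
            raise ValueError(f"bin {b!r} lacks events or non-events; merge bins first")
        share_non = non_events[b] / total_non
        share_event = events[b] / total_events
        woe[b] = math.log(share_non / share_event)
        iv += (share_non - share_event) * woe[b]
    return woe, iv
```

A higher IV indicates a predictor that separates failed from surviving banks more strongly. This is why IV can be used to rank and filter candidate predictors before model training.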