Cancer is one of the leading causes of death among women worldwide, with breast cancer responsible for the highest number of cancer cases and, consequently, deaths. However, it can be prevented through early detection and, consequently, early treatment. Any development in the detection or prediction of this kind of cancer is important for a better, healthier life. Many studies focus on building a model with high accuracy in cancer prediction, but accuracy alone may not always be a reliable metric. This study takes an investigative approach to the performance of different boosting-based machine learning algorithms for predicting breast cancer, focusing on the recall metric. Boosting machine learning algorithms have proven to be an effective tool for detecting medical diseases. A dataset from the University of California, Irvine (UCI) repository was used to train and test the classifier. The main objective of this study is to use state-of-the-art boosting algorithms such as AdaBoost, XGBoost, CatBoost, and LightGBM to predict and diagnose breast cancer and to evaluate them with respect to recall, ROC-AUC, and the confusion matrix. Previous studies have applied Optuna, a library for hyperparameter optimization, to individual algorithms such as XGBoost or LightGBM, but no prior research has collectively examined all four boosting algorithms within a unified Optuna framework, combined with the SHAP method to improve the interpretability of the model, which can serve as a support tool to identify and predict breast cancer. We were able to improve AUC or recall for all the models and to reduce the false negatives for AdaBoost and LightGBM; the final AUC was above 99.41% for all models.
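The abstract's argument that recall matters more than accuracy in cancer screening can be made concrete with a minimal sketch (pure Python, toy labels of my own, not the UCI data): recall = TP / (TP + FN) directly penalizes false negatives, the missed malignant cases, while accuracy can stay high even when half of them are missed.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for binary label vectors."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn

def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# A classifier that misses 2 of 4 malignant cases (label 1) still looks
# good on accuracy; recall exposes the false negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))  # 0.8
print(recall(y_true, y_pred))    # 0.5
```

This is why the study reports the confusion matrix alongside ROC-AUC: two models with identical accuracy can distribute their errors very differently between false positives and false negatives.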
This research proposes a meta-learning framework for financial time series forecasting, designed to rapidly adapt to novel market conditions with minimal retraining. The framework operates in two stages: 1) pretraining on a diverse set of financial datasets, including stocks (e.g., MSFT, AAPL) and cryptocurrencies (e.g., BTC, ETH), and 2) fine-tuning on recent data to adapt to new markets. The model utilizes XGBoost with dynamic feature engineering, which adjusts technical indicators (e.g., Relative Strength Index, Bollinger Bands) to account for evolving market conditions. Experimental results demonstrate that the proposed framework achieves significant improvements in Root Mean Squared Error (15%) and Mean Absolute Percentage Error (10%) compared to traditional methods such as simple moving averages and exponential smoothing. These findings highlight the framework's robustness, scalability, and ability to manage dynamic market behaviors, making it an effective tool for both short-term traders and long-term investors. Compared to LSTM-GARCH, the proposed meta-learning model achieves an RMSE of 0.82 (versus up to 10.11), an MAE of 0.61 (versus up to 8.39), and a DA of 67.33% (versus up to 50.44%).
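The three metrics this abstract reports, RMSE, MAPE, and DA (taking DA as directional accuracy, its usual meaning in forecasting), can be computed as below; a pure-Python sketch with invented illustrative prices, not the paper's data or model.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error; assumes no zero actuals."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def directional_accuracy(actual, forecast):
    """Share of steps where the forecast moves in the same direction as the series."""
    hits = sum(
        1 for i in range(1, len(actual))
        if (actual[i] - actual[i - 1]) * (forecast[i] - actual[i - 1]) > 0
    )
    return 100.0 * hits / (len(actual) - 1)

prices   = [100.0, 102.0, 101.0, 105.0, 104.0]
forecast = [100.5, 101.5, 102.0, 104.0, 104.5]
print(round(rmse(prices, forecast), 4))
print(round(mape(prices, forecast), 4))
print(directional_accuracy(prices, forecast))  # 75.0
```

Note that RMSE and DA can disagree: a forecast can sit close to the price level (low RMSE) yet call the up/down moves poorly, which is why the paper reports both.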
Machine learning (ML) provides effective solutions for developing efficient intrusion detection systems (IDS) for various environments. In the present paper, a diversified study of various ensemble machine learning algorithms has been carried out to propose the design of an effective and time-efficient IDS for an Internet of Things (IoT)-enabled environment. Data captured from network traffic and real-time sensors of the IoT-enabled smart environment has been analyzed to classify and predict various types of network attacks. The performance of Logistic Regression, Random Forest, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) classifiers has been benchmarked using 'DS2OS', an open-source, heavily imbalanced dataset that consists of 'normal' and 'anomalous' network traffic. An intrusion detection model, "LGB-IDS", has been proposed using the LightGBM library after validating its superiority over the other algorithms using ensemble techniques and majority voting. The performance of the proposed intrusion detection system is validated using machine learning performance metrics such as train and test accuracy, time efficiency, error rate, true-positive rate (TPR), and false-negative rate (FNR). The experimental results reveal that XGBoost and LightGBM have almost equal accuracy, but the time efficiency of LightGBM is much better than that of the RF and XGBoost classifiers. The main objective of the present paper is to propose the design of an efficient intrusion detection model with high accuracy, better time efficiency, and a reduced false alarm rate. The experimental results show that the proposed model achieves an accuracy of 99.92%, with much better time efficiency than other prevalent algorithm-based models. The threat detection rate is greater than 90% and less than 100%, and the time complexity of LightGBM is much lower than that of the other ML algorithms.
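The majority-voting validation step this abstract mentions reduces to combining per-classifier label predictions sample by sample. A minimal sketch (pure Python; the per-classifier outputs below are hypothetical, not DS2OS results):

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote.

    predictions_per_model: list of equal-length prediction lists, one per classifier.
    Returns one combined label per sample (ties broken by first-seen label).
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical per-classifier outputs ('normal' vs 'anomalous') for 4 samples.
lr_preds  = ["normal", "anomalous", "normal",    "anomalous"]
rf_preds  = ["normal", "anomalous", "anomalous", "anomalous"]
xgb_preds = ["normal", "normal",    "normal",    "anomalous"]
lgb_preds = ["normal", "anomalous", "normal",    "anomalous"]

print(majority_vote([lr_preds, rf_preds, xgb_preds, lgb_preds]))
# ['normal', 'anomalous', 'normal', 'anomalous']
```

A single classifier that agrees with the vote on nearly every sample (here, LightGBM) is a natural candidate to deploy on its own, which matches the paper's rationale for selecting LightGBM as the final model.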
Solar power forecasting is of high interest in managing any power system based on solar energy. In the case of photovoltaic (PV) systems, and building integrated PV (BIPV) in particular, it may help to better operate the power grid and to manage the power load and storage. Power forecasting directly based on PV time series has some advantages over forecasting solar irradiance first and modeling PV power afterwards. In this paper, power forecasting for BIPV systems in a vertical facade is studied using machine learning algorithms based on decision trees. The forecasting scheme employs the skforecast library from the Python environment, which facilitates the implementation of different schemes for both deterministic and probabilistic forecasting applications. Firstly, deterministic forecasting of hourly BIPV power was performed with the XGBoost and Random Forest algorithms for different cases, showing an improvement in forecasting accuracy when some exogenous variables were used. Secondly, probabilistic forecasting was performed with XGBoost combined with the bootstrap method. The results of this paper show the capability of Random Forest and gradient boosting algorithms, such as XGBoost, to work as regressors in time series forecasting of BIPV power. Mean absolute errors in the deterministic forecast, using the most influential exogenous variables, were around 40% and just below 30% for the south and east arrays, respectively.
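The probabilistic step (bootstrap on top of a point forecaster) rests on a simple idea: resample in-sample residuals and add them to the point forecast to get an empirical predictive distribution. A simplified pure-Python illustration of that idea follows; it is not skforecast's actual API, and the residuals and the 3.0 kW forecast are invented.

```python
import random

def bootstrap_interval(point_forecast, residuals, n_boot=2000,
                       lower_q=0.05, upper_q=0.95, seed=42):
    """Prediction interval from resampled residuals added to a point forecast.

    Each pseudo-forecast is the point forecast plus one residual drawn
    with replacement; the interval is taken from empirical quantiles.
    """
    rng = random.Random(seed)
    draws = sorted(point_forecast + rng.choice(residuals) for _ in range(n_boot))
    lo = draws[int(lower_q * (n_boot - 1))]
    hi = draws[int(upper_q * (n_boot - 1))]
    return lo, hi

# Hypothetical in-sample residuals of a BIPV power model (kW).
residuals = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.5]
lo, hi = bootstrap_interval(point_forecast=3.0, residuals=residuals)
print(lo, hi)  # interval around the 3.0 kW point forecast
```

In practice (and in skforecast) the residuals are propagated recursively through multi-step forecasts rather than added once, but the quantile-of-resampled-errors principle is the same.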
We performed several machine-learning algorithms on a geochemical dataset including whole-rock (n = 1656) and glass (n = 1092) compositions of lavas and pyroclastics belonging to 8 volcanic fields along the South Aegean Active Volcanic Arc (SAAVA). We not only tested our trained model on the unknown distal tephras, but also checked its performance using some known distal tephras (e.g., Nisyros-Kyra) from the easternmost part of the SAAVA. The different metrics and kappa values revealed that Naive Bayes, Linear Discriminant Analysis, Artificial Neural Network, and Support Vector Machine (both probabilistic and non-probabilistic models) were the worst-performing algorithms, while Random Forest and the gradient boosting algorithms (e.g., CatBoost, LightGBM), together with their average ensemble (Voting Classifier), were the best for the volcanic-source predictions of tephras. This also indicates that the latter algorithms give better results for machine-learning applications on an imbalanced geochemical dataset, which was the main limitation in our training model. Despite the accurate prediction and training models, especially for those with larger datasets (i.e., the Santorini and Nisyros volcanoes), we would like to stress that machine learning is, for now, a time-saving tool (not an automated decision-maker) in tephrochronology studies, providing a more efficient and rapid way of finding the possible volcanic sources for unknown tephras. In this regard, our freely available Python codes could easily be applied in further "tephra-hunting" studies in and around the SAAVA. However, the available geochemical datasets (e.g., mineral chemistry) and other interrelated datasets (e.g., geochronology), which for now must be evaluated manually by tephrochronologists, need to grow in order to improve the performance of machine-learning algorithms in volcanic-source prediction.
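The abstract leans on kappa values precisely because the dataset is imbalanced: Cohen's kappa corrects raw agreement for what would be expected by chance, so a classifier that only ever predicts the majority volcano scores zero no matter how "accurate" it looks. A minimal sketch (pure Python; the toy labels are mine, not the SAAVA data):

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_chance = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels
    )
    return (p_observed - p_chance) / (1 - p_chance)

# Imbalanced toy example: 8 'Santorini' tephras, 2 'Nisyros'.
y_true = ["Santorini"] * 8 + ["Nisyros"] * 2
always_majority = ["Santorini"] * 10                       # 80% accurate
good_model = ["Santorini"] * 8 + ["Nisyros", "Santorini"]  # 90% accurate

print(cohens_kappa(y_true, always_majority))  # 0.0
print(round(cohens_kappa(y_true, good_model), 3))
```

This is why kappa (and similar chance-corrected metrics) separates the Random Forest / gradient boosting family from the weaker algorithms more sharply than plain accuracy would on this dataset.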
Establishing a universal machine learning (ML) model in structural engineering is vital for understanding how various parameters, such as geometry and material properties, influence a structure's behavior. This study aims to create a comprehensive ML model that considers the impact of different cross-sectional parameters on the ultimate load capacity (ULC) of concrete-filled steel tube (CFST) columns, assisting engineers in making informed design decisions. The study employs a large dataset of 3094 data points with diverse geometric and material properties of CFST columns. After adjusting the input features, robust boosting ML models (CatBoost, LightGBM, and XGBoost) are meticulously fine-tuned using grid search and fivefold cross-validation, with Monte Carlo simulation used for further assessment. The results demonstrate that the most accurate XGBoost model delivers impressive accuracy, comparable to or better than existing literature models that focused on a single CFST column cross-section. The chosen XGBoost model is then utilized for feature importance analysis, local performance assessment, and sensitivity analysis through 1-D and 2-D partial dependence plots. These analyses help assess each input's contribution to and effect on ULC prediction for CFST columns.
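The tuning procedure described (grid search with fivefold cross-validation) reduces to two nested loops: one over hyperparameter combinations, one over folds. A library-free skeleton follows; `score_fn` stands in for "fit the booster on the training folds and score it on the held-out fold", and the toy scorer and its optimum are purely illustrative, not the paper's XGBoost setup.

```python
from itertools import product

def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search_cv(score_fn, grid, n_samples, k=5):
    """Return the hyperparameter combo with the best mean CV score."""
    best_params, best_score = None, float("-inf")
    folds = k_fold_indices(n_samples, k)
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        scores = []
        for i, val_idx in enumerate(folds):
            # Train on every fold except the i-th, validate on the i-th.
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            scores.append(score_fn(params, train_idx, val_idx))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score

# Toy scorer that peaks at depth=6, lr=0.1 (purely illustrative).
def toy_score(params, train_idx, val_idx):
    return -abs(params["max_depth"] - 6) - abs(params["learning_rate"] - 0.1)

grid = {"max_depth": [3, 6, 9], "learning_rate": [0.05, 0.1, 0.3]}
print(grid_search_cv(toy_score, grid, n_samples=100))
# → ({'max_depth': 6, 'learning_rate': 0.1}, 0.0)
```

In practice one would shuffle indices before folding and use a real fit-and-evaluate function; the exhaustive combo-times-folds structure is the part the abstract refers to.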
Accurate aboveground biomass (AGB) estimations over large areas are essential for assessing carbon stocks and forest resources. This study evaluated machine learning approaches for AGB modeling in Pakistan's mountainous Diamir district using freely available Sentinel-1 and Sentinel-2 data and 171 field-measured AGB training points. Random Forest, Gradient Tree Boosting, CatBoost, LightGBM, and XGBoost algorithms were implemented and optimized. Models were developed using individual and combined datasets. Sentinel-2 optical data outperformed Sentinel-1 radar data, but the fusion of both sensors achieved the highest accuracy (R² > 0.7, RMSE = 105.64 Mg/ha, MAE = 85.34 Mg/ha). Tree canopy height was the most informative predictor for these data, alongside terrain variables and radar textures. The machine learning models significantly improved AGB estimates compared to traditional regression techniques, and the gradient boosters outperformed Random Forest. This research demonstrates the potential of multi-sensor remote sensing data and advanced algorithms for forest biomass mapping in complex terrain, with modeling accuracies reaching root mean squared errors below 90 Mg/ha. The framework provides an effective solution for monitoring biomass using freely available satellite data. Further refinements include integrating higher-resolution optical data and additional field samples for better validation. This study contributes to remote sensing capabilities for assessing vegetation carbon stocks and dynamics.
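The accuracy figures quoted here (R², RMSE, MAE in Mg/ha) are standard regression metrics; for completeness, here is how R² and MAE are computed, with hypothetical plot-level AGB values of my own rather than the study's field data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def mae(observed, predicted):
    """Mean absolute error, in the units of the data (here Mg/ha)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical AGB plot values in Mg/ha (not the study's 171 field points).
observed  = [120.0, 250.0, 310.0, 180.0, 95.0]
predicted = [140.0, 230.0, 290.0, 200.0, 110.0]
print(round(r_squared(observed, predicted), 3))
print(mae(observed, predicted))  # 19.0
```

Reporting both matters: R² is unitless and sensitive to the spread of the observations, while RMSE and MAE stay in Mg/ha and are directly comparable across studies of similar biomass ranges.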
The performance of four tree-based classification techniques, classification and regression trees (CART), multivariate adaptive regression splines (MARS), random forests (RF), and gradient boosting trees (GBT), was compared against the commonly used logistic regression (LR) analysis to assess aquifer vulnerability in the Ogallala Aquifer of Texas. The results indicate that the tree-based models performed better than the logistic regression model, as they were able to locally refine nitrate exceedance probabilities. RF exhibited the best generalization capabilities, while the CART model did better in predicting non-exceedances. Nitrate exceedances were sensitive to well depth, an indicator of aquifer redox conditions, which, in turn, was controlled by alkalinity increases brought about by the dissolution of calcium carbonate. The clay content of soils and soil organic matter, which serve as indicators of agricultural activities, were also noted to have significant influences on nitrate exceedances. Likely nitrogen releases from confined animal feedlot operations in the northeast portion of the study area also appeared to be locally important. Integrated soil, hydrogeological, and geochemical datasets, in conjunction with tree-based methods, help elucidate the processes controlling nitrate exceedances. Overall, tree-based models offer flexible, transparent approaches for mapping nitrate exceedances, identifying underlying mechanisms, and prioritizing monitoring activities.
In this article, we present a machine learning-based method to locate lightning flashes using calculations of lightning-induced voltages on a transmission line. The proposed approach takes advantage of the voltage measurement systems preinstalled on power transmission lines to obtain the data; hence, it does not require the installation of additional sensors, such as extremely low frequency, very low frequency, or very high frequency sensors. The proposed model is shown to yield reasonable accuracy in estimating two-dimensional geolocations of lightning strike points for grid sizes up to 100 × 100 km². The algorithm is shown to be robust against the distance between the voltage sensors, the lightning peak current, the lightning current rise time, and the signal-to-noise ratio of the input signals.