Cancer is one of the leading causes of death among women worldwide, with breast cancer responsible for the highest number of cancer cases and, consequently, deaths. However, it can be prevented through early detection and, consequently, early treatment. Any development in the detection or prediction of this kind of cancer is important for a better, healthier life. Many studies focus on building a model with high accuracy in cancer prediction, but accuracy alone may not always be a reliable metric. This study takes an investigative approach to the performance of different boosting-based machine learning algorithms for predicting breast cancer, focusing on the recall metric. Boosting machine learning algorithms have proven to be an effective tool for detecting medical diseases. A dataset from the University of California, Irvine (UCI) repository was used to train and test the classifier. The main objective of this study is to use state-of-the-art boosting algorithms such as AdaBoost, XGBoost, CatBoost, and LightGBM to predict and diagnose breast cancer and to evaluate them with respect to recall, ROC-AUC, and the confusion matrix. Previous studies have applied Optuna, a library for hyperparameter optimization, to individual algorithms such as XGBoost or LightGBM, but no prior research has collectively examined all four boosting algorithms within a unified Optuna framework, combined with the SHAP method to improve the interpretability of the model, which can serve as a support tool to identify and predict breast cancer. We were able to improve AUC or recall for all the models and to reduce the false negatives for AdaBoost and LightGBM; the final AUC was above 99.41% for all models.
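The abstract's argument that recall matters more than accuracy in cancer screening can be made concrete with a minimal sketch (pure Python, toy labels of my own, not the UCI data): recall = TP / (TP + FN) directly penalizes false negatives, the missed malignant cases, while accuracy can stay high even when half of them are missed.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for binary label vectors."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn

def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# A classifier that misses 2 of 4 malignant cases (label 1) still looks
# good on accuracy; recall exposes the false negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))  # 0.8
print(recall(y_true, y_pred))    # 0.5
```

This is why the study reports the confusion matrix alongside ROC-AUC: two models with identical accuracy can distribute their errors very differently between false positives and false negatives.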
This research proposes a meta-learning framework for financial time series forecasting, designed to rapidly adapt to novel market conditions with minimal retraining. The framework operates in two stages: 1) pretraining on a diverse set of financial datasets, including stocks (e.g., MSFT, AAPL) and cryptocurrencies (e.g., BTC, ETH), and 2) fine-tuning on recent data to adapt to new markets. The model utilizes XGBoost with dynamic feature engineering, which adjusts technical indicators (e.g., Relative Strength Index, Bollinger Bands) to account for evolving market conditions. Experimental results demonstrate that the proposed framework achieves significant improvements in Root Mean Squared Error (15%) and Mean Absolute Percentage Error (10%) compared to traditional methods such as simple moving averages and exponential smoothing. These findings highlight the framework's robustness, scalability, and ability to manage dynamic market behaviors, making it an effective tool for both short-term traders and long-term investors. Compared to LSTM-GARCH, the proposed meta-learning model achieves an RMSE of 0.82 (versus up to 10.11), an MAE of 0.61 (versus up to 8.39), and a DA of 67.33% (versus up to 50.44%).
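The three metrics this abstract reports, RMSE, MAPE, and DA (taking DA as directional accuracy, its usual meaning in forecasting), can be computed as below; a pure-Python sketch with invented illustrative prices, not the paper's data or model.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error; assumes no zero actuals."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def directional_accuracy(actual, forecast):
    """Share of steps where the forecast moves in the same direction as the series."""
    hits = sum(
        1 for i in range(1, len(actual))
        if (actual[i] - actual[i - 1]) * (forecast[i] - actual[i - 1]) > 0
    )
    return 100.0 * hits / (len(actual) - 1)

prices   = [100.0, 102.0, 101.0, 105.0, 104.0]
forecast = [100.5, 101.5, 102.0, 104.0, 104.5]
print(round(rmse(prices, forecast), 4))
print(round(mape(prices, forecast), 4))
print(directional_accuracy(prices, forecast))  # 75.0
```

Note that RMSE and DA can disagree: a forecast can sit close to the price level (low RMSE) yet call the up/down moves poorly, which is why the paper reports both.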
Machine learning (ML) provides effective solutions for developing efficient intrusion detection systems (IDS) for various environments. In the present paper, a diversified study of various ensemble machine learning algorithms has been carried out to propose the design of an effective and time-efficient IDS for an Internet of Things (IoT)-enabled environment. Data captured from network traffic and real-time sensors of the IoT-enabled smart environment has been analyzed to classify and predict various types of network attacks. The performance of Logistic Regression, Random Forest, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) classifiers has been benchmarked using 'DS2OS', an open-source, heavily imbalanced dataset that consists of 'normal' and 'anomalous' network traffic. An intrusion detection model, "LGB-IDS", has been proposed using the LightGBM library after validating its superiority over the other algorithms using ensemble techniques and majority voting. The performance of the proposed intrusion detection system is validated using machine learning performance metrics such as train and test accuracy, time efficiency, error rate, true-positive rate (TPR), and false-negative rate (FNR). The experimental results reveal that XGBoost and LightGBM have almost equal accuracy, but the time efficiency of LightGBM is much better than that of the RF and XGBoost classifiers. The main objective of the present paper is to propose the design of an efficient intrusion detection model with high accuracy, better time efficiency, and a reduced false alarm rate. The experimental results show that the proposed model achieves an accuracy of 99.92%, with much better time efficiency than other prevalent algorithm-based models. The threat detection rate is greater than 90% and less than 100%, and the time complexity of LightGBM is much lower than that of the other ML algorithms.
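The majority-voting validation step this abstract mentions reduces to combining per-classifier label predictions sample by sample. A minimal sketch (pure Python; the per-classifier outputs below are hypothetical, not DS2OS results):

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote.

    predictions_per_model: list of equal-length prediction lists, one per classifier.
    Returns one combined label per sample (ties broken by first-seen label).
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical per-classifier outputs ('normal' vs 'anomalous') for 4 samples.
lr_preds  = ["normal", "anomalous", "normal",    "anomalous"]
rf_preds  = ["normal", "anomalous", "anomalous", "anomalous"]
xgb_preds = ["normal", "normal",    "normal",    "anomalous"]
lgb_preds = ["normal", "anomalous", "normal",    "anomalous"]

print(majority_vote([lr_preds, rf_preds, xgb_preds, lgb_preds]))
# ['normal', 'anomalous', 'normal', 'anomalous']
```

A single classifier that agrees with the vote on nearly every sample (here, LightGBM) is a natural candidate to deploy on its own, which matches the paper's rationale for selecting LightGBM as the final model.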
Solar power forecasting is of high interest in managing any power system based on solar energy. In the case of photovoltaic (PV) systems, and building integrated PV (BIPV) in particular, it may help to better operate the power grid and to manage the power load and storage. Power forecasting directly based on PV time series has some advantages over forecasting solar irradiance first and modeling PV power afterwards. In this paper, power forecasting for BIPV systems in a vertical facade is studied using machine learning algorithms based on decision trees. The forecasting scheme employs the skforecast library from the Python environment, which facilitates the implementation of different schemes for both deterministic and probabilistic forecasting applications. Firstly, deterministic forecasting of hourly BIPV power was performed with the XGBoost and Random Forest algorithms for different cases, showing an improvement in forecasting accuracy when some exogenous variables were used. Secondly, probabilistic forecasting was performed with XGBoost combined with the bootstrap method. The results of this paper show the capability of Random Forest and gradient boosting algorithms, such as XGBoost, to work as regressors in time series forecasting of BIPV power. Mean absolute errors in the deterministic forecast, using the most influential exogenous variables, were around 40% and just below 30% for the south and east arrays, respectively.
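The probabilistic step (bootstrap on top of a point forecaster) rests on a simple idea: resample in-sample residuals and add them to the point forecast to get an empirical predictive distribution. A simplified pure-Python illustration of that idea follows; it is not skforecast's actual API, and the residuals and the 3.0 kW forecast are invented.

```python
import random

def bootstrap_interval(point_forecast, residuals, n_boot=2000,
                       lower_q=0.05, upper_q=0.95, seed=42):
    """Prediction interval from resampled residuals added to a point forecast.

    Each pseudo-forecast is the point forecast plus one residual drawn
    with replacement; the interval is taken from empirical quantiles.
    """
    rng = random.Random(seed)
    draws = sorted(point_forecast + rng.choice(residuals) for _ in range(n_boot))
    lo = draws[int(lower_q * (n_boot - 1))]
    hi = draws[int(upper_q * (n_boot - 1))]
    return lo, hi

# Hypothetical in-sample residuals of a BIPV power model (kW).
residuals = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.5]
lo, hi = bootstrap_interval(point_forecast=3.0, residuals=residuals)
print(lo, hi)  # interval around the 3.0 kW point forecast
```

In practice (and in skforecast) the residuals are propagated recursively through multi-step forecasts rather than added once, but the quantile-of-resampled-errors principle is the same.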
We performed several machine-learning algorithms on a geochemical dataset including whole-rock (n = 1656) and glass (n = 1092) compositions of lavas and pyroclastics belonging to 8 volcanic fields along the South Aegean Active Volcanic Arc (SAAVA). We not only tested our trained model on the unknown distal tephras, but also checked its performance using some known distal tephras (e.g., Nisyros-Kyra) from the easternmost part of the SAAVA. The different metrics and kappa values revealed that Naive Bayes, Linear Discriminant Analysis, Artificial Neural Network, and Support Vector Machine (both probabilistic and non-probabilistic models) were the worst-performing algorithms, while Random Forest and the gradient boosting algorithms (e.g., CatBoost, LightGBM), together with their average ensemble (Voting Classifier), were the best for the volcanic-source predictions of tephras. This also indicates that the latter algorithms give better results for machine-learning applications on an imbalanced geochemical dataset, which was the main limitation in our training model. Despite the accurate prediction and training models, especially for those with larger datasets (i.e., the Santorini and Nisyros volcanoes), we would like to stress that machine learning is, for now, a time-saving tool (not an automated decision-maker) in tephrochronology studies, providing a more efficient and rapid way of finding the possible volcanic sources for unknown tephras. In this regard, our freely available Python codes could easily be applied in further "tephra-hunting" studies in and around the SAAVA. However, the available geochemical datasets (e.g., mineral chemistry) and other interrelated datasets (e.g., geochronology), which for now must be evaluated manually by tephrochronologists, need to grow in order to improve the performance of machine-learning algorithms in volcanic-source prediction.
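The abstract leans on kappa values precisely because the dataset is imbalanced: Cohen's kappa corrects raw agreement for what would be expected by chance, so a classifier that only ever predicts the majority volcano scores zero no matter how "accurate" it looks. A minimal sketch (pure Python; the toy labels are mine, not the SAAVA data):

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_chance = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels
    )
    return (p_observed - p_chance) / (1 - p_chance)

# Imbalanced toy example: 8 'Santorini' tephras, 2 'Nisyros'.
y_true = ["Santorini"] * 8 + ["Nisyros"] * 2
always_majority = ["Santorini"] * 10                       # 80% accurate
good_model = ["Santorini"] * 8 + ["Nisyros", "Santorini"]  # 90% accurate

print(cohens_kappa(y_true, always_majority))  # 0.0
print(round(cohens_kappa(y_true, good_model), 3))
```

This is why kappa (and similar chance-corrected metrics) separates the Random Forest / gradient boosting family from the weaker algorithms more sharply than plain accuracy would on this dataset.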
Establishing a universal machine learning (ML) model in structural engineering is vital for understanding how various parameters, such as geometry and material properties, influence a structure's behavior. This study aims to create a comprehensive ML model that considers the impact of different cross-sectional parameters on the ultimate load capacity (ULC) of concrete-filled steel tube (CFST) columns, assisting engineers in making informed design decisions. The study employs a large dataset of 3094 data points with diverse geometric and material properties of CFST columns. After adjusting the input features, robust boosting ML models (CatBoost, LightGBM, and XGBoost) are meticulously fine-tuned using grid search and fivefold cross-validation, with Monte Carlo simulation used for further assessment. The results demonstrate that the most accurate XGBoost model delivers impressive accuracy, comparable to or better than existing literature models that focused on a single CFST column cross-section. The chosen XGBoost model is then utilized for feature importance analysis, local performance assessment, and sensitivity analysis through 1-D and 2-D partial dependence plots. These analyses help assess each input's contribution to and effect on ULC prediction for CFST columns.
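The tuning procedure described (grid search with fivefold cross-validation) reduces to two nested loops: one over hyperparameter combinations, one over folds. A library-free skeleton follows; `score_fn` stands in for "fit the booster on the training folds and score it on the held-out fold", and the toy scorer and its optimum are purely illustrative, not the paper's XGBoost setup.

```python
from itertools import product

def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search_cv(score_fn, grid, n_samples, k=5):
    """Return the hyperparameter combo with the best mean CV score."""
    best_params, best_score = None, float("-inf")
    folds = k_fold_indices(n_samples, k)
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        scores = []
        for i, val_idx in enumerate(folds):
            # Train on every fold except the i-th, validate on the i-th.
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            scores.append(score_fn(params, train_idx, val_idx))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score

# Toy scorer that peaks at depth=6, lr=0.1 (purely illustrative).
def toy_score(params, train_idx, val_idx):
    return -abs(params["max_depth"] - 6) - abs(params["learning_rate"] - 0.1)

grid = {"max_depth": [3, 6, 9], "learning_rate": [0.05, 0.1, 0.3]}
print(grid_search_cv(toy_score, grid, n_samples=100))
# → ({'max_depth': 6, 'learning_rate': 0.1}, 0.0)
```

In practice one would shuffle indices before folding and use a real fit-and-evaluate function; the exhaustive combo-times-folds structure is the part the abstract refers to.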
Accurate aboveground biomass (AGB) estimations over large areas are essential for assessing carbon stocks and forest resources. This study evaluated machine learning approaches for AGB modeling in Pakistan's mountainous Diamir district using freely available Sentinel-1 and Sentinel-2 data and 171 field-measured AGB training points. Random Forest, Gradient Tree Boosting, CatBoost, LightGBM, and XGBoost algorithms were implemented and optimized. Models were developed using individual and combined datasets. Sentinel-2 optical data outperformed Sentinel-1 radar data, but the fusion of both sensors achieved the highest accuracy (R² > 0.7, RMSE = 105.64 Mg/ha, MAE = 85.34 Mg/ha). Tree canopy height was the most informative predictor for these data, alongside terrain variables and radar textures. The machine learning models significantly improved AGB estimates compared to traditional regression techniques, and the gradient boosters outperformed Random Forest. This research demonstrates the potential of multi-sensor remote sensing data and advanced algorithms for forest biomass mapping in complex terrain, with modeling accuracies reaching root mean squared errors below 90 Mg/ha. The framework provides an effective solution for monitoring biomass using freely available satellite data. Further refinements include integrating higher-resolution optical data and additional field samples for better validation. This study contributes to remote sensing capabilities for assessing vegetation carbon stocks and dynamics.
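The accuracy figures quoted here (R², RMSE, MAE in Mg/ha) are standard regression metrics; for completeness, here is how R² and MAE are computed, with hypothetical plot-level AGB values of my own rather than the study's field data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def mae(observed, predicted):
    """Mean absolute error, in the units of the data (here Mg/ha)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical AGB plot values in Mg/ha (not the study's 171 field points).
observed  = [120.0, 250.0, 310.0, 180.0, 95.0]
predicted = [140.0, 230.0, 290.0, 200.0, 110.0]
print(round(r_squared(observed, predicted), 3))
print(mae(observed, predicted))  # 19.0
```

Reporting both matters: R² is unitless and sensitive to the spread of the observations, while RMSE and MAE stay in Mg/ha and are directly comparable across studies of similar biomass ranges.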
The performance of four tree-based classification techniques, classification and regression trees (CART), multivariate adaptive regression splines (MARS), random forests (RF), and gradient boosting trees (GBT), was compared against the commonly used logistic regression (LR) analysis to assess aquifer vulnerability in the Ogallala Aquifer of Texas. The results indicate that the tree-based models performed better than the logistic regression model, as they were able to locally refine nitrate exceedance probabilities. RF exhibited the best generalization capabilities, while the CART model did better in predicting non-exceedances. Nitrate exceedances were sensitive to well depth, an indicator of aquifer redox conditions, which, in turn, was controlled by alkalinity increases brought about by the dissolution of calcium carbonate. The clay content of soils and soil organic matter, which serve as indicators of agricultural activities, were also noted to have significant influences on nitrate exceedances. Likely nitrogen releases from confined animal feedlot operations in the northeast portion of the study area also appeared to be locally important. Integrated soil, hydrogeological, and geochemical datasets, in conjunction with tree-based methods, help elucidate the processes controlling nitrate exceedances. Overall, tree-based models offer flexible, transparent approaches for mapping nitrate exceedances, identifying underlying mechanisms, and prioritizing monitoring activities.
In this article, we present a machine learning-based method to locate lightning flashes using calculations of lightning-induced voltages on a transmission line. The proposed approach takes advantage of the voltage measurement systems preinstalled on power transmission lines to obtain the data; hence, it does not require the installation of additional sensors, such as extremely low frequency, very low frequency, or very high frequency sensors. The proposed model is shown to yield reasonable accuracy in estimating two-dimensional geolocations of lightning strike points for grid sizes up to 100 × 100 km². The algorithm is shown to be robust against the distance between the voltage sensors, the lightning peak current, the lightning current rise time, and the signal-to-noise ratio of the input signals.