Recently, brain-computer interfaces (BCIs) and brain-machine interfaces have attracted considerable attention from researchers. Through connections with external devices, computers and machines can be controlled by brain signals measured via near-infrared spectroscopy (NIRS) or electroencephalograph devices. Herein, we propose a novel bagging algorithm, to be applied to BCIs, that generates interpolation data around misclassified data using a possibilistic function. In contrast to AdaBoost, a conventional ensemble learning method that increases the weight of misclassified data so that they are included in the next dataset with high probability, we generate interpolation data using a membership function centered on the misclassified data and incorporate them into the next dataset at the same time. The interpolated data are referred to as virtual data herein. Adding the virtual data to the training data makes the volume of training data sufficient for adjusting the discriminant boundary more accurately. Because the membership function is defined as a possibility distribution, this method is named the bagging algorithm based on the possibility distribution. Herein, we formulate a bagging-type ensemble learning method based on the possibility distribution and discuss its usefulness on a simple-calculation problem using NIRS data.
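As a rough illustration of the virtual-data idea in the abstract above, the following sketch draws extra points around each misclassified sample and keeps a candidate with probability equal to its membership value. The abstract does not specify the shape of the membership function, so the triangular function, the `width` parameter, the min-combination across features, and the rejection-sampling scheme are all assumptions made for this example, not the authors' formulation.

```python
import random

def triangular_membership(x, center, width):
    """Assumed membership function: 1 at the center, falling linearly to 0 at distance `width`."""
    return max(0.0, 1.0 - abs(x - center) / width)

def virtual_points(center, width, n, rng):
    """Draw n virtual points near one misclassified sample by rejection sampling."""
    points = []
    while len(points) < n:
        candidate = [c + rng.uniform(-width, width) for c in center]
        # Joint possibility taken as the minimum of the per-feature memberships (an assumption).
        mu = min(triangular_membership(v, c, width) for v, c in zip(candidate, center))
        if rng.random() < mu:  # candidates closer to the center are accepted more often
            points.append(candidate)
    return points

rng = random.Random(0)
# Hypothetical misclassified samples as (features, label) pairs; not data from the paper.
misclassified = [([0.2, 0.7], 1), ([0.9, 0.1], 0)]

augmented = []
for features, label in misclassified:
    for point in virtual_points(features, width=0.1, n=5, rng=rng):
        augmented.append((point, label))  # each virtual point inherits the label of its center
```

In a full bagging loop, `augmented` would be appended to the next training set before the next base classifier is fitted.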
Understanding the factors that influence disease promises to advance clinical research and support decision-making. In this paper, we propose ARB, a framework that integrates an association rule mining algorithm with a bagging algorithm. ARB consists of two main modules: 1) knowledge discovery and 2) disease diagnosis. First, the association rule mining algorithm is used to investigate the factors associated with sick and healthy outcomes for males and females. This step also selects the most robust and effective features in order to reduce dimensionality. Then an ensemble algorithm diagnoses disease based on the data filtered by the first module. ARB is applied to three real thyroid datasets from the UCI Machine Learning Repository. Through the association rules generated by the Apriori algorithm, we find that thyroid disease affects people in different age intervals differently, and that the elderly aged 60 to 80 are the most likely to suffer from thyroid disease. The results also show that the two age intervals (30, 40] and (50, 60] have the highest recurrence rate of thyroid disease. As for gender, men are more likely than women to be free from thyroid disease, and women in their twenties have a lower risk. We then use the thyroid disease knowledge from these rules as the model input for diagnosing thyroid disease. The experimental results clearly show that ARB outperforms the other methods, which also demonstrates the feasibility and practical value of the framework in aided thyroid diagnosis.
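Association rules such as those mentioned above are usually ranked by support and confidence. The sketch below computes both for a single rule; the five records and the item names are invented purely for illustration and do not come from the UCI thyroid datasets, and a full Apriori run would additionally enumerate frequent itemsets.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up patient records, one set of attribute=value items per patient.
records = [
    {"age=60-80", "sex=F", "thyroid=sick"},
    {"age=60-80", "sex=M", "thyroid=sick"},
    {"age=60-80", "sex=F", "thyroid=healthy"},
    {"age=20-30", "sex=F", "thyroid=healthy"},
    {"age=20-30", "sex=M", "thyroid=healthy"},
]

# Rule: age=60-80 -> thyroid=sick
rule_support = support(records, {"age=60-80", "thyroid=sick"})
rule_confidence = confidence(records, {"age=60-80"}, {"thyroid=sick"})
```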
Ensemble learning can be used to combine weak classifiers in order to improve emotion classification. Existing methods of emotion classification on micro-blogs seldom use ensemble learning. Personality can significantly influence how users express themselves but is seldom accounted for in emotion classification. In this study, a micro-blog emotion classification method based on personality and a bagging algorithm (PBAL) is proposed. Text personality analysis is introduced, and rule-based personality classification methods are used to distinguish five personality types. The micro-blog text is first classified using five personality-specific basic emotion classifiers and a general emotion classifier. A long short-term memory language model is then used to train an emotion classifier for each set, and the classifiers are integrated. Experimental results show that, compared with traditional sentiment classifiers, PBAL has higher accuracy and recall, and the F value increases by 9%.
Time series forecasting is valuable in making informed decisions, improving financial planning, optimizing resource allocation, increasing operational efficiency, and managing risks. But accurate forecasting is diffic...
From the perspective of clinical decision-making in a Medical IoT-based healthcare system, effective and efficient analysis of long-term health data to support sound clinical decisions is an extremely important objective. However, how to deal effectively with the multi-dimensionality and high volume of data generated by Medical IoT-based healthcare systems is an issue of increasing importance in IoT healthcare data exploration and management. A classifier or predictor equipped with a good feature selection function contributes effectively to classification and prediction performance. This paper proposes a novel bagging C4.5 algorithm based on wrapper feature selection, for the purpose of supporting sound clinical decision-making in the medical and healthcare fields. In particular, the newly proposed sampling method, S-C4.5 SMOTE, not only overcomes the problem of data distortion but also improves overall system performance, because its mechanism effectively reduces the data size without distortion by keeping datasets balanced and technically smooth. This directly supports effective wrapper feature selection without the need to consider the problem of huge amounts of data; this is a novel contribution of this work.
The learning effect of students is crucial for assessing teaching quality and therefore plays a significant role in teaching management. Predicting student achievement is a major challenge in understanding the learning effect of students. Many studies have used machine learning methods such as the decision tree algorithms C4.5, ID3, CART, and J48, random forest, and others. However, few studies have explored the use of the bagging algorithm in this field. Therefore, this study proposes a classification prediction method for student achievement based on the bagging-CART algorithm. First, the student achievement data are preprocessed, and the Apriori method is applied to mine the strongly associated dataset. The optimal hyper-parameters are determined through grid search and used to train the bagging-CART model and make predictions. Furthermore, the CART, J48, and bagging-CART algorithms are trained, and their evaluation indicators are compared using a confusion matrix. The results indicate that the bagging-CART model achieves an accuracy of 98.16%, a recall rate of 91.80%, a precision of 90.83%, and an F1 score of 94.87%. Its accuracy, precision, and F1 score are higher than those obtained with CART and J48. Although its recall rate is 0.26% lower than that of CART, it is 0.52% higher than that of J48. Consequently, this method demonstrates strong predictive capabilities and introduces a new reference method for evaluating students' learning effect.
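The core bagging step that this abstract builds on (bootstrap resampling plus majority voting) can be sketched in a few lines. This is only an illustration under stated assumptions: a one-split decision stump on a single feature stands in for CART, the Apriori and grid-search stages are omitted, and the toy `(score, label)` pairs are hypothetical rather than taken from the study.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold and label orientation with the fewest training errors."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging_fit(data, n_estimators, rng):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    models = []
    for _ in range(n_estimators):
        resample = [rng.choice(data) for _ in data]
        models.append(fit_stump(resample))
    return models

def bagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical (exam score, pass=1 / fail=0) pairs.
data = [(35, 0), (42, 0), (48, 0), (55, 0), (61, 1), (67, 1), (74, 1), (88, 1)]
models = bagging_fit(data, n_estimators=25, rng=random.Random(0))
low_score_prediction = bagging_predict(models, 30)
high_score_prediction = bagging_predict(models, 95)
```

Because each stump sees a different resample, the thresholds vary between ensemble members; the vote averages out that variation, which is the variance-reduction effect bagging relies on.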
Introduction: Ovarian cancer (OC) is one of the most frequent gynecologic cancers among women, and high-accuracy risk prediction techniques are essential to effectively select the best intervention strategies and clinical management for OC patients at different risk levels. Current risk prediction models used in OC have low sensitivity, and few of them are able to identify OC patients at high risk of mortality, which would both optimize the treatment of high-risk patients and prevent unnecessary medical intervention in those at low risk. Objectives: To this end, we have developed a bagging-based algorithm with GA-XGBoost models that predicts the risk of death from OC using gene expression profiles. Methods: Four gene expression datasets from public sources were used as training (n = 1) or validation (n = 3) sets. The performance of our proposed algorithm was compared with fine-tuning and other existing methods. Moreover, the biological function of the selected genetic features was further interpreted, and the response to a panel of approved drugs was predicted for different risk levels. Results: The proposed algorithm showed good sensitivity (74-100%) in the validation sets, compared with two simple models whose sensitivity only reached 47% and 60%. The prognostic gene signature used in this study was highly connected to AKT, a key component of the PI3K/AKT/mTOR signaling pathway, which influences the tumorigenesis, proliferation, and progression of OC. Conclusion: These findings demonstrate that our risk prediction models improve the sensitivity of risk classification of OC patients compared with other methods. Ongoing effort is needed to validate the outcomes of this approach for precise clinical treatment. (C) 2020 The Authors. Published by Elsevier B.V. on behalf of Cairo University.
ISBN (print): 9781424496365
The bagging algorithm has proven effective on a variety of classification problems. However, the success of bagging depends strongly on the level of diversity reached by the individual classifiers of the ensemble. Diversity in an ensemble can be obtained when the individual classifiers are built under different circumstances, such as different parameter settings, training datasets, and learning algorithms. This paper presents a new approach that combines these three ways of obtaining high diversity in bagging models, with the aim of obtaining high levels of accuracy for the ensembles. In the proposed approach, in order to obtain the optimal configurations of features and classifiers in bagging models, we apply an evolutionary approach composed of two genetic algorithm instances. To validate the proposed approach, experiments involving 10 classification algorithms were conducted, applying the resulting bagging structures to 5 pattern classification datasets taken from the UCI repository. In addition, we analyze the performance of the resulting bagging structures in terms of two recently proposed diversity measures, referred to as good and bad.
The infrastructure industry consumes a significant amount of natural resources and produces a lot of construction waste, both of which have negative environmental effects. As a solution, recycled aggregate concrete has emerged as a practical substitute. Predicting strength accurately is essential for cutting design time and expenses while limiting the material waste from numerous mixing tests. Machine learning methods are used to tackle structural engineering problems, including the prediction of Splitting Tensile Strength (STS). This study used four machine learning models, Random Forest Regression (RFR), Extreme Gradient Boosting (XGBoost), Gradient Boosted Regression Trees (GBRT), and Bagging Regressor (BR), with grid search for hyperparameter tuning, to forecast the splitting tensile strength of fiber-reinforced recycled aggregate concrete (FRRAC). The machine learning models demonstrated high reliability in predicting splitting tensile strength, with robust values for R-squared (R²), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE). The GBRT model showed the greatest R² values, 0.95 during the training phase and 0.83 during the testing phase. The R² values of the XGBoost, RFR, and BR models at the testing phase were 0.822, 0.781, and 0.824, respectively. Moreover, the RMSE values of the RFR, BR, GBRT, and XGBoost models at the testing phase were 0.333, 0.298, 0.276, and 0.3004, respectively, with the GBRT model achieving the lowest RMSE. The GBRT model also showed the lowest uncertainty in both phases, with values of 0.619 and 0.597 for the training and testing phases, respectively. Furthermore, SHapley Additive exPlanations (SHAP) analysis found that CR and the addition of fiber were the most influential input features, while the replacement percentage of CR (%) and RCA absorption capacity (%) inputs had the lowest impact of Fiber-Reinforced Recycled Aggr
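For reference, the four evaluation metrics named in this abstract (R², RMSE, MAE, MAPE) follow their standard definitions, sketched below. The function name and the sample strength values are placeholders chosen for this example and are unrelated to the paper's data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R-squared, RMSE, MAE and MAPE (in percent) for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        # MAPE is undefined when a true value is zero; the sample values avoid that case.
        "mape": 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }

# Placeholder measured vs. predicted values.
metrics = regression_metrics([2.0, 3.0, 4.0, 5.0], [2.5, 3.0, 3.5, 5.0])
```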
Weighted sampling methods based on k-nearest neighbors have been shown to be effective in solving the class imbalance problem. However, they usually ignore the positional relationship between a sample and the heterogeneous samples in its neighborhood when calculating the sample weight. This paper proposes a novel neighborhood-weighted sampling method named NWBbagging to improve the bagging algorithm's performance on imbalanced datasets. It considers the positional relationship between the center sample and the heterogeneous samples in its neighborhood when identifying critical samples. In addition, a parameter reduction method is proposed and incorporated into the ensemble learning framework, which reduces the number of parameters and increases classifier diversity. We compare NWBbagging with several state-of-the-art ensemble learning algorithms on 34 imbalanced datasets, and the results show that NWBbagging achieves better performance.