Thanks to advances in technologies such as the Internet of Things, bio-sensing, and data mining, smart wearable technologies have recently received increasing attention for monitoring teenagers' sports activity and health. Although current wearable products on the market have powerful data-acquisition capabilities, they still fall short at extracting valuable knowledge because they lack accurate computational models and in-depth data analysis. To address this, this paper proposes a machine learning based physical fitness evaluation model oriented to wearable running monitoring for teenagers. The model is built with a variant of the gradient boosting machine (GBM), combined with advanced feature selection and Bayesian hyper-parameter optimization. To begin with, we design a dedicated experimental paradigm for data acquisition based on a conventional running activity: photoplethysmography (PPG) signals from a group of teenagers are collected at different testing stages using a set of smartbands that we developed ourselves. Next, the PPG signals are processed in four steps that correspond to the four modules of the proposed model: signal preprocessing, physiological data estimation, feature engineering, and classification. Firstly, the signal preprocessing module suppresses noise and removes baseline drift in the PPG signals using a smoothness prior approach (SPA) and a median filter (MF), respectively. Secondly, the physiological data estimation module converts the PPG signals into physiological data such as heart rate (HR) and blood oxygen saturation (SpO₂). Thirdly, the feature engineering module extracts from the physiological data a set of key features closely related to physical fitness status, and then applies a novel feature selection scheme based on Pearson correlation and importance-score-ranking-based sequential forward search (PC-ISR-SFS). Fourthly, the classification …
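The abstract names the PC-ISR-SFS scheme but not its internals. One plausible reading, sketched below in plain Python under stated assumptions, is: order features by an importance score, use Pearson correlation to drop any feature that is nearly collinear with a more important one, then walk the ranking and keep each feature only if a subset score improves. The importance scores, the 0.9 correlation threshold, the `score` callable (in practice something like the cross-validated accuracy of the classifier) and the feature names are hypothetical; this is not the authors' implementation.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def pc_filter(
    features: Dict[str, List[float]],
    importance: Dict[str, float],
    threshold: float = 0.9,  # hypothetical redundancy threshold on |r|
) -> List[str]:
    """Visit features from most to least important and drop any feature whose
    |r| with an already kept feature reaches the threshold (treated as redundant)."""
    kept: List[str] = []
    for name in sorted(features, key=lambda f: importance[f], reverse=True):
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def ranked_forward_search(
    ranked: List[str],
    score: Callable[[List[str]], float],  # hypothetical subset evaluator
) -> Tuple[List[str], float]:
    """Sequential forward search along the importance ranking: try adding each
    feature in turn and keep it only if the subset score improves."""
    selected: List[str] = []
    best = float("-inf")
    for name in ranked:
        trial = selected + [name]
        s = score(trial)
        if s > best:
            selected, best = trial, s
    return selected, best
```

For example, with hypothetical features `hr_mean = [1, 2, 3, 4]`, `hr_max = [3, 5, 7, 9]` and `spo2_mean = [97, 99, 96, 98]`, `hr_max` and `hr_mean` are perfectly correlated, so `pc_filter` keeps only the more important of the two before the forward search runs.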
Credit scoring is an effective tool that helps banks make profitable decisions on granting loans. Ensemble methods, which can be divided by structure into parallel and sequential ensembles, have recently been developed in the credit scoring domain and have proven superior at discriminating between borrowers accurately. However, existing ensemble models have paid little attention to the following: (1) hyper-parameter tuning of the base learner, despite its being critical to well-performing ensemble models; (2) building sequential models (i.e., boosting), as most work has focused on combining the same or different algorithms in parallel; and (3) model comprehensibility. This paper proposes a sequential ensemble credit scoring model based on a variant of the gradient boosting machine, namely extreme gradient boosting (XGBoost). The model comprises three main steps. First, data pre-processing is employed to scale the data and handle missing values. Second, a model-based feature selection system based on relative feature importance scores is used to remove redundant variables. Third, the hyper-parameters of XGBoost are adaptively tuned with Bayesian hyper-parameter optimization and then used to train the model on the selected feature subset. Several hyper-parameter optimization methods and baseline classifiers serve as reference points in the experiment. The results demonstrate that Bayesian hyper-parameter optimization performs better than random search, grid search, and manual search. Moreover, the proposed model outperforms the baseline models on average across four evaluation measures: accuracy, error rate, the area under the curve (AUC) H measure (AUC-H measure), and Brier score. The proposed model also provides feature importance scores and a decision chart, which enhance the interpretability of the credit scoring model. (C) 2017 Elsevier Ltd. All rights reserved.
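Bayesian hyper-parameter optimization fits a probabilistic surrogate to the loss values observed so far and evaluates next wherever an acquisition function is highest, rather than following a fixed grid or random draws. The plain-Python sketch below uses a Gaussian-process surrogate with an RBF kernel and expected improvement (EI) over a finite candidate grid. This is one common form of Bayesian optimization, not necessarily the variant used in the paper; the kernel length scale, the jitter, and the toy objective (standing in for a cross-validated loss as a function of a single hyper-parameter such as the learning rate) are assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rbf(a: float, b: float, length: float) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(k: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = k (k symmetric positive definite)."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(k[i][i] - s)
            else:
                low[i][j] = (k[i][j] - s) / low[j][j]
    return low


def forward(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L y = b by forward substitution."""
    y: List[float] = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][m] * y[m] for m in range(i))) / low[i][i])
    return y


def backward(low: List[List[float]], y: Sequence[float]) -> List[float]:
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[m][i] * x[m] for m in range(i + 1, n))) / low[i][i]
    return x


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for minimisation under a Gaussian posterior N(mu, sigma^2)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(
    objective: Callable[[float], float],
    candidates: Sequence[float],
    init: Sequence[float],
    n_iter: int,
    length: float = 0.2,  # assumed kernel length scale
    noise: float = 1e-6,  # jitter keeping the kernel matrix well conditioned
) -> Tuple[float, float]:
    xs = list(init)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        # Standardise observations so the unit-variance GP prior is roughly calibrated.
        mean = sum(ys) / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        z = [(y - mean) / std for y in ys]
        k = [[rbf(a, b, length) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        low = cholesky(k)
        alpha = backward(low, forward(low, z))  # alpha = K^-1 z
        best = min(z)
        pick, pick_ei = None, -1.0
        for c in candidates:
            if c in xs:  # never re-evaluate a point
                continue
            ks = [rbf(c, x, length) for x in xs]
            mu = sum(ki * ai for ki, ai in zip(ks, alpha))  # posterior mean
            v = forward(low, ks)
            sigma = math.sqrt(max(1.0 - sum(t * t for t in v), 1e-12))  # posterior std
            ei = expected_improvement(mu, sigma, best)
            if ei > pick_ei:
                pick, pick_ei = c, ei
        if pick is None:  # every candidate has been evaluated
            break
        xs.append(pick)
        ys.append(objective(pick))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]
```

For example, `bayes_opt(lambda x: (x - 0.3) ** 2, [i / 100 for i in range(101)], [0.0, 0.5, 1.0], n_iter=10)` evaluates 13 grid points in total and returns the best one seen. Unlike grid search, each new trial is chosen using all previous results; a real tuner for XGBoost would search several hyper-parameters jointly.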
A photovoltaic-based water pumping system (PWPS) is a promising application, particularly for farmers and people living in remote or rural regions with limited or no access to the utility grid. However, wider adoption of PWPS is limited by inefficient utilization of the installed photovoltaic (PV) capacity, which results in a low return on investment. Further, because of PV intermittency, farmers need assistance in deciding the operational status of the PWPS. Therefore, it is essential to optimize PV utilization based on farmers' irrigation and water pumping requirements. In this paper, a data-driven methodology is proposed to optimize PWPS utilization and help farmers make appropriate operational decisions based on water pumping needs and available PV power. As a first step, a PV power prediction model based on tree-ensemble supervised learning is developed, and a Bayesian hyper-parameter optimization algorithm is applied to enhance its performance. In the second step, the predicted PV power for the upcoming days serves as input for deciding the PWPS operation, in coordination with the farmer's observations of water pumping needs. Based on the predicted PV power availability and the irrigation/water pumping needs, the reference signal for motor pump operation is estimated. To validate the proposed methodology, a case study covering different operational scenarios through five use cases has been performed. A close match between the predicted and actual PV power generation is observed, and PV utilization and farm irrigation are better than with a conventional PWPS. Further, long-term test validation is required to analyze the stability and robustness of the developed methodology, specifically for remote/rural regions.
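The second step, turning a PV power forecast into a pump operating decision, can be illustrated with a simple rule. The sketch below assumes an hourly forecast (for instance produced by the tree-ensemble model), a pump with a hypothetical minimum operating power `min_kw` and rating `rated_kw`, and a pumping need expressed as a number of hours. It picks the sunniest feasible hours and caps the reference at the pump rating. The names and the rule itself are illustrative assumptions, not the paper's methodology.

```python
from typing import List, Sequence, Tuple


def pump_reference(
    pv_forecast_kw: Sequence[float],
    rated_kw: float,       # hypothetical pump rating
    min_kw: float,         # hypothetical minimum power needed to run the pump
    required_hours: int,   # pumping need, expressed as hours of operation
) -> Tuple[List[float], bool]:
    """Turn an hourly PV forecast into a per-hour pump power reference.

    Hours whose forecast is below min_kw are skipped; among the rest, the
    sunniest hours are chosen until required_hours is reached. The reference
    in each chosen hour is the forecast power capped at rated_kw.
    """
    usable = [h for h, p in enumerate(pv_forecast_kw) if p >= min_kw]
    chosen = sorted(usable, key=lambda h: pv_forecast_kw[h], reverse=True)[:required_hours]
    reference = [0.0] * len(pv_forecast_kw)
    for h in chosen:
        reference[h] = min(pv_forecast_kw[h], rated_kw)
    # The flag is False when the forecast cannot cover the requested pumping hours,
    # which is where the farmer's own judgement would come in.
    return reference, len(chosen) == required_hours
```

With a forecast of `[0.0, 0.5, 1.2, 2.0, 2.5, 1.8, 0.9, 0.2]` kW, `rated_kw=2.2`, `min_kw=1.0` and three required hours, the pump runs in hours 3, 4 and 5, with the 2.5 kW hour capped at 2.2 kW.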
Surfactant-enhanced aquifer remediation (SEAR) is a suitable method for remediating aquifers contaminated with dense non-aqueous phase liquids (DNAPLs); however, because SEAR is costly, finding the optimal remediation scenario is usually essential. Embedding numerical simulation models of DNAPL remediation within optimization routines is computationally expensive, so surrogate models are a suitable alternative to the numerical models. Ensemble methods can further improve the accuracy of surrogate models; in this study, the Stacking ensemble method was applied and compared with conventional methods. First, six machine learning methods were used as surrogate models, and the impact of various feature scaling techniques on the models' performance was evaluated. Homogeneous ensemble methods (Bagging and Boosting) were also used to improve the base models' accuracy. In total, six stand-alone surrogate models and 12 homogeneous ensemble models served as the base models of the Stacking ensemble. Because the Stacking model is large, Bayesian hyper-parameter optimization was used to find its optimal hyper-parameters. The results showed that Bayesian hyper-parameter optimization performed better than common methods such as random search and grid search. The artificial neural network model whose input data were scaled with the power transformer method performed best, with a cross-validation RMSE of 0.065. Boosting increased the base models' accuracy more than the other homogeneous methods, and the best Boosting model had a test RMSE of 0.039. Stacking significantly increased the base models' accuracy and performed better than the other ensemble methods; the best Stacking ensemble surrogate model had a cross-validation RMSE of 0.016. Finally, a differential evolution optimization model was used by substituting the Stacking ensemble model with …
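Stacking trains a meta-learner on out-of-fold predictions of the base models, so the meta-learner never sees predictions that a base model made on its own training data. The plain-Python sketch below shows that mechanism with two hypothetical base learners (a least-squares line and a k-nearest-neighbour average) and a deliberately simple meta-learner that grid-searches a single blending weight. The study's 18 base models and its actual meta-model are not reproduced here; fold assignment by index modulo `n_folds` is also an assumption.

```python
from typing import Callable, List, Tuple

Predictor = Callable[[float], float]
Fitter = Callable[[List[float], List[float]], Predictor]


def fit_linear(xs: List[float], ys: List[float]) -> Predictor:
    """Ordinary least-squares line y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs: List[float], ys: List[float], k: int = 2) -> Predictor:
    """k-nearest-neighbour regressor: mean target of the k closest training points."""
    pts = list(zip(xs, ys))

    def predict(x: float) -> float:
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def out_of_fold(fit: Fitter, xs: List[float], ys: List[float], n_folds: int) -> List[float]:
    """Predict each sample with a model trained on the other folds only."""
    preds = [0.0] * len(xs)
    for f in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(len(xs)):
            if i % n_folds == f:
                preds[i] = model(xs[i])
    return preds


def stack_two(fit_a: Fitter, fit_b: Fitter, xs: List[float], ys: List[float],
              n_folds: int = 3) -> Tuple[float, Predictor]:
    """Meta-learner: the blend weight w in [0, 1] minimising out-of-fold MSE."""
    pa = out_of_fold(fit_a, xs, ys, n_folds)
    pb = out_of_fold(fit_b, xs, ys, n_folds)

    def mse(w: float) -> float:
        return sum((w * a + (1 - w) * b - y) ** 2 for a, b, y in zip(pa, pb, ys)) / len(ys)

    w = min((i / 100 for i in range(101)), key=mse)
    # Base models are refit on all data; the meta-weight comes from out-of-fold errors.
    model_a, model_b = fit_a(xs, ys), fit_b(xs, ys)
    return w, lambda x: w * model_a(x) + (1 - w) * model_b(x)
```

On noise-free data from y = 2x + 1, the linear model's out-of-fold predictions are exact, so the blend puts essentially all weight on it. Restricting the meta-learner to a convex blend limits overfitting on a small out-of-fold set; a regularized linear model is a common alternative.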
It is widely accepted that conventional boosting algorithms have low efficiency and accuracy when dealing with big data collected from wind turbine operations. To address this issue, this paper applies an adaptive LightGBM method to wind turbine fault detection. First, feature selection for fault detection is performed by using the maximum information coefficient to analyze the correlation among features in the supervisory control and data acquisition (SCADA) data of wind turbines. Next, a performance evaluation criterion is proposed for the improved LightGBM model to support fault detection. By embedding the confusion matrix in this criterion as a performance indicator, an improved LightGBM fault detection approach is developed. Based on the adaptive LightGBM fault detection model, a fault detection strategy for wind turbine gearboxes is investigated. To demonstrate the proposed algorithms and methods, a case study is conducted with a three-year SCADA dataset obtained from a wind farm located in southern China. The results indicate that the proposed approach establishes a fault detection framework for wind turbine systems with either a lower false alarm rate or a lower missed detection rate.
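The false alarm rate and missed detection rate used to judge the fault detector both come from the binary confusion matrix. A minimal sketch, treating 1 as a fault and 0 as normal operation: FAR = FP / (FP + TN) and MDR = FN / (FN + TP). The `weighted_criterion` function and its `alpha` weight are hypothetical, one simple way to fold both rates into a single score; the paper's own performance evaluation criterion is not specified in the abstract.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int  # faults flagged as faults
    fp: int  # normal samples flagged as faults (false alarms)
    tn: int  # normal samples flagged as normal
    fn: int  # faults that were missed


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Binary confusion matrix with 1 = fault, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def false_alarm_rate(c: Confusion) -> float:
    """Share of normal samples wrongly flagged as faults."""
    return c.fp / (c.fp + c.tn) if c.fp + c.tn else 0.0


def missed_detection_rate(c: Confusion) -> float:
    """Share of real faults the detector failed to flag."""
    return c.fn / (c.fn + c.tp) if c.fn + c.tp else 0.0


def weighted_criterion(c: Confusion, alpha: float = 0.5) -> float:
    """Hypothetical score to minimise; alpha trades false alarms against misses."""
    return alpha * false_alarm_rate(c) + (1 - alpha) * missed_detection_rate(c)
```

Raising `alpha` penalizes false alarms more heavily, which would push a tuned detector toward the "lower false alarm rate" end of the trade-off; lowering it favors fewer missed faults.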