In this study, we compared manual machine learning with automated machine learning (AutoML) to see which performs better in predictive analysis. Using data from past football matches, we tested a range of algorithms to forecast game outcomes. By exploring the data, we discovered patterns and team correlations, then cleaned and prepared the data to give the models the best possible inputs. Our findings show that AutoML, especially when using logistic regression, can outperform manual methods in prediction accuracy. The main advantage of AutoML is that it automates the difficult parts, such as data cleaning, feature selection, and tuning model parameters, which saves time and effort compared with manual approaches that require more expertise to achieve similar results. This research highlights how AutoML can make predictive analysis easier and more accurate, providing useful insights for many fields. Future work could explore different data types and apply these techniques to other areas to show how adaptable and powerful machine learning can be.
One of the key functions of global water resource management authorities is river water quality (WQ) assessment. A water quality index (WQI) is developed for water assessments considering numerous quality-related variables. WQI assessments typically take a long time and are prone to errors during sub-indices generation. This can be tackled through the latest machine learning (ML) techniques renowned for superior accuracy. In this study, water samples were taken from the wells in the study area (North Pakistan) to develop WQI prediction models. Four standalone algorithms, i.e., random trees (RT), random forest (RF), M5P, and reduced error pruning tree (REPT), were used in this study. In addition, 12 hybrid data-mining algorithms (a combination of standalone, bagging (BA), cross-validation parameter selection (CVPS), and randomizable filtered classification (RFC)) were also used. Using the 10-fold cross-validation technique, the data were separated into two groups (70:30) for algorithm creation. Ten random input permutations were created using Pearson correlation coefficients to identify the best possible combination of datasets for improving the algorithm prediction. The variables with very low correlations performed poorly, whereas hybrid algorithms increased the prediction capability of numerous standalone algorithms. Hybrid RT-Artificial Neural Network (RT-ANN) with RMSE = 2.319, MAE = 2.248, NSE = 0.945, and PBIAS = -0.64 outperformed all other algorithms. Most algorithms overestimated WQI values except for BA-RF, RF, BA-REPT, REPT, RFC-M5P, RFC-REPT, and ANN-Adaptive Network-Based Fuzzy Inference System (ANFIS).
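The four scores reported above (RMSE, MAE, NSE, and PBIAS) can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names are hypothetical, and the PBIAS sign convention used here (observed minus simulated, so negative values indicate overestimation) is an assumption, since the abstract does not state which convention it follows.

```python
import math

def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def pbias(obs, sim):
    """Percent bias. ASSUMED convention: 100 * sum(obs - sim) / sum(obs),
    so a negative value means the model overestimates on average."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)
```

Under this convention, a small negative PBIAS such as the reported -0.64 would correspond to slight overestimation on average, though that reading depends on the sign convention the authors actually used.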
The purpose of this study was to develop quantitative structure-activity relationship models for N-benzoylindazole derivatives as inhibitors of human neutrophil elastase. These models were developed with the aid of classification and regression trees (CART) and an adaptive neuro-fuzzy inference system (ANFIS) combined with a shuffling cross-validation technique using interpretable descriptors. More than one hundred meaningful descriptors, representing various structural characteristics of all 51 N-benzoylindazole derivatives in the data set, were calculated and used as the original variables for shuffling CART modelling. Five descriptors selected by the shuffling CART technique (average Wiener index, Kier benzene-likeliness index, subpolarity parameter, average shape profile index of order 2, and folding degree index) were used as inputs to the ANFIS to predict the inhibition behaviour of N-benzoylindazole derivatives. The results of the developed shuffling CART-ANFIS model, compared to other techniques such as genetic algorithm (GA)-partial least squares (PLS)-ANFIS and stepwise multiple linear regression (MLR)-ANFIS, are promising and descriptive. The satisfactory results ($r_p^2 = 0.845$, $Q_{LOO}^2 = 0.861$, $r_{L25\%O}^2 = 0.829$, $RMSE_{LOO} = 0.305$ and $RMSE_{L25\%O} = 0.336$) demonstrate that shuffling CART-ANFIS models capture the relationship between human neutrophil elastase inhibitor activity and molecular descriptors, and they yield predictions in excellent agreement with the experimental values.
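To make the leave-one-out statistic concrete, the sketch below computes a leave-one-out Q² for a toy model. It is only an illustration of the statistic: the study uses CART-ANFIS, whereas this example swaps in a single-descriptor least-squares line, and the function names `fit_line` and `q2_loo` are hypothetical. The total sum of squares here uses the mean of all responses, which is one common convention and an assumption on our part.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def q2_loo(xs, ys):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot.

    Each sample is held out in turn, the model is refit on the rest,
    and the squared error on the held-out sample is added to PRESS.
    """
    n = len(ys)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

A leave-25%-out variant would follow the same pattern, holding out a quarter of the samples at a time instead of a single one.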
The convergence of measure estimates in the sense of Kullback-Leibler divergence is required in many applications in decision and information theory. Recently, modified histograms have been shown to have good properties with respect to information divergences. For these estimates deterministic optimal bandwidths have been given, but no automatic smoothing procedure has been shown to be asymptotically optimal. In the present article, we consider the Kullback-Leibler cross-validation method for selecting the bin width of modified histograms. We analyze the behavior of the Kullback-Leibler divergence and of its expectation and prove that the cross-validated estimate is asymptotically optimal with respect to the Kullback-Leibler divergence.
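The idea of Kullback-Leibler (likelihood) cross-validation for bin-width selection can be sketched as follows. This is a rough illustration under stated assumptions, not the article's estimator: the "modification" used here is simple additive smoothing of the bin counts with a hypothetical constant `a`, chosen so that the leave-one-out density is never zero and its logarithm stays finite. The function names, the fixed range `[lo, hi]`, and the candidate bin counts are all illustrative, and every data point is assumed to lie within `[lo, hi]`.

```python
import math

def loo_log_likelihood(data, k, lo, hi, a=0.5):
    """Leave-one-out log-likelihood of a smoothed k-bin histogram on [lo, hi].

    With the i-th point removed, the density in its bin is estimated as
    (count - 1 + a) / ((n - 1 + a * k) * h), where h is the bin width.
    The smoothing constant `a` is an assumption of this sketch.
    """
    n = len(data)
    h = (hi - lo) / k
    counts = [0] * k
    bin_of = []
    for x in data:
        j = min(int((x - lo) / h), k - 1)  # put x == hi into the last bin
        counts[j] += 1
        bin_of.append(j)
    total = 0.0
    for j in bin_of:
        density = (counts[j] - 1 + a) / ((n - 1 + a * k) * h)
        total += math.log(density)
    return total

def select_bins(data, candidates, lo, hi, a=0.5):
    """Pick the number of bins that maximizes the leave-one-out log-likelihood."""
    return max(candidates, key=lambda k: loo_log_likelihood(data, k, lo, hi, a))
```

Maximizing the leave-one-out log-likelihood is equivalent to minimizing an empirical estimate of the Kullback-Leibler divergence up to a term that does not depend on the bin width, which is why the criterion is named after that divergence. A call such as `select_bins(sample, range(1, 50), min(sample), max(sample))` would return the selected number of bins for a list of floats `sample`.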