The UK’s HE system is mired in public debate around ‘grade inflation’, and there is substantial pressure to address the perceived devaluation of degrees through blunt policy measures such as modifying classificatio...
Changes in citrus volatile organic compounds (VOCs) induced by Bactrocera dorsalis (Hendel) infestation can serve as characteristic identifiers for non-destructive detection of infested citrus. This study proposed a method combining colorimetric sensor array (CSA) technology with machine learning algorithms for the discrimination of B. dorsalis infestation in citrus. Gas chromatography-mass spectrometry (GC-MS) analysis identified key VOCs, including d-limonene, linalool, and decanal, as infestation markers. Various porphyrin and metalloporphyrin dyes sensitive to these VOCs were then selected to construct the CSA. To enhance detection accuracy, a hybrid feature selection method integrating ReliefF and Particle Swarm Optimization (PSO) was implemented, and the optimized feature subsets were used to develop classification models. Specifically, a binary classification model employing the K Nearest Neighbor (KNN) algorithm achieved an accuracy of 93.89% in distinguishing between healthy and infected citrus. Furthermore, a multiclass classification model using KNN was developed to differentiate among invasive, incubation, and infestation stages, attaining an accuracy of 97.78%. This approach presents a promising solution for early detection of B. dorsalis infestation in citrus.
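The KNN step named in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and majority voting, which the abstract does not specify, and it does not reproduce the ReliefF and PSO feature selection. The function name `knn_predict` and all feature values are hypothetical, invented only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training sample (an assumption)
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Majority vote over the k closest samples
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 3-dimensional features (e.g. color changes of three dyes);
# these numbers are made up and are not data from the study.
train_X = [(0.10, 0.20, 0.10), (0.15, 0.25, 0.05), (0.20, 0.10, 0.15),
           (0.80, 0.90, 0.70), (0.85, 0.75, 0.80), (0.90, 0.80, 0.95)]
train_y = ["healthy", "healthy", "healthy", "infested", "infested", "infested"]

prediction = knn_predict(train_X, train_y, (0.82, 0.85, 0.75), k=3)
```

The same function handles the multiclass case unchanged, since the vote is taken over whatever labels appear in `train_y`.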
Cardiovascular disease, commonly referred to as heart disease, encompasses diverse conditions of the heart that have led to sudden death or prolonged illness worldwide over the past decades. More recent...
Credit risk assessment is a crucial element in credit risk management. With the extensive research on consumer credit risk assessment in recent decades, the abundance of literature on this topic can be overwhelming for researchers. Therefore, this article aims to provide a more systematic and comprehensive analysis from three perspectives: classification algorithms, data traits, and learning methods. Firstly, the state-of-the-art classification algorithms are categorized into traditional single classifiers, intelligent single classifiers, and hybrid and ensemble multiple classifiers. Secondly, considering the diversity of data traits in credit datasets, data traits are divided into external structure information traits, data quality traits, data quantity traits, and internal information traits. A data-trait-driven modeling framework based on multiple classifiers is proposed for credit risk assessment. Thirdly, considering the differences in data modeling methods, learning methods are classified by data status, label status, and structure form. Furthermore, model interpretability, model bias, model multi-pattern, and model fairness are discussed. Finally, the limitations and future research directions are presented. This review article serves as a helpful guide for researchers and practitioners in the field of credit risk modeling and analysis.
The characteristic substances in citrus volatile organic compounds (VOCs) are associated with infestation by Bactrocera dorsalis (Hendel), which provides a noninvasive evaluation method to discriminate infested citrus. This paper developed an olfactory detection system based on a quartz crystal microbalance (QCM) sensor array and classification algorithms to identify citrus infested with B. dorsalis. Six characteristic substances that vary significantly after B. dorsalis infestation, namely D-limonene, myrcene, alpha-pinene, decanal, linalool, and beta-ocimene, were selected as template molecules to prepare molecularly imprinted polymers (MIPs), each of which was used to modify a QCM. The experimental results show that the prepared MIPs-QCM sensors had sensitivities in the range of 0.043-0.070 Hz/(mg/m³), and their stability and reproducibility were above 93.1%. Four sensors that contributed to the classification were screened by stepwise discriminant analysis. Afterward, Bayesian optimization was employed to optimize the hyperparameters. The accuracy of the optimized support vector machine (SVM) reached 94.17%. The olfactory detection system developed in this study enables the discrimination of citrus infested with B. dorsalis, which may have potential applications in the post-harvest treatment of citrus.
With the explosive growth of data, applying machine learning classification algorithms with big data technology to predict results can improve the intelligent classification of data. It can provide data support for predicting classes in advance and for filtering classification results, improving the efficiency of data processing and data realization. This article first introduces the development of machine learning under big data, then introduces the mainstream distributed processing framework Spark, and finally compares the advantages and disadvantages of classification algorithms under big data.
The use of machine learning algorithms in healthcare can amplify social injustices and health inequities. While the exacerbation of biases can occur and be compounded during problem selection, data collection, and outcome definition, this research pertains to the generalizability impediments that occur during the development and post-deployment of machine learning classification algorithms. Using the Framingham coronary heart disease data as a case study, we show how to effectively select a probability cutoff to convert a regression model for a dichotomous variable into a classifier. We then compare the sampling distribution of the predictive performance of eight machine learning classification algorithms under four stratified training/testing scenarios to test their generalizability and their potential to perpetuate biases. We show that both extreme gradient boosting and support vector machine are flawed when trained on an unbalanced dataset. We then show that the double discriminant scoring of type 1 and 2 is the most generalizable with respect to the true positive and negative rates, respectively, as it consistently outperforms the other classification algorithms, regardless of the training/testing scenario. Finally, we introduce a methodology to extract an optimal variable hierarchy for a classification algorithm and illustrate it on the overall, male and female Framingham coronary heart disease data.
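The cutoff-selection step can be sketched as follows. The abstract does not state which criterion the authors use, so this example assumes Youden's J statistic (true positive rate plus true negative rate minus 1) as one common choice. The helper names `rates` and `best_cutoff` and the toy labels and probabilities are hypothetical and are not Framingham data.

```python
def rates(y_true, y_prob, cutoff):
    """True positive rate and true negative rate when predicting 1 for prob >= cutoff."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return tp / pos, tn / neg

def best_cutoff(y_true, y_prob):
    """Pick the candidate cutoff maximizing Youden's J = TPR + TNR - 1 (assumed criterion)."""
    candidates = sorted(set(y_prob))  # every observed probability is a candidate

    def youden(c):
        tpr, tnr = rates(y_true, y_prob, c)
        return tpr + tnr - 1

    return max(candidates, key=youden)

# Made-up, unbalanced toy data: 3 positives among 10 cases
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.40, 0.55, 0.70]

cutoff = best_cutoff(y_true, y_prob)
tpr, tnr = rates(y_true, y_prob, cutoff)
```

On this toy data the selected cutoff falls below 0.5, and the conventional 0.5 threshold would miss one of the three positive cases, which illustrates why the cutoff matters on unbalanced data.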
The classification algorithm is one of the key techniques affecting the performance of an automatic text classification system, and it plays an important role in automatic classification research. This paper comparatively analyzes k-NN, VSM, and a hybrid classification algorithm presented by our research group. Some 2000 pieces of Internet news provided by ChinaInfoBank are used in the experiment. The results show that the hybrid algorithm presented by the group outperforms the other two algorithms.
ISBN: 9781467382878 (print)
This paper predicts diabetes using classification algorithms from data mining. Classification algorithms and tools may reduce the heavy workload on doctors. The paper evaluates classification algorithms for classifying several diabetes patient datasets. Classification is one of the main techniques in data mining. The algorithms examined are the decision tree algorithm, the Bayes algorithm, and a rule-based algorithm. These algorithms are evaluated by their error rates, and patients are identified using an evaluation function that measures the accuracy of the results.
Background: The performance of a classification algorithm eventually reaches a point of diminishing returns, where additional samples do not improve the results. Thus, there is a need to determine an optimal sample size that maximizes performance while accounting for computational burden or budgetary concerns. Objective: This study aimed to determine optimal sample sizes and the relationships between sample size and dataset-level characteristics over a variety of binary classification algorithms. Methods: A total of 16 large open-source datasets were collected, each containing a binary clinical outcome. Furthermore, 4 machine learning algorithms were assessed: XGBoost (XGB), random forest (RF), logistic regression (LR), and neural networks (NNs). For each dataset, the cross-validated area under the curve (AUC) was calculated at increasing sample sizes, and learning curves were fit. Sample sizes needed to reach the observed full-dataset AUC minus 2 points (0.02) were calculated from the fitted learning curves and compared across the datasets and algorithms. Dataset-level characteristics, minority class proportion, full-dataset AUC, number of features, type of features, and degree of nonlinearity were examined. Negative binomial regression models were used to quantify relationships between these characteristics and expected sample sizes within each algorithm. A total of 4 multivariable models were constructed, which selected the best-fitting combination of dataset-level characteristics. Results: Among the 16 datasets (full-dataset sample sizes ranging from 70,000-1,000,000), median sample sizes were 9960 (XGB), 3404 (RF), 696 (LR), and 12,298 (NN) to reach AUC stability. For all 4 algorithms, more balanced classes (multiplier: 0.93-0.96 for a 1% increase in minority class proportion) were associated with decreased sample size. Other characteristics varied in importance across algorithms; in general, more features, weaker features, and more complex relationsh
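The learning-curve calculation described in the Methods can be sketched as below. The abstract does not give the functional form of the fitted curves, so this example assumes an inverse power law, AUC(n) = plateau - b * n^(-c), with the plateau fixed at the full-dataset AUC so that the fit reduces to a straight line in log-log space. The function names and the data points are hypothetical; the points are synthetic, generated from b = 1 and c = 0.5.

```python
import math

def fit_power_law(sizes, aucs, plateau):
    """Least-squares fit of log(plateau - AUC) = log(b) - c * log(n); returns (b, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(plateau - a) for a in aucs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), -slope

def size_for_gap(b, c, gap=0.02):
    """Smallest n with plateau - AUC(n) <= gap under the assumed power law."""
    return math.ceil((b / gap) ** (1 / c))

# Synthetic points (not study data): plateau 0.85, generated with b = 1, c = 0.5
plateau = 0.85
sizes = [100, 400, 1600, 6400]
aucs = [0.75, 0.80, 0.825, 0.8375]

b, c = fit_power_law(sizes, aucs, plateau)
n_needed = size_for_gap(b, c, gap=0.02)
```

Because the synthetic points lie on the assumed curve, the fit recovers the generating parameters up to rounding, and `n_needed` comes out close to 2,500. With real cross-validated AUCs the plateau would itself be estimated and the fit would be noisier.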