Cardiovascular disease, commonly referred to as heart disease, encompasses diverse conditions of the heart that have caused sudden death or prolonged illness worldwide over the past decades. More recent...
The concept of a lexically ordered adjacency matrix of a graph is introduced, and it is proved that every adjacency matrix is isomorphic to at least one lexically ordered adjacency matrix. An algorithm for the classifica...
Credit risk assessment is a crucial element in credit risk management. With the extensive research on consumer credit risk assessment in recent decades, the abundance of literature on this topic can be overwhelming for researchers. Therefore, this article aims to provide a more systematic and comprehensive analysis from three perspectives: classification algorithms, data traits, and learning methods. Firstly, the state-of-the-art classification algorithms are categorized into traditional single classifiers, intelligent single classifiers, and hybrid and ensemble multiple classifiers. Secondly, considering the diversity of data traits in credit datasets, data traits are divided into external structure information traits, data quality traits, data quantity traits, and internal information traits. A data-trait-driven modeling framework based on multiple classifiers is proposed for credit risk assessment. Thirdly, considering the differences in data modeling methods, learning methods are classified by data status, label status, and structure form. Furthermore, model interpretability, model bias, model multi-pattern, and model fairness are discussed. Finally, the limitations and future research directions are presented. This review article serves as a helpful guide for researchers and practitioners in the field of credit risk modeling and analysis.
The characteristic substances in citrus volatile organic compounds (VOCs) are associated with infestation by Bactrocera dorsalis (Hendel), which provides a noninvasive way to discriminate infested citrus. This paper developed an olfactory detection system based on a quartz crystal microbalance (QCM) sensor array and classification algorithms to identify citrus infested with B. dorsalis. Six characteristic substances that vary significantly after B. dorsalis infestation (D-limonene, myrcene, alpha-pinene, decanal, linalool, and beta-ocimene) were selected as template molecules to prepare molecularly imprinted polymers (MIPs), each of which was used to modify a QCM. The experimental results show that the prepared MIPs-QCM sensors had sensitivities in the range of 0.043-0.070 Hz/(mg/m³), and their stability and reproducibility were above 93.1%. Four sensors that contributed to the classification were screened by stepwise discriminant analysis. Afterward, Bayesian optimization was employed to optimize the hyperparameters. The accuracy of the optimized support vector machine (SVM) reached 94.17%. The olfactory detection system developed in this study enables the discrimination of citrus infested with B. dorsalis, which may have potential applications in the post-harvest treatment of citrus.
With the explosive growth of data, using machine learning classification algorithms built on big data technology to predict results can improve the intelligent classification of data. It can provide data support for predicting classifications in advance and for filtering classification results, which improves the efficiency of data processing and data realization. This article first introduces the development of machine learning under big data, then introduces the mainstream distributed processing framework Spark, and finally compares the advantages and disadvantages of classification algorithms under big data.
The use of machine learning algorithms in healthcare can amplify social injustices and health inequities. While the exacerbation of biases can occur and be compounded during problem selection, data collection, and outcome definition, this research pertains to the generalizability impediments that occur during the development and post-deployment of machine learning classification algorithms. Using the Framingham coronary heart disease data as a case study, we show how to effectively select a probability cutoff to convert a regression model for a dichotomous variable into a classifier. We then compare the sampling distribution of the predictive performance of eight machine learning classification algorithms under four stratified training/testing scenarios to test their generalizability and their potential to perpetuate biases. We show that both extreme gradient boosting and support vector machine are flawed when trained on an unbalanced dataset. We then show that the double discriminant scoring of type 1 and 2 is the most generalizable with respect to the true positive and negative rates, respectively, as it consistently outperforms the other classification algorithms, regardless of the training/testing scenario. Finally, we introduce a methodology to extract an optimal variable hierarchy for a classification algorithm and illustrate it on the overall, male and female Framingham coronary heart disease data.
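As a minimal sketch of the cutoff-selection step mentioned in this abstract: the abstract does not state which criterion the authors use, so the example below assumes Youden's J statistic (true positive rate plus true negative rate minus one) as one common choice. The function names and the toy probabilities are hypothetical and are not taken from the Framingham data.

```python
from typing import Sequence, Tuple


def rates_at_cutoff(
    probs: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (true positive rate, true negative rate) when predicting 1 if prob >= cutoff."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cutoff)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cutoff)
    return tp / pos, tn / neg


def best_cutoff(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the observed probability that maximizes Youden's J = TPR + TNR - 1.

    Assumption: Youden's J is used here for illustration only; the abstract
    does not name the selection criterion.
    """
    candidates = sorted(set(probs))
    return max(candidates, key=lambda c: sum(rates_at_cutoff(probs, labels, c)) - 1.0)


# Hypothetical predicted probabilities from a regression model and true outcomes.
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]

cutoff = best_cutoff(probs, labels)
tpr, tnr = rates_at_cutoff(probs, labels, cutoff)
print(f"cutoff={cutoff:.2f}  TPR={tpr:.2f}  TNR={tnr:.2f}")
```

Reporting the true positive and true negative rates separately, rather than overall accuracy, matches the abstract's emphasis on both rates when comparing classifiers on unbalanced data.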
The classification algorithm is one of the key techniques affecting the performance of an automatic text classification system and plays an important role in automatic classification research. This paper comparatively analyzes k-NN, VSM, and a hybrid classification algorithm presented by our research group. Some 2000 pieces of Internet news provided by ChinaInfoBank are used in the experiment. The results show that the performance of the hybrid algorithm presented by the group is superior to that of the other two algorithms.
ISBN: 9781467382878 (print)
This paper predicts diabetes using data mining classification algorithms. Classification algorithms and tools may reduce the heavy workload on doctors. In this paper, classification algorithms are evaluated for classifying several diabetes patient datasets. Classification is one of the main techniques in data mining. The classification algorithms examined are the decision tree algorithm, the Bayes algorithm, and the rule-based algorithm. These algorithms are evaluated by their error rates, and patients are identified using an evaluation function that measures the accuracy of the results.
Background: The performance of a classification algorithm eventually reaches a point of diminishing returns, where additional samples do not improve the results. Thus, there is a need to determine an optimal sample size that maximizes performance while accounting for computational burden or budgetary concerns. Objective: This study aimed to determine optimal sample sizes and the relationships between sample size and dataset-level characteristics over a variety of binary classification algorithms. Methods: A total of 16 large open-source datasets were collected, each containing a binary clinical outcome. Furthermore, 4 machine learning algorithms were assessed: XGBoost (XGB), random forest (RF), logistic regression (LR), and neural networks (NNs). For each dataset, the cross-validated area under the curve (AUC) was calculated at increasing sample sizes, and learning curves were fit. Sample sizes needed to reach the observed full-dataset AUC minus 2 points (0.02) were calculated from the fitted learning curves and compared across the datasets and algorithms. The dataset-level characteristics examined were minority class proportion, full-dataset AUC, number of features, type of features, and degree of nonlinearity. Negative binomial regression models were used to quantify relationships between these characteristics and expected sample sizes within each algorithm. A total of 4 multivariable models were constructed, which selected the best-fitting combination of dataset-level characteristics. Results: Among the 16 datasets (full-dataset sample sizes ranging from 70,000 to 1,000,000), median sample sizes were 9,960 (XGB), 3,404 (RF), 696 (LR), and 12,298 (NN) to reach AUC stability. For all 4 algorithms, more balanced classes (multiplier: 0.93-0.96 for a 1% increase in minority class proportion) were associated with decreased sample size. Other characteristics varied in importance across algorithms; in general, more features, weaker features, and more complex relationsh
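The Methods step of reading a required sample size off a fitted learning curve can be sketched as follows. The abstract does not give the functional form of the learning curves, so this example assumes an inverse power law, AUC(n) = a - b * n^(-c), and uses hypothetical, already-fitted parameters rather than values from the study.

```python
import math


def learning_curve_auc(n: float, a: float, b: float, c: float) -> float:
    """Assumed inverse power-law learning curve: AUC(n) = a - b * n**(-c)."""
    return a - b * n ** (-c)


def sample_size_for_target(target_auc: float, a: float, b: float, c: float) -> int:
    """Smallest integer n whose predicted AUC reaches target_auc under the assumed curve."""
    if target_auc >= a:
        raise ValueError("target is at or above the curve's asymptote")
    # Solve a - b * n**(-c) = target_auc for n, then round up.
    n = (b / (a - target_auc)) ** (1.0 / c)
    return math.ceil(n)


# Hypothetical fitted parameters and full-dataset size (not from the paper).
a, b, c = 0.85, 1.0, 0.5
full_n = 100_000

full_auc = learning_curve_auc(full_n, a, b, c)
target = full_auc - 0.02  # "full-dataset AUC minus 2 points (0.02)"
n_needed = sample_size_for_target(target, a, b, c)
print(f"full-dataset AUC={full_auc:.4f}, target={target:.4f}, n needed={n_needed}")
```

In practice the parameters a, b, and c would come from fitting the curve to cross-validated AUC values at increasing sample sizes; only the inversion step is shown here.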
Predicting the occurrence of thermoacoustic instabilities is of major interest in a variety of engineering applications such as aircraft propulsion, power generation, and industrial heating. Predictive methodologies based on a physical approach have been developed in the past decades, but have a moderate-to-high computational cost when exploring a large number of designs. In this study, the stability prediction capabilities and computational cost of four well-established classification algorithms, namely the K-Nearest Neighbors, Decision Tree (DT), Random Forest (RF), and Multilayer Perceptron (MLP) algorithms, are investigated. These algorithms are trained using an in-house physics-based low-order network model tool called OSCILOS. All four algorithms are able to predict which configurations are thermoacoustically unstable with a very high accuracy and a very low runtime. Furthermore, the frequency intervals containing unstable modes for a given configuration are also accurately predicted using multilabel classification. The RF algorithm correctly predicts the overall stability and finds all frequency intervals containing unstable modes for 99.6% and 98.3% of all configurations, respectively, which makes it the most accurate algorithm when a large number of training examples is available. For smaller training sets, the MLP algorithm becomes the most accurate algorithm. The DT algorithm is found to be slightly less accurate, but can be trained extremely quickly and runs about a million times faster than a traditional physics-based low-order network model tool. These findings could be used to devise a new generation of combustor optimization tools that would run much faster than existing codes while retaining a similar accuracy.
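The two scores reported for the multilabel setting can be illustrated with a small sketch. It assumes each configuration is encoded as a list of 0/1 flags, one per frequency interval, and that "finds all frequency intervals containing unstable modes" means every truly unstable interval is also predicted unstable. Both the encoding and that reading are assumptions, and the data are hypothetical.

```python
from typing import List, Sequence

Labels = Sequence[Sequence[int]]  # one row of 0/1 interval flags per configuration


def overall_stability_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Share of configurations whose stable/unstable verdict is predicted correctly.

    A configuration counts as unstable if any frequency interval is flagged.
    """
    hits = sum(1 for t, p in zip(y_true, y_pred) if any(t) == any(p))
    return hits / len(y_true)


def all_unstable_intervals_found(y_true: Labels, y_pred: Labels) -> float:
    """Share of configurations where every truly unstable interval is predicted unstable."""
    hits = sum(
        1
        for t, p in zip(y_true, y_pred)
        if all(pi == 1 for ti, pi in zip(t, p) if ti == 1)
    )
    return hits / len(y_true)


# Hypothetical labels: 4 configurations, 3 frequency intervals each.
y_true: List[List[int]] = [[0, 0, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
y_pred: List[List[int]] = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1]]

stability_acc = overall_stability_accuracy(y_true, y_pred)
intervals_found = all_unstable_intervals_found(y_true, y_pred)
print(f"overall stability accuracy={stability_acc:.2f}, all intervals found={intervals_found:.2f}")
```

Under this reading, a false alarm on a stable interval does not count against the second score, whereas an exact-match criterion would; which of the two the study uses is not stated in the abstract.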