ISBN: (print) 9798350351194; 9798350351187
Measuring data difficulty is a critical aspect of machine learning performance evaluation. Several measures have been used to assess the difficulty of classifying data points in binary classification. However, these measures typically involve building a machine learning model first, which is then used to assess the data difficulty level. In this paper, we propose a novel model-agnostic measure, named polarized K-entropy, to evaluate the difficulty of classifying a data instance. Our measure leverages the computation of entropy based on the nearest neighbors of a data point. We conducted experiments to evaluate the effectiveness of our proposed method by analyzing how the accuracy of machine learning models changes with respect to data difficulty. We used Spearman's rank correlation coefficient to analyze this relationship for neural network, support vector machine, and random forest models. Our results show that our measure outperformed the non-conformity measure in all experiments conducted on six datasets using the selected machine learning models.
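The abstract does not define polarized K-entropy, so the following is only a rough illustration of the general idea it mentions: computing an entropy over the labels of a point's nearest neighbors, without training a model. The function name `knn_label_entropy`, the Euclidean distance, and the default `k` are assumptions made for this sketch and are not taken from the paper.

```python
import math
from collections import Counter


def knn_label_entropy(points, labels, index, k=3):
    """Shannon entropy (in bits) of the class labels among the k nearest
    neighbors of points[index]. This is a plain k-NN label entropy, NOT the
    paper's polarized K-entropy, whose definition the abstract does not give."""
    target = points[index]
    # Euclidean distance from the target to every other point.
    distances = [
        (math.dist(target, p), labels[i])
        for i, p in enumerate(points)
        if i != index
    ]
    distances.sort(key=lambda pair: pair[0])
    neighbor_labels = [label for _, label in distances[:k]]
    counts = Counter(neighbor_labels)
    total = len(neighbor_labels)
    # A pure neighborhood gives 0; an evenly mixed binary one gives 1.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this sketch, a point whose neighbors all share one label scores 0 (easy), while a point surrounded by an even mix of two labels scores 1 (hard).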
Heart disease, also called cardiovascular disease, is considered one of the deadliest diseases and causes high mortality worldwide. Early detection or prediction is a challenging task in the medical field. The healthcare industry holds a massive amount of data, and processing it is a tedious task. A computer-aided system that predicts cardiac disease can save time and money. Researchers have investigated several computer-assisted diagnosis approaches for disease prediction and prognosis. In this paper, the authors provide an extensive literature survey of the classification approaches, such as machine learning, feature selection, hybrid, ensemble, and deep learning, used by researchers in the last decade for heart disease prediction. Furthermore, as the paper focuses on machine learning techniques, a comparative analysis of the performance and accuracy of various machine learning techniques is summarized in tabular form. Additionally, this work critically assesses earlier methods and outlines their shortcomings. Finally, the article offers some potential future research directions in machine learning-based automated heart disease prediction.
Physicians rely on various data sources when diagnosing tuberculosis (TB). These include the patient's historical data, demographic data, clinical laboratory results, and imaging data. Traditionally, the application of machine learning and deep learning in detecting TB has focused on single modes of data. This constrains the ability of artificial intelligence (AI) techniques to replicate the clinical practice of incorporating multiple sources of information in decision-making. Recent advancements in deep learning and machine learning have enabled the integration of multimodal data, which has led to the development of applications that more accurately reflect the clinician's approach. However, deep learning techniques still operate as black boxes, which makes it hard to understand their internal mechanisms. As a result, it is necessary to incorporate explainable AI techniques to help AI model users understand how the models make decisions. In this paper, we carried out a systematic review of two areas. First, we reviewed recent studies on the application of multimodal learning in TB detection. Here we provide a summary of the public datasets used in the studies, the data modalities used, and the fusion techniques, and identify AI techniques that can be used with multimodal data. Second, we looked at papers that used explainable AI techniques in TB diagnosis and prognosis. This study followed PRISMA guidelines to ensure replicability and accurate reporting of the main findings of the reviewed studies. To stay up to date with the state of the art, we specifically examined papers published between 2019 and June 2024. We reviewed thirty-one journal and conference papers found using the Web of Science, Scopus, and PubMed databases. The review indicated that models trained on multiple data modalities outperformed those trained on single data modalities. This is due to the additional information extracted from each data modalit
Metal cutting is an important process in industrial manufacturing. Using the mechanical quantities of metal cutting to optimize process design helps improve productivity. However, these quantities are expensive to obtain because of the complexity of the cutting process, including material nonlinearity, geometric nonlinearity, state nonlinearity, and their interactions. To solve this problem, this paper constructs a prediction model that combines machine learning (ML) with simulation data to quickly acquire multiple difficult-to-obtain mechanical quantities of metal cutting. First, Adaptive Smoothed Particle Hydrodynamics (ASPH) is used to generate a simulation dataset of 2000 metal cutting cases. Based on the simulation data, six ML methods are employed to establish two prediction models, single-task learning and multi-task learning, to predict the mechanical quantities of metal cutting. The experimental results demonstrate that the ML method can efficiently predict abundant reference data after learning the relationship between simulation parameters and mechanical quantities from simulation data, and it is expected to replace some similar and repetitive simulation work. The Multilayer Perceptron (MLP) model under the multi-task setting provides the best prediction performance, the fastest prediction time, and stable model behavior. Additionally, input erasure experiments reveal that the prediction of maximum equivalent plastic strain is significantly affected by particle spacing, and that cutting speed plays a vital role in predicting maximum velocity. This work highlights the value of the data-driven ML method in quickly obtaining abundant reference data for the metal cutting process, and provides an auxiliary means for process optimization.
To build an intelligent intrusion detection system, it is essential to have a suitable and high-quality dataset with a sufficiently large quantity to simulate real-world scenarios. The NSL-KDD dataset is an improved v...
With the development of space technology, wide-field sky surveys using telescopes have expanded the range of new data available for time-domain astronomical research. Traditional data analysis methods can no longer re...
ISBN: (print) 9798400707018
It is commonly assumed that digital learning environments such as intelligent tutoring systems facilitate learning and positively impact achievement. This study explores how different groups of students exhibit distinct relationships between learning behaviors and academic achievement in an intelligent tutoring system for English as a foreign language. We examined whether these differences are linked to students' prior knowledge, personality traits, and motivation. We collected behavioral trace data from 507 German seventh-grade students during the 2021/22 school year and applied machine learning models to predict English performance based on learning behaviors (best-performing model's R² = .41). To understand the impact of specific behaviors, we applied the explainable AI method SHAP and identified three student clusters with distinct learning behavior patterns. Subsequent analyses revealed that these clusters also varied in prior knowledge and motivation: one with high prior knowledge and average motivation, another with low prior knowledge and average motivation, and a third with both low prior knowledge and low motivation. Our findings suggest that learning behaviors are linked differently to academic success across students and are closely tied to their prior knowledge and motivation. This points to the importance of personalizing learning systems to better support individual learning needs.
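For readers unfamiliar with the R² figure reported above, the following minimal sketch shows how the coefficient of determination is conventionally computed from observed and predicted scores. The function name `r_squared` and the sample values are illustrative assumptions. The abstract does not state how the authors computed their value (for example, on which data split).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Assumes y_true is not constant (otherwise SS_tot is zero)."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A value of .41 would mean the predictions leave 59% of the squared error that a constant mean prediction would incur.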
This article reviews the theory of fairness in AI, from machine learning to federated learning, where the constraints on precision AI fairness and perspective solutions are also *** a reliable and quantitative evaluation of AI fairness, many associated concepts have been proposed, formulated and ***, the inexplicability of machine learning systems makes it almost impossible to include all necessary details in the modelling stage to ensure *** privacy worries induce the data unfairness and hence, the biases in the datasets for evaluating AI fairness are *** imbalance between algorithms’ utility and humanization has further reinforced *** for federated learning systems, these constraints on precision AI fairness still *** solution is to reconcile the federated learning processes and reduce biases and imbalances accordingly.
ISBN: (print) 9798350361513; 9798350372304
This study analyzes the perception of Uber users through Twitter, currently known as X, using the CRISP-DIM methodology in Python. We collected data from the last twelve years for this study. The data set is divided into training and testing sets, which are processed using natural language processing and classified as neutral, positive, or hostile. Classification algorithms such as Logistic Regression, Support Vector Machines (SVM), and Naive Bayes are applied, with SVM being the most effective in predicting user sentiments. This approach leverages Twitter's accessibility and data analytics to understand the public perception of Uber.
ISBN: (print) 9798400707018
The increasing use of machine learning in learning analytics (LA) has raised significant concerns around algorithmic fairness and privacy. Synthetic data has emerged as a dual-purpose tool, enhancing privacy and improving fairness in LA models. However, prior research suggests an inverse relationship between fairness and privacy, making it challenging to optimize both. This study investigates which synthetic data generators can best balance privacy and fairness, and whether pre-processing fairness algorithms, typically applied to real datasets, are effective on synthetic data. Our results highlight that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, DECAF suffers in utility, as reflected in its predictive accuracy. Notably, we found that applying pre-processing fairness algorithms to synthetic data improves fairness even more than applying them to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models.
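The abstract does not say which fairness metric the study used. As one commonly used possibility, the sketch below computes a demographic parity difference: the gap in positive-prediction rates between two groups. The function name, the binary 0/1 predictions, and the two-group setup are assumptions made for illustration only.

```python
def demographic_parity_difference(predictions, groups):
    """Absolute gap in positive-prediction rate between two groups.
    predictions: iterable of 0/1 model outputs.
    groups: iterable of group identifiers, aligned with predictions."""
    rates = {}
    for g in set(groups):
        member_preds = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(member_preds) / len(member_preds)
    if len(rates) != 2:
        raise ValueError("this sketch expects exactly two groups")
    rate_a, rate_b = rates.values()
    # 0 means both groups receive positive predictions at the same rate.
    return abs(rate_a - rate_b)
```

A model evaluated on real versus synthetic training data could be compared by computing this gap for each, with a smaller gap indicating a fairer outcome under this particular metric.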