Obstacle detection systems face challenges related to the catastrophic forgetting problem, where old obstacles may be misclassified when training on new, unseen obstacles. Re-training a model from scratch for every new ob...
ISBN: (print) 9783031777370; 9783031777387
Machine learning models often excel in controlled environments but may struggle with noisy, incomplete, or shifted real-world data. Ensuring that these models maintain high performance despite these imperfections is crucial for practical applications such as medical diagnosis or autonomous driving. This paper introduces a novel framework to systematically analyse the robustness of machine learning models against noisy data. We propose two empirical methods: (1) Noise Tolerance Estimation, which calculates the noise level a model can withstand without significant degradation in performance, and (2) Robustness Ranking, which ranks machine learning models by their robustness at specific noise levels. Using Cohen's kappa statistic, we measure the consistency between a model's predictions on original and perturbed datasets. Our methods are demonstrated on various datasets and machine learning techniques, identifying models that remain reliable under noisy conditions.
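The idea of measuring prediction consistency with Cohen's kappa can be illustrated with a minimal sketch. It is not the authors' implementation: the stand-in classifier `threshold_model`, the Gaussian perturbation, the `min_kappa` cut-off of 0.8, and the function names are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal label frequencies of both sequences.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences consist of one identical label
    return (observed - expected) / (1.0 - expected)


def threshold_model(x):
    """Hypothetical stand-in classifier: label 1 if the feature exceeds 0.5."""
    return int(x > 0.5)


def noise_tolerance(model, data, noise_levels, min_kappa=0.8, seed=0):
    """Largest tested noise level at which kappa stays at or above min_kappa."""
    rng = random.Random(seed)
    baseline = [model(x) for x in data]
    tolerated = 0.0
    for sigma in sorted(noise_levels):
        # Perturb each input with Gaussian noise (an assumed noise model).
        noisy = [model(x + rng.gauss(0.0, sigma)) for x in data]
        if cohen_kappa(baseline, noisy) < min_kappa:
            break
        tolerated = sigma
    return tolerated


# Hypothetical one-dimensional data, well separated around the 0.5 threshold.
data = [0.0, 1.0] * 50
tolerance = noise_tolerance(threshold_model, data, [0.01, 100.0])
```

Running the same estimate for several models at a fixed noise level and sorting them by kappa would give a simple form of robustness ranking.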
ISBN: (print) 9783031777370; 9783031777387
This paper presents an in-depth analysis of data from the Alpha Ventus offshore wind farm, emphasizing the identification and detection of anomalies in wind turbine performance. Using real-world data from the RAVE (Research at Alpha Ventus) project, we explore the complexities of offshore wind energy generation, including the effects of wind speed, nacelle position, and environmental factors on turbine behaviour. Among the various machine learning techniques, we selected k-nearest neighbours (k-NN) to identify patterns and detect anomalies indicative of potential issues. Our findings demonstrate that some centrally located turbines of the wind farm are subject to significant wake effects and operational irregularities. By adjusting the parameters of the k-NN model, we obtained an anomaly detection framework that enhances the reliability of turbine operation and maintenance.
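One common way to use k-NN for anomaly detection is to score each observation by its distance to its nearest neighbours. The sketch below illustrates that general technique only; the abstract does not say how the authors configured k-NN, and the readings, the value of `k`, and the `factor` threshold are hypothetical.

```python
import math


def knn_anomaly_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(points, k=3, factor=3.0):
    """Flag points whose score exceeds factor times the (upper) median score."""
    scores = knn_anomaly_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [s > factor * median for s in scores]


# Hypothetical (wind speed in m/s, power in kW) readings; the last one
# produces far less power than its neighbours at a similar wind speed.
readings = [
    (5.0, 400.0), (5.1, 410.0), (5.2, 425.0),
    (4.9, 395.0), (5.0, 405.0), (5.1, 50.0),
]
flags = flag_anomalies(readings, k=2)
```

In practice the features would usually be scaled first so that no single variable dominates the distance, and `k` and the threshold are the parameters one would tune.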
ISBN: (print) 9783031777301; 9783031777318
Missing data is a prevalent problem in data science across many fields, such as the natural, social, and health sciences. Since most regression methods cannot handle missing data directly, imputation methods are used in data pre-processing. Finding the best imputation method is non-trivial, however. Moreover, our results show that choosing the best imputation method independently does not always yield the best predictive performance in the end; the combination matters. Furthermore, search-based approaches for finding a best-fitting imputer/regressor pair can be computationally intensive. In this paper, we propose the MetaLIRS (Meta learning Imputation and Regression Selection) framework for developing resource-friendly ML-based recommendation models for method selection. With MetaLIRS, we constructed a proof-of-concept recommendation model based on 12 meta-features that achieves an accuracy of 63% in selecting the best-fitting imputer/regressor pair. A data scientist can use this model for a quick, resource-friendly recommendation on which imputation and regression method to use for their particular data set and task, without the need for an expensive grid search among methods.
ISBN: (print) 9783031777301; 9783031777318
This paper describes the creation of a database and a machine learning model to predict employee attrition. Our proposal deals with attrition by considering three classes (voluntary attritors, involuntary attritors and non-attritors), giving Human Resources Management a more complete view of the loss of qualified personnel. Of the several machine learning models tested, XGBoost stood out as the best-performing one on a dataset with more than four thousand employees and twenty-one features collected from three independent companies in different industrial sectors. The model, evaluated in a 20-run experiment, achieved an overall mean accuracy of 78.5%, corresponding to the correct classification of 52.6% of the voluntary attritors, 78.9% of the involuntary attritors and 81.6% of the non-attritors, showing that voluntary attritors are harder to discriminate.
ISBN: (print) 9783031777370; 9783031777387
Smart schooling seeks to enhance the educational experience through technology. In this effort, a digital educational platform has been developed and empirically tested to identify students at risk of academic failure and dropout, while also promoting effective study and learning habits. Machine learning algorithms are employed to assess academic failure risk based on students' responses to a questionnaire created by educational psychologists. The platform predicts students' academic performance by analyzing factors related to their school and home environments, as well as their motivation to learn. Additionally, a decision support system is integrated to alert the class director and recommend preventive actions when risks or unusual behaviours are identified.
ISBN: (print) 9783031777301; 9783031777318
In time series analysis, data aggregation is an essential preprocessing step that consolidates data points over specified time intervals, simplifying the data structure and reducing noise. This process is vital for making large, complex datasets more manageable; such datasets are common in domains such as energy consumption forecasting. However, the choice of temporal aggregation (TA) frequency can significantly affect the performance and reliability of predictive models. Different TA frequencies may perform better in different scenarios, and there is a lack of evidence on which indicators determine the superiority of each one. In this study, we design and execute an empirical experiment framework to first explore the performance of various machine learning (ML) models using different TA frequencies on four multivariate time series datasets of ship fuel consumption. We further investigate the reliability of those models by studying the marginal contribution of the time series features in each aggregation period. Using the Random Forest Regressor, XGBoost Regressor, LightGBM, and Extra Trees predictive models, along with Shapley Additive Explanations to analyze feature contributions, we found that time series aggregation consistently produces accurate results regardless of the TA period. However, the models may exhibit significant biases, highlighting the need to examine their inner workings to select the appropriate TA period.
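Temporal aggregation at different frequencies can be illustrated with a short sketch. It assumes mean aggregation over fixed, non-overlapping windows; the sample values, the window sizes, and the function name `aggregate_mean` are hypothetical and not taken from the study.

```python
from collections import defaultdict


def aggregate_mean(samples, window_seconds):
    """Average (timestamp_seconds, value) samples over fixed, non-overlapping windows."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Each sample belongs to the window that starts at or before its timestamp.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical fuel-flow readings sampled every 30 seconds.
samples = [(0, 10.0), (30, 12.0), (60, 11.0), (90, 13.0), (120, 20.0), (150, 22.0)]
per_minute = aggregate_mean(samples, 60)
per_two_minutes = aggregate_mean(samples, 120)
```

The same raw series yields three points at the one-minute frequency and two at the two-minute frequency, which is the kind of difference in training data that a comparison of TA frequencies has to account for.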
ISBN: (print) 9783031777370; 9783031777387
Federated learning has emerged as a promising approach to train machine learning models on decentralized data sources while preserving data privacy. This paper proposes a new federated approach for Naive Bayes (NB) classification, assuming discrete variables. Our approach federates a discriminative variant of NB, sharing meaningless parameters instead of conditional probability tables. Therefore, this process is more reliable against possible attacks. We conduct extensive experiments on 12 datasets to validate the efficacy of our approach, comparing federated and non-federated settings. Additionally, we benchmark our method against the generative variant of NB, which serves as a baseline. Our experimental results demonstrate the effectiveness of our method in achieving accurate classification.
ISBN: (print) 9783031777301; 9783031777318
Drowsy drivers contribute to high rates of road accidents, leading to numerous fatalities and injuries. The dangers of drowsiness extend beyond the roads, affecting workplaces that require continuous attention and focus. This work aims to develop a non-intrusive, real-time system capable of assessing the user's drowsiness level by processing physiological signals collected by smartwatch sensors. The study used photoplethysmography (PPG) and heart rate data from a smartwatch, labeled with an adapted Karolinska Sleepiness Scale (KSS). Deep learning techniques were used to fine-tune hyperparameters and train a 1-dimensional Convolutional Neural Network to reach better prediction performance on drowsiness data. In addition, 5-fold cross-validation was used to evaluate performance and create an ensemble, enhancing prediction robustness through an election method. The results show significant prediction accuracy using the ensemble on test data (91.41%). The final pre-trained model was integrated into the smartwatch, creating a real-time drowsiness detection system. This system alerts the user through sound and vibration feedback, preventing them from falling asleep and reducing the risk of accidents.
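An ensemble built from cross-validation folds can combine its members' outputs by voting. The sketch below assumes that the "election method" is a simple majority vote over the labels predicted by the five fold models, which the abstract does not confirm; the predictions shown are hypothetical.

```python
from collections import Counter


def majority_vote(fold_predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    for votes in zip(*fold_predictions):
        # most_common(1) returns the label with the highest vote count.
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of five fold models on four samples (1 = drowsy, 0 = alert).
fold_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]
ensemble = majority_vote(fold_predictions)
```

With an odd number of models and two classes, a tie cannot occur, which is one reason five folds pair naturally with a binary vote.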
ISBN: (print) 9783031777370; 9783031777387
In today's society, the amount of information we need to process daily from sources such as news, videos, and literature is relatively high. The primary strategy to decrease the workload is to use effective summarization techniques, either extractive (where the summary is made up of extracts from the source itself) or abstractive. Traditional summarization models often rely on extensive human-annotated data, which is usually quite costly. This research proposes an approach leveraging transformer models to optimize and affordably augment small datasets, enhancing the performance of summarization models. Using sentence clustering and models pre-trained on tasks such as summarization or paraphrasing, we explore whether such an approach can yield better results across various summarization datasets that target different formats, such as video conference transcripts and news articles.