ISBN: 9783031777370; 9783031777387 (print)
This study addresses battery failure in motorized wheelchairs, which are essential for the mobility of individuals with disabilities. The main objective was to design a comprehensive dataset of 498 instances, each described by six attributes that directly impact battery life. Using the Random Forest algorithm, we demonstrate the ability to accurately predict battery failures. The results highlight the necessity for proactive measures to prevent battery degradation and extend battery lifespan.
ISBN: 9783031777301; 9783031777318 (print)
Universitat Politècnica de València (UPV) faces challenges in managing its Alfresco document repository, which contains 600,000 PDF files, of which only 100,000 are correctly categorised. Manual classification is laborious and error-prone, hindering information retrieval and advanced search capabilities. This project presents an automated pipeline that integrates optical character recognition (OCR) and machine learning to efficiently classify documents. Our approach distinguishes between scanned and digital documents, accurately extracts their text and assigns each document to one of 51 predefined categories using models such as BERT and Random Forest (RF). By improving document organisation and accessibility, this work optimises UPV's document management and paves the way for advanced search technologies and real-time classification systems.
ISBN: 9783031777301; 9783031777318 (print)
SMS spam detection has garnered increasing attention due to the widespread use of mobile devices. Currently, most methods for training SMS spam detection models rely on centralized data collection, which poses numerous privacy threats and creates security vulnerabilities that expose sensitive information. This study proposes a training method based on federated learning (FL) that does not require data sharing between parties. We experiment with the FedAvg, FedAvgM, and FedAdam algorithms using a fine-tuned PhoBERT model tailored for the SMS spam classification task. The results show that FedAvg achieves high performance, with an accuracy of 99.38% in the IID setting, while FedAdam proves more effective in the non-IID setting, yielding a model with an accuracy of up to 98.5%. This study demonstrates that models like PhoBERT trained with FL algorithms can achieve classification performance comparable to centralized training, highlighting the significant potential of FL for natural language processing models without the need for centralized data collection.
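As a rough illustration of the aggregation step behind FedAvg, the sketch below averages client parameter vectors weighted by each client's local sample count. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `fed_avg`, the flat-list representation of model parameters, and the example numbers are all hypothetical, and a real PhoBERT setup would aggregate tensors per layer.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Hypothetical helper: parameters are flat lists of floats for simplicity.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        share = size / total  # this client's contribution to the global model
        for i, value in enumerate(weights):
            aggregated[i] += share * value
    return aggregated


# Two hypothetical clients: the second holds three times as much data,
# so its parameters dominate the global average.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors and sample counts leave each client in this scheme; the raw SMS messages stay local, which is the property the abstract relies on.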
ISBN: 9783031777301; 9783031777318 (print)
This paper presents a novel deep-learning pipeline to segment large railway datasets with minimal manual annotation, which is notoriously time-consuming. The pipeline adapts DINOv2 [11] for labeling point clouds, with tailored self-distillation pre-training and fine-tuning. The adopted transformer architecture successfully generalizes to multiple railway datasets, and the lightweight pipeline is six times faster than manual labeling, even though it still requires a final segmentation check and correction. This achievement bridges the gap between the need for annotated point clouds in the railway industry and the lack of publicly available annotated datasets.
ISBN: 9783031777370; 9783031777387 (print)
This paper presents an in-depth analysis of data from the Alpha Ventus offshore wind farm, emphasizing the identification and detection of anomalies in wind turbine performance. Utilizing real-world data from the RAVE (Research at Alpha Ventus) project, we explore the complexities of offshore wind energy generation, including the effects of wind speed, nacelle position, and environmental factors on turbine behaviour. From among the various machine learning techniques, we selected k-nearest neighbours (k-NN) to identify patterns and detect anomalies indicative of potential issues. Our findings demonstrate that some centrally located turbines of the wind farm are subject to significant wake effects and operational irregularities. By adjusting the parameters of the k-NN model, we obtained an anomaly detection framework that enhances the reliability of turbine operation and maintenance.
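The abstract does not specify how k-NN is turned into an anomaly detector, so the following is only one common formulation, offered as a sketch: each observation is scored by its mean distance to its k nearest neighbours, and unusually large scores are flagged. The function name `knn_anomaly_scores`, the two-dimensional feature tuples, and the sample values are hypothetical and are not taken from the RAVE data.

```python
import math
from typing import List, Sequence, Tuple


def knn_anomaly_scores(points: Sequence[Tuple[float, ...]], k: int) -> List[float]:
    """Score each point by its mean Euclidean distance to its k nearest neighbours.

    Brute-force O(n^2) search; adequate for a small illustrative sample.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical observations: three clustered points and one far from the rest.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
scores = knn_anomaly_scores(sample, k=2)
most_anomalous = max(range(len(sample)), key=scores.__getitem__)
```

In this formulation the choice of k and of the score threshold are the parameters that would be tuned; features on very different scales would normally be standardised before computing distances.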
ISBN: 9783031777370; 9783031777387 (print)
Machine learning models often excel in controlled environments but may struggle with noisy, incomplete, or shifted real-world data. Ensuring that these models maintain high performance despite such imperfections is crucial for practical applications such as medical diagnosis or autonomous driving. This paper introduces a novel framework to systematically analyse the robustness of machine learning models against noisy data. We propose two empirical methods: (1) Noise Tolerance Estimation, which calculates the noise level a model can withstand without significant degradation in performance, and (2) Robustness Ranking, which ranks machine learning models by their robustness at specific noise levels. Using Cohen's kappa statistic, we measure the consistency between a model's predictions on original and perturbed datasets. Our methods are demonstrated on various datasets and machine learning techniques, identifying models that remain reliable under noisy conditions.
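To make the consistency measure concrete, the sketch below computes Cohen's kappa between two label sequences, here standing in for a model's predictions on the original inputs and on a perturbed copy. It illustrates the statistic only, not the paper's Noise Tolerance Estimation or Robustness Ranking procedures; the function name `cohen_kappa` and the example predictions are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)

    # Observed agreement: fraction of positions with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n

    # Expected chance agreement from the two marginal label distributions.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both sequences consist of one and the same label: perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical predictions before and after adding noise to the inputs.
preds_original = [0, 0, 1, 1]
preds_perturbed = [0, 0, 1, 0]
kappa = cohen_kappa(preds_original, preds_perturbed)  # 0.5
```

A kappa near 1 means the noise barely changed the model's decisions, while values near 0 mean the agreement is no better than chance; note that this compares predictions with each other and does not require ground-truth labels.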
ISBN: 9783031777301; 9783031777318 (print)
This paper describes the creation of a database and a machine learning model to predict employee attrition. Our proposal treats attrition as a three-class problem (voluntary attritors, involuntary attritors and non-attritors), giving Human Resources Management a more complete view of the loss of qualified personnel. Of the several machine learning models tested, XGBoost stood out as the best-performing one on a dataset with more than four thousand employees and twenty-one features, collected from three independent companies in different industrial sectors. The model, evaluated in a 20-run experiment, achieved an overall mean accuracy of 78.5%, corresponding to the correct classification of 52.6% of the voluntary attritors, 78.9% of the involuntary attritors and 81.6% of the non-attritors, showing that voluntary attritors are harder to discriminate.
ISBN: 9783031777301; 9783031777318 (print)
Missing data is a prevalent problem in data science across many fields, such as the natural, social, and health sciences. Since most regression methods cannot handle missing data directly, imputation methods are used in data pre-processing. Finding the best imputation method is non-trivial, however. Moreover, our results show that choosing the best imputation method independently does not always yield the best predictive performance in the end; the combination matters. Furthermore, search-based approaches for finding the best-fitting imputer/regressor pair can be computationally intensive. In this paper, we propose the MetaLIRS (Meta Learning Imputation and Regression Selection) framework for developing resource-friendly ML-based recommendation models for method selection. With MetaLIRS, we constructed a proof-of-concept recommendation model based on 12 meta-features that achieves an accuracy of 63% in selecting the best-fitting imputer/regressor pair. A data scientist can use this model for a quick, resource-friendly recommendation on which imputation and regression methods to use for their particular data set and task, without the need for an expensive grid search among methods.
ISBN: 9783031777370; 9783031777387 (print)
Smart schooling seeks to enhance the educational experience through technology. In this effort, a digital educational platform has been developed and empirically tested to identify students at risk of academic failure and dropout, while also promoting effective study and learning habits. Machine learning algorithms are employed to assess academic failure risk based on students' responses to a questionnaire created by educational psychologists. The platform predicts students' academic performance by analyzing factors related to their school and home environments, as well as their motivation to learn. Additionally, a decision support system is integrated to alert the class director and recommend preventive actions when risks or unusual behaviors are identified.
The International Conference on Intelligent Data Engineering and Automated Learning (IDEAL) is an annual international conference dedicated to emerging and challenging topics in intelligent data analysis, data mining and their associated learning systems and *** core themes include: the Big Data challenges, Machine Learning, Data Mining, Information Retrieval and Management, Bio- and Neuro-Informatics, Bio-Inspired Models (including Neural