ISBN (print): 9783540772255
The proceedings contain 116 papers. The topics discussed include: support function machines; group decision making with triangular fuzzy linguistic variables; advanced forecasting and classification technique for condition monitoring of rotating machinery; a new recurring multistage evolutionary algorithm for solving problems efficiently; exploration of a text collection and identification of topics by clustering; fuzzy ridge regression with non symmetric membership functions and quadratic models; load forecasting with support vector machines and semi-parametric method; support kernel machine-based active learning to find labels and a proper kernel simultaneously; knowledge extraction from unstructured surface meshes; the outer impartation information content of rules and rule sets; an engineering approach to data mining projects; and a framework to analyze biclustering results on microarray experiments.
Author:
Mitic, Peter
UCL, Dept Comp Sci, Gower St, London WC1E 6BT, England
ISBN (print): 9783031777301; 9783031777318
We present a quantitative definition of reputation risk, formulated in terms of a reputation time series comprising daily sentiment measurements. Self Supported learning is used to quantify reputation risk by progressively refining an initial proposal for a Minimum Acceptable Sentiment, calculated from descriptive statistics of the reputation data. The derived values are validated using a "sense test" based on a Loess quantile. The results show that the Minimum Acceptable Sentiment value is given approximately by a two standard deviation lower tail of the observed data.
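The closing result of this abstract can be illustrated with a small calculation. The following is only a sketch of the stated approximation (mean minus two standard deviations of the observed sentiment), not the paper's refinement procedure or its Loess-based validation. The function name and the sample values are hypothetical.

```python
from statistics import mean, stdev


def two_sigma_lower_tail(sentiment):
    """Return the mean minus two sample standard deviations of a series."""
    if len(sentiment) < 2:
        raise ValueError("need at least two observations")
    return mean(sentiment) - 2 * stdev(sentiment)


# Hypothetical daily sentiment scores (illustrative values, not from the paper).
daily_sentiment = [0.2, 0.1, 0.3, 0.0, 0.4]

# Approximate Minimum Acceptable Sentiment under the two-sigma assumption.
mas_estimate = two_sigma_lower_tail(daily_sentiment)

# Days whose sentiment falls below the estimated threshold.
breaches = [s for s in daily_sentiment if s < mas_estimate]
```

With a threshold of this kind, days that fall below `mas_estimate` would be the ones flagged for attention.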
ISBN (print): 9783031777301; 9783031777318
When delivered to the market, machine learning models face new data which are possibly subject to novel characteristics - a phenomenon known as concept drift. As this might lead to performance degradation, it is necessary to detect such drift and, if required, adapt the model accordingly. While a variety of drift detection and adaptation methods exists for standard vectorial data, a suitable treatment of text data is less researched. In this work we present a novel approach which detects and explains drift in text data based on their representation via transformer embeddings. In a nutshell, the method generates suitable statistical features from the original distribution and the possibly shifted variation. Based on these representations, drift scores can be assigned to individual data points, allowing a visualization and human-readable characterization of the type of drift. We demonstrate the approach's effectiveness in reliably detecting drift in several experiments.
ISBN (print): 9783031777301; 9783031777318
Hyperparameter Optimization (HPO) plays a significant role in enhancing the performance of machine learning models. However, as the size and complexity of (deep) neural architectures continue to increase, conducting HPO has become very expensive in terms of time and computational resources. Existing methods that automate this process still demand numerous evaluations to find the optimal hyperparameter configurations. In this paper, we present a novel approach based on model-based reinforcement learning to effectively improve sample efficiency while minimizing resource consumption. We formulate the HPO task as a Markov decision process and develop a predictive dynamics model for efficient policy optimization. Additionally, we employ the Deep Sets framework to encode the state space, which is then leveraged in meta-learning for transfer of knowledge across multiple datasets, enabling the model to quickly adapt to new datasets. Empirical studies demonstrate that our approach outperforms alternative techniques on publicly available datasets in terms of sample efficiency and accuracy.
ISBN (print): 9783031777370; 9783031777387
This study addresses battery failure in motorized wheelchairs, which are essential for the mobility of individuals with disabilities. The main objective was to design a comprehensive dataset of 498 instances, each comprising six attributes that directly impact battery life. Using the Random Forest algorithm, we demonstrate the ability to accurately predict battery failures. The results highlight the necessity for proactive measures to prevent battery degradation and extend battery lifespan.
ISBN (print): 9783031777301; 9783031777318
This paper presents a novel deep-learning pipeline to segment large railway datasets with minimal manual annotation, which is notoriously time-consuming. The pipeline adapts DINOv2 [11] for labeling point clouds, with tailored self-distillation pre-training and fine-tuning. The adopted transformer architecture successfully generalizes to multiple railway datasets, with a lightweight pipeline that outperforms manual labeling speed by a factor of 6, despite requiring a final segmentation check and correction. This groundbreaking achievement bridges the gap between the need for annotated point clouds in the railway industry and the lack of publicly available annotated datasets.
ISBN (print): 9783031777301; 9783031777318
Universitat Politècnica de València (UPV) faces challenges in managing its Alfresco document repository, which contains 600,000 PDF files, of which only 100,000 are correctly categorised. Manual classification is laborious and error-prone, hindering information retrieval and advanced search capabilities. This project presents an automated pipeline that integrates optical character recognition (OCR) and machine learning to efficiently classify documents. Our approach distinguishes between scanned and digital documents, accurately extracts text and categorises it into 51 predefined categories using models such as BERT and RF. By improving document organisation and accessibility, this work optimises UPV's document management and paves the way for advanced search technologies and real-time classification systems.
ISBN (print): 9783031777301; 9783031777318
SMS spam detection has increasingly garnered attention due to the widespread use of mobile devices. Currently, most SMS spam detection model training methods rely on centralized data collection, which poses numerous privacy threats and creates security vulnerabilities that expose sensitive information. This study proposes a training method, based on a federated learning (FL) system, that does not require data sharing between parties. In this paper, we experiment with the FedAvg, FedAvgM, and FedAdam algorithms using a fine-tuned PhoBERT model tailored for the SMS spam classification task. The results show that the FedAvg algorithm achieves high performance with an accuracy of 99.38% in the IID setting, while the FedAdam algorithm proves more effective in the Non-IID setting, yielding a model with an accuracy of up to 98.5%. This study demonstrates that models like PhoBERT trained with FL algorithms can achieve classification capabilities comparable to centralized data training methods, highlighting the significant potential of FL for natural language processing models without the need for centralized data collection.
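For readers unfamiliar with FedAvg, the server-side aggregation step can be sketched as a sample-count-weighted average of the client parameters. This is a minimal illustration in plain Python, assuming each client's model is flattened to a list of floats. It is not the paper's PhoBERT training code, it does not cover FedAvgM or FedAdam, and the function name and toy values are hypothetical.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        # Each client contributes in proportion to its share of the data.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Toy example: two clients with 1 and 3 local samples (illustrative values).
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_weights = fed_avg(clients, sizes)
```

Only model parameters are exchanged in this step, so the raw SMS messages never leave the clients, which is the property the abstract relies on.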
ISBN (print): 9783031777301; 9783031777318
Understanding the inherent complexity of temporal data is crucial for effective time series analytics. One dimension of complexity is the level of structural depth at which analysis methods operate. These levels range from entire time series collections down to individual sequences of reduced dimensionality and length. Complementary to this type of complexity is the quantity and expressiveness of knowledge associated with time series data, including labels and other features that provide valuable information. Both the structural and the semantic layer define the suitability and effectiveness of different analysis methods. In this paper we introduce a conceptual framework to support the automated selection of analytical time series approaches. To this end, we specify a context-free grammar to describe hierarchies and compositions of time series data, while also defining different classes of semantic information, resulting in a data-specific classification of time series analysis methods. Along with a demonstration via concrete examples, we provide a discussion on challenges, opportunities and future work associated with the proposed approach.
ISBN (print): 9783031777301; 9783031777318
The ability to detect, define, and classify Change of Direction (COD) movements during running plays a crucial role in sports science, as it has been widely used to assess athlete performance. Automating the process of COD classification during live games or training can provide real-time feedback. In this study, we evaluated Machine Learning (ML) and Deep Learning (DL) models for the classification of COD using accelerometer and gyroscope sensor data, together with speed data calculated from Global Positioning System (GPS) sensor data. We hypothesized that DL algorithms classify COD better than ML classification algorithms. Comparative analysis showed that the best-performing DL and ML models behaved similarly, and the statistical analysis likewise found no significant difference. This emphasized the importance of accurate model selection.