The proceedings contain 5 papers. The topics discussed include: defining meaningful local process models; removing implicit places using regions for process discovery; detecting infrequent behavior in event logs using statistical inference; on discovering distributed process models: the case of asynchronous communication; and on the complexity of synthesis of nop-free Boolean Petri nets.
This proceedings contains 12 papers. The Proceedings of the VLDB Endowment (PVLDB) provides a high-quality publication service to the data management research community. This conference issue covers: large-scale graph analytics, from efficient index-based algorithms and parallel processing to applications in community search; state-of-the-art work on using GPUs to speed up database operations, making blockchains more scalable, and supporting data compression and query processing with columnar storage; interesting and non-trivial applications, including a framework and technical solution that helps users improve the outcome when their resumes and loan applications are rejected by machine learning models, and a method for detecting cherry-picking, where data points are chosen to tell stories without good support; data quality management research, including structured entity retrieval and approximate provenance summaries; and a thorough analysis of OLAP systems that reveals insights for efficient use of hardware resources. The key terms of this proceedings include billion-scale label-constrained reachability, resilient blockchain fabric, data-parallel query processing, non-uniform data, micro-architectural analysis, heterogeneous information networks, OLAP, graph analytics, structured entities, and columnar storage.
ISBN (print): 9781956792003
Recently proposed pre-trained language models can be easily fine-tuned for a wide range of downstream tasks. However, fine-tuning requires a large training set. This PhD project introduces novel natural language processing (NLP) use cases in the healthcare domain, where obtaining a large training dataset is difficult and expensive. To this end, we propose data-efficient algorithms for fine-tuning NLP models in low-resource settings and validate their effectiveness. We expect the outcomes of this PhD project to contribute to NLP research and to low-resource application domains.
ISBN (digital): 9789819708017
ISBN (print): 9789819708000; 9789819708017
As the fundamental phase of collecting and analyzing data, data integration is used in many applications, such as data cleaning, bioinformatics, and pattern recognition. In the big data era, one of the major problems of data integration is obtaining the global schema of the data sources, since the global schema can hardly be derived directly from massive data sources. In this paper, we attempt to solve this schema integration problem. For different scenarios, we develop batch and incremental schema integration algorithms. We consider the differences in how attribute names are represented across data sources and propose the ED Join and Semantic Join algorithms to integrate attributes with different representations. Extensive experimental results demonstrate that the proposed algorithms integrate schemas efficiently and effectively.
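The abstract does not describe how ED Join works internally, so the following is only a minimal sketch of the general idea of matching attribute names by edit distance and folding schemas into a global schema one at a time. It is not the paper's algorithm. The function names (`edit_distance`, `normalize`, `ed_join`, `integrate`), the normalization rule, and the `max_dist` threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalize(name: str) -> str:
    """Hypothetical normalization: lowercase and drop common separators."""
    return name.lower().replace("_", "").replace("-", "").replace(" ", "")


def ed_join(left: List[str], right: List[str],
            max_dist: int = 1) -> List[Tuple[str, str, int]]:
    """Naive nested-loop join: pair attributes within max_dist edits."""
    pairs = []
    for l_attr in left:
        for r_attr in right:
            d = edit_distance(normalize(l_attr), normalize(r_attr))
            if d <= max_dist:
                pairs.append((l_attr, r_attr, d))
    return pairs


def integrate(schemas: List[List[str]],
              max_dist: int = 1) -> Dict[str, List[str]]:
    """Incrementally fold each schema into a global schema.

    Each global attribute maps to the source attribute names merged into it.
    An attribute joins the first global attribute within max_dist edits;
    otherwise it becomes a new global attribute.
    """
    global_schema: Dict[str, List[str]] = {}
    for schema in schemas:
        for attr in schema:
            match = next(
                (g for g in global_schema
                 if edit_distance(normalize(g), normalize(attr)) <= max_dist),
                None,
            )
            if match is None:
                global_schema[attr] = [attr]
            else:
                global_schema[match].append(attr)
    return global_schema


if __name__ == "__main__":
    print(ed_join(["user_name", "zip"], ["UserName", "zipcode"]))
    print(integrate([["user_name", "email"], ["UserName", "e-mail", "phone"]]))
```

The nested loop is quadratic in the number of attributes, which is acceptable for a sketch but would need filtering or indexing at the scale the abstract targets. A purely string-based match also cannot relate synonyms such as "zip" and "postcode", which is presumably the gap the Semantic Join in the abstract is meant to cover.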
ISBN (print): 9798350344868; 9798350344851
Natural language inference (NLI) aims to infer the relationship between two texts: a premise and a hypothesis. However, many existing methods overlook the fact that model performance is overestimated because of superficial correlation biases in NLI datasets. We study this problem and find that most current models treat NLI as a text-matching task, which ignores the asymmetry between the premise and the hypothesis. Therefore, we propose a simple and effective augmentation method, Inversive-Reasoning Augmentation (IRA), to remove the superficial correlation bias. After training different NLI models with our IRA-augmented data based on two widely used NLI datasets, we observe fairer evaluation results for the performance and robustness of the various NLI models.
ISBN (print): 9798350344868; 9798350344851
In the data-driven identification of non-linear differential equations, the choice of integration scheme is far from trivial and may dramatically impact the identification problem. In this work, we discuss this aspect and propose a novel architecture that jointly learns Neural Ordinary Differential Equations (NODEs) and the corresponding integration schemes that minimize the forecast error for a given sequence of observations. We demonstrate its relevance with numerical experiments on non-linear dynamics, including chaotic systems.
ISBN (print): 9781728189192
The proceedings contain 6 papers. The topics discussed include: enhanced predictive modeling of cricket game duration using multiple machine learning algorithms; semantic rule-based automatic code conversion system; cockpit display graphics symbol detection for software verification using deep learning; BiLSTM-Autoencoder architecture for stance prediction; detection of melanoma from skin lesion images using deep learning techniques; and real-time image processing: face recognition based automated attendance system in-built with 'two-tier authentication' method.
Federated learning (FL) is a decentralized learning method used to train machine learning algorithms. In FL, a global model iteratively collects the parameters of local models without accessing their local data. Howev...
ISBN (print): 9798400716188
Learning Analytics (LA) is now ubiquitous in many educational systems, providing the ability to collect and analyze student data in order to understand and optimize learning and the environments in which it occurs. At the same time, data collection must comply with increasingly demanding privacy legislation. In this paper, we use the Student Expectation of Learning Analytics Questionnaire (SELAQ) to analyze the expectations and confidence of students from different faculties regarding the processing of their data for Learning Analytics purposes. Applying clustering algorithms to the responses, we identify four clusters of students: Enthusiasts, Realists, Cautious and Indifferents. This structured analysis provides valuable insights into the acceptance and criticism of Learning Analytics among students.
ISBN (print): 9789898704214
The proceedings contain 35 papers. The topics discussed include: data-driven approach for generating colormaps of scientific simulation data; a 3D-shockwave volume rendering algorithm based on feature boundary detection; using reorderable matrices to compare risk curves of representative models in oil reservoir development and management activities; laser spot detection and characteristic analysis in plasma interaction simulation; hybrid sort: a pattern-focused matrix reordering approach based on classification; data interpolation based on contextual analysis for generating tomographic images in concrete specimen; graphical user interface personalization: user study of image frequency preferences; evaluation of color spaces for unsupervised and deep learning skin lesion segmentation; and repeated pattern extraction with knowledge-based attention and semantic embeddings.