Knowledge graphs provide structure and semantic context to unstructured data. Creating them is labour intensive: it requires close collaboration between graph developers and domain experts. Therefore, previous work has m...
Pre-trained language models have become popular in natural language processing tasks, but their inner workings and knowledge acquisition processes remain unclear. To address this issue, we introduce K-Bloom, a refined search-and-score mechanism tailored for seed-guided exploration in pre-trained language models, ensuring both accuracy and efficiency in extracting relevant entity pairs and relationships. Specifically, our crawling procedure is divided into two sub-tasks. Starting from a few seed entity pairs, which minimizes the need for extensive manual effort or predefined knowledge, we expand the knowledge graph with new entity pairs around these seeds. To evaluate the effectiveness of our proposed model, we conducted experiments on two datasets that cover the general domain. The resulting knowledge graphs serve as symbolic representations of the source pre-trained language models, providing valuable insights into their knowledge capacities. Additionally, they enhance our understanding of the pre-trained language models' capabilities when automatically evaluated on large language models. The experimental results demonstrate that our method outperforms the baseline approach by up to 5.62% in accuracy across various settings of the two benchmarks. We believe that our approach offers a scalable and flexible solution for knowledge graph construction and can be applied to different domains and novel contexts.
The structures of discourse used by legal and ordinary language differ in ways that cause technical issues when applying or fine-tuning general-purpose language models for open-domain question answering on legal resources. For example, longer sentences may be preferred in European laws (e.g., the Brussels I bis Regulation, EU 1215/2012) to reduce potential ambiguities and improve comprehensibility, which can distract a language model trained on ordinary English. In this article, we investigate some mechanisms to isolate and capture the discursive patterns of legalese in order to perform zero-shot question answering, i.e., without training on legal documents. Specifically, we use pre-trained open-domain answer retrieval systems and study what happens when changing the type of information considered for retrieval. Indeed, by selecting only the important parts of discourse (e.g., elementary units of discourse, EDU for short, or abstract representations of meaning, AMR for short), we should be able to help the answer retriever identify the elements of interest. Hence, with this paper, we publish Q4EU, a new evaluation dataset that includes more than 70 questions and 200 answers on 6 different European norms, and study what happens to a baseline system when only EDUs or AMRs are used during information retrieval. Our results show that the versions using EDUs are overall the best, leading to state-of-the-art F1, precision, NDCG and MRR scores.
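Two of the ranking metrics named in the abstract above, MRR and NDCG, can be illustrated with a minimal Python sketch. It assumes binary relevance (an answer is either relevant or not) and a cut-off `k`; both are assumptions made here for illustration, and the function names are hypothetical rather than taken from the paper's evaluation code.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k (assumed variant: gain 1 for relevant, else 0)."""
    # Discounted gain of the relevant items actually found in the top k.
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_ids[:k], start=1)
        if item in relevant_ids
    )
    # Best achievable DCG: all relevant items placed at the top ranks.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

In this sketch each question contributes one ranked list of retrieved answer identifiers and one set of gold answer identifiers; how the paper itself handles multiple correct answers per question is not stated in the abstract.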
ISBN: (print) 9781450355537
Process mining focuses on extracting knowledge, in the form of models, from data generated and stored in information systems. The analysis of the generated models can provide useful insights to domain experts. In addition, models of processes can be used to test whether a given process complies with given specifications. For these reasons, process mining is gaining significant importance in the healthcare domain, where the complexity and flexibility of processes make it extremely hard to evaluate and assess how patients have been treated. In this paper we describe how pMineR, an R library designed and developed for performing process mining in the medical domain, is currently used in hospitals to support domain experts in the analysis of the extracted knowledge models. In its current release, pMineR can encode extracted processes in the form of directed graphs, which are easy for domain experts to interpret and understand. It also provides graphical comparison between different processes, allows modelling adherence to given clinical guidelines, and supports estimating the performance and workload of the available healthcare resources.
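The idea of encoding an extracted process as a directed graph can be sketched in a few lines of Python. This is not pMineR's API (pMineR is an R library, and its internals are not described here); it is a generic directly-follows graph, assuming an event log of `(case_id, timestamp, activity)` tuples and artificial `BEGIN` and `END` nodes, all of which are illustrative assumptions.

```python
from collections import defaultdict


def directly_follows_graph(event_log):
    """Count how often one activity directly follows another within a case.

    event_log: iterable of (case_id, timestamp, activity) tuples (assumed layout).
    Returns a dict mapping (source_activity, target_activity) to a count,
    i.e. the weighted edges of a directed graph.
    """
    # Group events by case (for example, by patient).
    traces = defaultdict(list)
    for case_id, timestamp, activity in event_log:
        traces[case_id].append((timestamp, activity))

    edges = defaultdict(int)
    for events in traces.values():
        # Order each case chronologically before reading transitions.
        events.sort(key=lambda event: event[0])
        activities = ["BEGIN"] + [activity for _, activity in events] + ["END"]
        for source, target in zip(activities, activities[1:]):
            edges[(source, target)] += 1
    return dict(edges)


# Hypothetical toy log with two patients.
toy_log = [
    ("p1", 1, "Admission"),
    ("p1", 3, "Discharge"),
    ("p1", 2, "Surgery"),
    ("p2", 1, "Admission"),
    ("p2", 2, "Discharge"),
]
toy_graph = directly_follows_graph(toy_log)
```

The edge counts give a simple picture of which care pathways are common; comparing two such graphs (for example, for two patient cohorts) is one plausible way to read the "graphical comparison between different processes" mentioned above.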