Natural Language Processing (NLP) has been widely used to surface the intelligence embedded in textual documents that could not be leveraged through traditional data mining techniques. While NLP models can be shifted ...
Around 1.3 billion people, or more than 15% of the global population, live with a significant form of disability, and students and researchers in this category are underrepresented in higher education. This paper examin...
One significant issue for students is getting timely feedback on their work so that they can understand their mistakes and improve their knowledge iteratively. There have been significant results in this a...
This study focuses on issues related to the cybersecurity of software applications. A review of key literature sources on technological security measures for information systems and web platforms is presented. Some me...
Moss spores are present in aerobiological samples, but their low representation, lack of known allergenic properties, and difficult identification have led to their being overlooked by aerobiologists so far. The data ...
Aim: The aims of the present study were to explore the relationship between the gingival phenotype (GP) and periodontal health status and to find the prevalence of a specific gingival phenotype in a small Bulgarian popu...
The article examines contemporary cyber challenges in organizations. A theoretical review of sources on the issues of cyber security competences and the European regulatory framework has been carried out. A developed ...
This study contributes to the deep learning (DL) literature by investigating the applicability of different DL models to forecasting the monthly LEK/EUR exchange rate. To demonstrate the effectiveness for exchange rate forecas...
In this paper, we introduce a previously unstudied type of Euclidean tree, the LED (Leaves of Equal Depth) tree. LED trees can be used, for example, in computational phylogeny, since they are a natural representat...
ISBN (digital): 9798350363807
ISBN (print): 9798350363814
Natural Language Processing (NLP) has been widely used to surface the intelligence embedded in textual documents that could not be leveraged through traditional data mining techniques. While NLP models can be transferred across domains and languages with almost no change, training materials such as corpora cannot. Currently, no domain-specific corpora are available for training medical NLP models in the Albanian language. Therefore, in this paper, we introduce the Medical Dissertations Corpus in Albanian (MeDAl), a 6M-word Albanian corpus in the medical domain. The corpus was built by parsing and cleaning more than 300 dissertations published in the last 10 years by the University of Medicine in Tirana. The corpus can be used for the training or continuous training of NLP models used in medical settings; we intend to use it in a Named Entity Recognition (NER) task for Electronic Health Records (EHR) in the Albanian language. In this paper, we describe in detail the PDF parsing and text cleaning process that we conducted to create MeDAl.
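The abstract does not state which cleaning rules were applied, so the following is only a minimal sketch of the kind of text cleaning that typically follows PDF extraction. It assumes the page text has already been extracted to a string; the function names `clean_page_text` and `count_words` and the three rules shown (re-joining hyphenated line breaks, dropping page-number lines, collapsing whitespace) are illustrative assumptions, not the authors' pipeline.

```python
import re
import unicodedata


def clean_page_text(raw: str) -> str:
    """Clean text extracted from one PDF page (hypothetical rules, for illustration)."""
    # Normalize to NFC so letters such as "ë" and "ç" are single code points.
    text = unicodedata.normalize("NFC", raw)
    # Re-join words split by a hyphen at a line break: "mjekë-\nsore" -> "mjekësore".
    # Caveat: this also merges genuine hyphenated compounds that happen to break there.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain nothing but a page number.
    lines = [ln for ln in text.split("\n") if not re.fullmatch(r"\s*\d+\s*", ln)]
    # Collapse all remaining runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", " ".join(lines)).strip()


def count_words(text: str) -> int:
    """Count word tokens; \\w is Unicode-aware for str patterns in Python 3."""
    return len(re.findall(r"\w+", text))


if __name__ == "__main__":
    sample = "Diagnoza mjekë-\nsore e pacientit\n 12 \nështë   e qartë."
    cleaned = clean_page_text(sample)
    print(cleaned)
    print(count_words(cleaned))
```

Summing `count_words` over all cleaned pages is one way a corpus size such as the reported 6M words could be measured, although the tokenization used for that figure is not given in the abstract.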