ISBN: 9798400709227 (print)
This study of dialog summarization covers multi-domain, multi-modal, and multilingual datasets, and the potential challenges in the different domains. The scope and progress of this rapidly evolving topic depend on the availability of datasets and on emerging domains. Such a study can facilitate the cross-application of datasets to different domains to refine models, and can also aid scenarios in privacy-sensitive settings where data is lacking. Further, our work can enable the cross-fertilization of ideas across domains and contexts. Our study encompasses current and emerging domains, a comprehensive compilation of datasets, and avenues for further research.
ISBN: 9798400709227 (print)
Named Entity Recognition (NER) is a fundamental task in natural language processing (NLP) focused on identifying entities such as individuals, organizations, and locations within text. Locating these entities can present initial challenges, and subsequent classification can be equally daunting. This complexity is exemplified in Setswana, where names shared between locations and persons add an extra layer of intricacy. This study introduces a Setswana NER approach, featuring a Setswana Regex Annotator (SERxA) for preliminary entity classification, followed by annotation with the BRAT tool. Employing the Conditional Random Fields (CRF) algorithm, we establish a supervised statistical machine learning NER model for Setswana. Evaluation using standard metrics on a held-out test set yields F1-scores of 0.94 for person entities and 0.79 for location entities. Our findings underscore the viability of NER in Setswana and emphasize the necessity of nurturing NLP resources for less-resourced languages.
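The idea of a regex-based preliminary annotator can be sketched as follows. This is only an illustration under stated assumptions, not the SERxA rules: the cue words (`Rre`/`Mme` as honorifics before a person name, `kwa` as a locative preposition before a place name), the `PER`/`LOC` labels, and the example sentence are all chosen here for demonstration. It shows how surrounding context can separate a person from a location when both carry the same name.

```python
import re

# Hypothetical cue words, picked only for illustration (not the SERxA rules).
PERSON_CUE = r"(?:Rre|Mme)"   # honorifics assumed to precede a person name
LOCATION_CUE = r"(?:kwa)"     # locative preposition assumed to precede a place
NAME = r"[A-Z][a-z]+"         # a single capitalised token

PATTERNS = [
    ("PER", re.compile(rf"\b{PERSON_CUE}\s+({NAME})")),
    ("LOC", re.compile(rf"\b{LOCATION_CUE}\s+({NAME})")),
]


def pre_annotate(text):
    """Return sorted (start, end, label, surface) spans for cue-matched names."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(1), match.end(1), label, match.group(1)))
    return sorted(spans)


# Constructed example: the same name appears once as a person, once as a place.
sentence = "Rre Serowe o nna kwa Serowe"
annotations = pre_annotate(sentence)
for start, end, label, surface in annotations:
    print(f"{label}\t{start}-{end}\t{surface}")
```

Spans of this kind would still need human review (in the study, via BRAT) before being used to train a statistical model.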
ISBN: 9781424420957 (print)
Concepts are the basis of knowledge. A concept consists of a connotation and an extension. This paper introduces the Concept-In-Corpus, a special kind of formal concept, and presents a discovery algorithm called FCWFT (Filtering Concept-word Based on Feature-tree), which automatically mines the connotation and the extension of a Chinese Concept-In-Corpus from a Chinese corpus. Our work is the first attempt to mine formal concepts from free text in the area of natural language processing. We test the algorithm on a large-scale corpus, and the results are encouraging.
ISBN: 9781941643990 (print)
Patterns extracted from dependency parses of sentences are a major source of knowledge for most state-of-the-art relation extraction systems, but can be of low quality in distantly supervised settings. We present a linguistic annotation tool that allows human experts to analyze and categorize automatically learned patterns, and to identify common error classes. The annotations can be used to create datasets that enable machine learning approaches to pattern quality estimation. We also present an experimental pattern error analysis for three semantic relations, where we find that between 24% and 61% of the learned dependency patterns are defective due to preprocessing or parsing errors, or due to violations of the distant supervision assumption.
ISBN: 9781728175591 (print)
The use of code-mixed languages on social media platforms is a common phenomenon, which poses a new set of challenges for understanding such languages in the field of natural language processing (NLP). Implementing state-of-the-art (SOTA) algorithms is difficult due to the scarcity of available resources. In this work, we focus on one of the primary NLP tasks, Named Entity Recognition (NER), for English-Hindi code-mixed language. We propose an improvised Transformer network that learns word and character embeddings from scratch and is beneficial for processing low-resource code-mixed languages. We use the only available Twitter NER corpus and obtain a slight improvement over SOTA. The proposed Transformer network is a general model and may in the future be useful for training low-resource NLP tasks from scratch.
ISBN: 9781665410144 (print)
Law is one of the knowledge domains that are most reliant on textual material. Nowadays, however, it is very difficult and time-consuming for legal professionals to read, understand, and analyze all the available documents, due to the vast volume of case law that is published every day. In this age of legal big data, and with the increased availability of legal text online, many researchers have focused on the development of intelligent legal systems and applications. These intelligent systems can provide valuable services and solve many problems in the legal domain. In recent years, researchers have focused on predicting judicial case outcomes by applying natural language processing (NLP) and machine learning (ML) methods to case documents. Legal Judgment Prediction (LJP) is the task of automatically predicting the outcome of a court case given only the text of the case. To the best of our knowledge, as of 2021 no prior research with this aim has been conducted in English for appeal courts in Canada. The NLP application that our proposed methodology focuses on is predicting the outcomes of cases by looking only at the case descriptions written by the court. Because appeal court decisions are often binary, as in accept or reject, the task is defined as a binary classification problem between 'Allow' and 'Dismiss'. This is the general approach in the literature as well. We employ various classification methods, including classical classifiers and deep learning (DL) models, and compare their performance. Our best results are obtained using DL models, with accuracy reaching 93.46% and F1-scores reaching 0.92, which are on par with the best results in the literature. Through this study, we hope to establish a basis and a baseline for future research on the legal system of Canada.
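The binary 'Allow' versus 'Dismiss' framing can be illustrated with a very small text classifier. The sketch below is an assumption-laden stand-in: the abstract does not name its classical classifiers, so a multinomial Naive Bayes with add-one smoothing is used here as one common example, and the four training sentences are invented toy data, not excerpts from any court corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenisation; real systems would do far more."""
    return text.lower().split()


def train_nb(docs, labels):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data for illustration only.
train_docs = [
    "the trial judge erred in law",
    "the judge erred and the appeal succeeds",
    "no reviewable error was shown",
    "the appellant has shown no error",
]
train_labels = ["Allow", "Allow", "Dismiss", "Dismiss"]

model = train_nb(train_docs, train_labels)
prediction_a = predict_nb(model, "the trial judge erred")
prediction_b = predict_nb(model, "no error shown")
print(prediction_a, prediction_b)
```

A classical baseline of this kind is what the deep learning models in such a study would typically be compared against.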
Machine transliteration has a number of applications in a variety of natural language processing tasks such as machine translation, information retrieval and question-answering. For automated learning of machi...
ISBN: 9798350363029; 9798350363012 (print)
Advances in machine learning and neural networks have transformed natural language processing (NLP) and computer vision (CV) applications. Recent research efforts have begun to bridge the gap between the two domains. In this work, we propose a semi-supervised Multi-Modal Encoder Decoder Network (MMEDN) to capture the relationship between images and textual descriptions, allowing us to generate meaningful descriptions of images and to retrieve images from a database using cross-modality search. The semi-supervised training approach, which combines ground-truth text descriptions with pseudo-text generated by the text decoder within the model, requires far fewer image-text pairs in the training data and can directly add new raw images for training without manual text labelling. This approach is particularly useful for active learning environments, where labels are expensive and hard to obtain. Qualitative evaluations show that our model performs well. We applied our model to finding images of a person in large databases and to generating descriptions of people involved in an event for inclusion in an automatically generated report. The model was able to retrieve relevant images and generate accurate descriptions, demonstrating its applicability to more practical use cases.
ISBN: 9798400716348 (print)
Deep learning (DL) finds application in several prominent fields, including computer vision, natural language processing, and bioinformatics. The proliferation of DL-based methods has drawn attention to critical issues of bias (or unfairness) in classification and weak privacy guarantees for the training data. It is crucial to prioritize addressing these issues to prevent a potentially significant negative impact on users. While there has been progress, the majority of works focus on resolving fairness and privacy independently. We propose a tutorial on "Fair and Private Deep Learning" that aims to provide an exhaustive discussion of (i) the reasons behind unfair classifications and lack of privacy, (ii) fairness notions in the literature and methods to ensure them, (iii) differentially private DL, and (iv) algorithms that address fair and private DL simultaneously. Moreover, in this tutorial we do not limit our attention to classical, centralized DL models but also cover the fairness and privacy challenges in distributed (or federated) DL. The code, presentation and other details are available at https://***/magnetar-iiith/FairPrivateDL.
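As a concrete instance of a fairness notion, the sketch below computes the demographic parity gap: the difference between groups in the rate of positive predictions. This is one widely used notion picked here for illustration; the abstract does not say which notions the tutorial covers, and the predictions and group labels are made-up values.

```python
def positive_rate(predictions, groups, group):
    """Fraction of positive (1) predictions among members of one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = {g: positive_rate(predictions, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values())


# Made-up binary predictions for two hypothetical groups "a" and "b".
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = demographic_parity_gap(predictions, groups)
print(f"demographic parity gap: {gap:.2f}")
```

A gap of zero means every group receives positive predictions at the same rate; fairness-aware training methods typically try to keep such a gap small while preserving accuracy.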
ISBN: 9798350354973 (digital)
ISBN: 9798350354980 (print)
Causal inference enables us to move beyond merely observing correlations toward understanding the actual causal relationships between variables, but how to connect it with machine learning models still requires careful scientific study. This paper discusses various methods for estimating causal effects and their application in different scientific settings. The feasibility of these methods is explored using data from a social research study, the Early Childhood Longitudinal Study (ECLS), comparing findings from traditional and machine learning procedures to examine heterogeneity in the estimated causal effects.
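The gap between correlation and causal effect can be made concrete with a small worked example. The sketch below is an illustration under stated assumptions: the abstract does not name its estimators, so stratification on a single observed confounder is used as one traditional method, and the rows are synthetic values, not ECLS data. It contrasts a naive difference in means with a stratum-weighted estimate of the average treatment effect.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in mean outcome between treated and control, ignoring confounders."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_ate(rows):
    """Average the within-stratum effects, weighted by stratum size.

    Assumes every stratum contains both treated and control units.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, t, y in rows:
        strata[x][t].append(y)
    n = len(rows)
    ate = 0.0
    for arms in strata.values():
        size = len(arms[0]) + len(arms[1])
        ate += (size / n) * (mean(arms[1]) - mean(arms[0]))
    return ate


# Synthetic rows: (confounder x, treatment t, outcome y).
# Units with x=1 are both more likely to be treated and have higher outcomes.
rows = [
    (0, 1, 3), (0, 0, 1), (0, 0, 1), (0, 0, 1),
    (1, 1, 6), (1, 1, 6), (1, 1, 6), (1, 0, 4),
]

naive = naive_difference(rows)
adjusted = stratified_ate(rows)
print(f"naive: {naive:.2f}, stratified: {adjusted:.2f}")
```

In this synthetic data the within-stratum effect is 2 in both strata, while the naive comparison is inflated because the confounder drives both treatment and outcome.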