ISBN (print): 9783031790317; 9783031790324
Automatic Text Summarization (ATS) is a natural language processing (NLP) task essential for handling large volumes of information. ATS can be classified into two main types: extractive and abstractive. Extractive summarization selects sentences or phrases directly from the source text(s), while abstractive summarization generates new sentences that try to capture the original meaning of the source text(s). This paper describes our efforts to perform extractive single-document summarization in multilingual contexts. Although various summarization methods, such as PreSumm and HiStruct+, have shown promising results on English corpora like CNN/DM, there is a significant gap in applying these methods to other languages, especially Brazilian Portuguese. Additionally, these summarizers were evaluated with traditional metrics like ROUGE, which is limited because it primarily measures surface-level text overlap. To fill these gaps, we evaluate the effectiveness of these state-of-the-art methods on the CSTNews corpus (news texts in Brazilian Portuguese) using both ROUGE and the more recent BLANC metric, which measures how much the generated summary helps a pre-trained language model (such as BERT) understand the document. Our contributions include the results and comparison of the adapted models, a discussion of the BLANC metric in contrast to ROUGE, and an expansion of the resources available to the Portuguese and multilingual NLP community.
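To make the overlap limitation mentioned in this abstract concrete, the following is a minimal sketch of a unigram-overlap score in the spirit of ROUGE-1. It is an illustrative simplification, not the official ROUGE implementation or the evaluation code used in the paper: the function name `rouge1` and the whitespace tokenization are assumptions, and stemming and stopword handling are omitted.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 (simplified ROUGE-1 sketch)."""
    # Assumption: lowercase whitespace tokenization, no stemming.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())

    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# An extractive-style summary that copies words scores well.
copied = rouge1("the cat sat", "the cat sat on the mat")

# A paraphrase with no shared words scores zero, even though
# a human reader would judge the meaning to be close.
paraphrase = rouge1("physicians recommend rest", "doctors advise sleeping")
```

In this sketch the paraphrase receives an F1 of zero because no tokens are shared, which is the kind of surface-level behavior that motivates complementing ROUGE with a model-based metric.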
ISBN (print): 9783031791635; 9783031791642
This preliminary study explores the potential of Google Artificial Intelligence-based natural language (GNL-AI) sentiment analysis (SA) as a novel and complementary tool for comparing literary translations at the sentiment level. It investigates how effectively GNL-AI can assess the emotional fidelity of each translation by analyzing the author's emotional sentiment as conveyed in the text. The aim is to move beyond traditional methods, which are often driven by subjectivity or literalness, and to explore a different perspective for assessing how faithful each translation is to the emotions of the source text (ST). Ahmed Toufiq's Arabic novel, "Abu Musa's Women Neighbors", is the ST that serves as a case study. This research analyzes the sentiment of two translations: English (TT1) by Roger Allen and French (TT2) by Philippe Vigreux. Analyzing the sentiment of 18 ST paragraphs and their corresponding segments in TT1 and TT2 using GNL-AI revealed promising results. Most segments exhibited consistent sentiment patterns, highlighting the potential of this tool for identifying broad emotional trends in literary translations. Three paragraphs showed deviations in sentiment scores, highlighting the challenges in achieving perfect alignment between the source text and the translations. Interestingly, the English translation captures the Arabic author's intended emotional impact well, showcasing the potential of GNL-AI SA in evaluating literary translation quality. In contrast, the French translation tends slightly towards neutrality, highlighting both the promise and the limitations of this approach. Future research will involve expanding the dataset and exploring how GNL-AI can be integrated with human expertise for a more comprehensive analysis.
Metaphor detection, a critical task in natural language processing, involves identifying whether a particular word in a sentence is used metaphorically. Traditional approaches often rely on supervised learning models ...
In cloud environments, labels are often defined by cloud architects to categorise and describe their resources, such as virtual machines, storage and network components. These labels play a crucial role in organising,...
Since GPT-3.5’s release, large language models (LLMs) have made significant advancements, including in financial analysis. However, their effectiveness in financial calculations and predictions is still uncertain. Th...
ISBN (print): 9789819794300; 9789819794317
In the medical field, unstructured medical text holds rich medical knowledge. Accurately identifying medical entities in this text is crucial for structured medical databases, knowledge graphs, and intelligent diagnostic systems. Medical text has unique features that make it hard for traditional NER methods to identify complex medical entities. In particular, recognizing nested entities poses a significant challenge, because a system must recognize and understand the complex hierarchical relationships between entities. To overcome the challenges of nested entity recognition in medical text, we propose a method that combines semantic knowledge enhancement and global pointer optimization. First, we incorporate semantic prior knowledge of entity categories, capturing the interplay between labels and text by integrating label relationships. This yields candidate entity information enriched with label details. Next, we establish a classification module that evaluates and scores these candidate entities along with their labels, enabling entity prediction. To address nested entities, we introduce an Efficient GlobalPointer module that computes the likelihood of each text span being a specific entity type, thus strengthening nested entity recognition. By merging the outputs from both modules, we arrive at the final predicted entities. Experimental results indicate that our method performs well on two flat entity datasets, CMedQANER and CCKS2017, as well as on the nested entity dataset CMeEE. Compared to baseline models, our approach demonstrates notable performance improvements.
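The span-scoring idea behind pointer-style decoders can be illustrated with a small sketch. It only shows the general principle of scoring every (start, end) span per entity type and keeping spans above a threshold, which is what allows nested entities to be returned together. It is not the authors' implementation: the function `decode_spans`, the hand-set toy vectors, and the entity type names are hypothetical, and the learned encoder, position encoding, and the parameter sharing of the Efficient variant are all omitted.

```python
def decode_spans(q, k, threshold=0.0):
    """Score every span (i, j), i <= j, per entity type and keep those above threshold.

    q, k: dicts mapping an entity type to a list of per-token vectors
    (start-side and end-side representations; hypothetical toy inputs).
    """
    spans = []
    for etype in q:
        n = len(q[etype])
        for i in range(n):
            for j in range(i, n):
                # Span score is the dot product of the start-token
                # vector and the end-token vector for this type.
                score = sum(a * b for a, b in zip(q[etype][i], k[etype][j]))
                if score > threshold:
                    spans.append((etype, i, j, score))
    return spans


tokens = ["left", "lung", "tumor"]

# Hand-set toy vectors (assumption): chosen so that "left lung" scores
# as a body part and "left lung tumor" scores as a disease.
q = {
    "body": [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    "disease": [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
}
k = {
    "body": [[-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]],
    "disease": [[-1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]],
}

found = decode_spans(q, k)
entities = [(etype, " ".join(tokens[i : j + 1])) for etype, i, j, _ in found]
```

Because each span is scored independently, the body-part span and the enclosing disease span are both kept, whereas a one-label-per-token tagger would have to choose between them.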
As the digital transformation of education continues to advance, the inefficiency and subjectivity of traditional manual scoring methods have become increasingly prominent. To address this issue, this study developed ...
ISBN (print): 9783031790379; 9783031790386
Legal and juridical documents such as rulings, laws, agreements, and contracts contain domain-specific terms and jargon, as well as long and complex sentences, that may be difficult to understand for laypeople without domain expertise, with reading difficulties, or with a low education level. The simplification of these documents has been a concern for several years, with the aim of democratizing access to justice. Courts are already adopting simpler language, especially in documents aimed at laypeople, such as warrants and notifications, to enhance inclusion and clarity. Automatic text simplification, a subfield of natural language processing, seeks to make complex texts more accessible. This paper explores the task of automatic text simplification in Portuguese for the legal domain. The main challenge here is the lack of datasets containing complex sentences and their simplified versions. This work investigates how existing datasets, methods, and metrics used for text simplification perform when applied to legal texts in Portuguese. We present qualitative and quantitative analyses using five models. The results show that GPT-based models achieve the best results, but fine-tuning with domain data is a viable open-source alternative.
This study presents an innovative approach to evaluating media representations of gender-based violence by integrating natural language processing (NLP) techniques with the advanced capabilities of GPT-4, an Artificial Intelligence (AI)-based large language model. We developed a set of 27 expert-defined criteria to analyze a corpus of news articles, initially using NLP methods for foundational text analysis. For more complex criteria, we employed GPT-4 and further enhanced its precision with fine-tuning. Our results indicate a significant increase in accuracy, achieving an overall 76% accuracy rate in content evaluation, which is 9 percentage points higher than using NLP alone. This research introduces a novel media content analysis framework and paves the way for future enhancements in automated journalism assessment and ethical reporting.
Event extraction, an important task in natural language understanding that aims to identify event triggers of pre-defined event types and their arguments of specific roles, has attracted a lot of attention from ind...