As large language models achieve impressive scores on traditional benchmarks, an increasing number of researchers are becoming concerned about benchmark data leakage during pre-training, commonly known as the data con...
We develop assistive agents based on Large Language Models (LLMs) that aid interlocutors in business ***, we simulate business negotiations by letting two LLM-based agents engage in role play. A third LLM acts as a rem...
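The abstract above describes a three-agent setup: two LLM-based agents role-playing a negotiation plus a third LLM observing the dialogue. The loop below is only a hypothetical illustration of that architecture; the `chat` callable stands in for whatever LLM API is used, and since the third agent's exact role is truncated in the abstract, its prompt here is a placeholder.

```python
from typing import Callable, List

# `chat` stands in for an arbitrary LLM completion API; it is not part of the paper.
def simulate_negotiation(chat: Callable[[str], str], rounds: int = 3) -> List[str]:
    transcript: List[str] = []
    roles = {
        "buyer": "You are a buyer negotiating the price of a bulk order.",
        "seller": "You are a seller negotiating the price of a bulk order.",
    }
    for _ in range(rounds):
        for role, system_prompt in roles.items():
            history = "\n".join(transcript)
            utterance = chat(f"{system_prompt}\nDialogue so far:\n{history}\nYour next message:")
            transcript.append(f"{role}: {utterance}")
        # The third agent observes the dialogue; its exact role is cut off in the
        # abstract, so asking it for brief feedback here is purely a placeholder.
        feedback = chat("You assist the negotiators. Comment briefly on the exchange so far:\n"
                        + "\n".join(transcript))
        transcript.append(f"assistant: {feedback}")
    return transcript
```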
ISBN: (Print) 9798891760608
This paper introduces the Chinese Essay Discourse Coherence Corpus (CEDCC), a multi-task dataset for assessing discourse coherence. Existing research tends to focus on isolated dimensions of discourse coherence, a gap which the CEDCC addresses by integrating coherence grading, topical continuity, and discourse relations. This approach, alongside detailed annotations, captures the subtleties of real-world texts and stimulates progress in Chinese discourse coherence analysis. Our contributions include the development of the CEDCC, the establishment of baselines for further research, and the demonstration of the impact of coherence on discourse relation recognition and automated essay scoring. The dataset and related code are available at https://***/cubenlp/CEDCC_corpus.
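Since the corpus bundles three annotation layers (coherence grading, topical continuity, and discourse relations), a single record can be pictured roughly as below. The field names and types are hypothetical illustrations; the released corpus defines its own schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DiscourseRelation:
    arg1: str            # first argument span
    arg2: str            # second argument span
    label: str           # discourse relation label

@dataclass
class CEDCCRecord:
    essay_text: str
    coherence_grade: int                           # overall coherence grading
    topic_continuity: List[str]                    # topical-continuity annotation per unit
    discourse_relations: List[DiscourseRelation]   # annotated relations within the essay
```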
ISBN: (Print) 9798891760608
The task of Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. Due to the expensive cost of large-scale question annotation, methods for KBQG under low-resource scenarios urgently need to be developed. However, current methods heavily rely on annotated data for fine-tuning, which is not well-suited for few-shot question generation. Large Language Models (LLMs) have shown impressive generalization ability in few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, an in-context learning strategy for reasoning, we formulate the KBQG task as a reasoning problem, where the generation of a complete question is split into a series of sub-question generation steps. Our proposed prompting method, KQG-CoT, first selects supportive logical forms from the unlabeled data pool, taking into account the characteristics of the logical forms. Then, we construct a task-specific prompt to guide LLMs to generate complicated questions based on the selected logical forms. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ by sorting the logical forms by their complexity. We conduct extensive experiments on three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method surpasses the existing few-shot SoTA results on the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.
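As a rough picture of the prompting pipeline the abstract describes (select supportive logical forms, optionally sort them by complexity for KQG-CoT+, then prompt an LLM), consider the sketch below. The demonstration format, the token-count complexity proxy, and the `llm_complete` callable are assumptions for illustration, not the paper's actual implementation.

```python
from typing import Callable, List, Tuple

def complexity(logical_form: str) -> int:
    # Crude proxy for logical-form complexity (token count); the paper's
    # actual sorting criterion is not specified in the abstract.
    return len(logical_form.split())

def build_kqg_cot_prompt(
    demos: List[Tuple[str, str]],     # (logical form, worked question) demonstrations
    target_lf: str,
    num_shots: int = 4,
    sort_by_complexity: bool = True,  # the KQG-CoT+ variant orders demonstrations by complexity
) -> str:
    selected = sorted(demos, key=lambda d: complexity(d[0])) if sort_by_complexity else list(demos)
    selected = selected[:num_shots]
    parts = ["Generate the question for each logical form step by step, "
             "building it up from simpler sub-questions."]
    for lf, question in selected:
        parts.append(f"Logical form: {lf}\nQuestion: {question}")
    parts.append(f"Logical form: {target_lf}\nQuestion:")
    return "\n\n".join(parts)

# `llm_complete` stands in for whatever LLM completion API is used.
def generate_question(llm_complete: Callable[[str], str],
                      demos: List[Tuple[str, str]], target_lf: str) -> str:
    return llm_complete(build_kqg_cot_prompt(demos, target_lf))
```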
Large Language Models (LLMs) have demonstrated remarkable capability in a variety of NLP tasks. However, LLMs are also prone to generating nonfactual content. Uncertainty Quantification (UQ) is pivotal in enhancing our ...
ISBN: (Print) 9798891760608
We present BLESS, a comprehensive performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS). We examine how well off-the-shelf LLMs can solve this challenging task, assessing a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our analysis considers a suite of automatic metrics as well as a large-scale quantitative investigation into the types of common edit operations performed by the different models. Furthermore, we perform a manual qualitative analysis on a subset of model outputs to better gauge the quality of the generated simplifications. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines. Additionally, we find that certain LLMs demonstrate a greater range and diversity of edit operations. Our performance benchmark will be available as a resource for the development of future TS methods and evaluation metrics.
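The edit-operation analysis mentioned above can be pictured with a generic token-level alignment like the one below. It uses Python's standard `difflib` and a simple keep/insert/delete/substitute taxonomy, which is only an illustrative stand-in for whatever tooling and operation categories the benchmark actually uses.

```python
import difflib
from collections import Counter

def edit_operations(source: str, simplification: str) -> Counter:
    """Count token-level edit operations between a source sentence and its
    simplification, as a rough example of edit-operation analysis."""
    src_tokens = source.split()
    out_tokens = simplification.split()
    ops = Counter()
    matcher = difflib.SequenceMatcher(a=src_tokens, b=out_tokens)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops["keep"] += i2 - i1
        elif tag == "delete":
            ops["delete"] += i2 - i1
        elif tag == "insert":
            ops["insert"] += j2 - j1
        else:  # "replace"
            ops["substitute"] += max(i2 - i1, j2 - j1)
    return ops

print(edit_operations(
    "The committee deliberated at considerable length before reaching a verdict.",
    "The committee talked for a long time before deciding.",
))
```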
In this paper, we study the problem of generating structured objects that conform to a complex schema, with intricate dependencies between the different components (facets) of the object. The facets of the object (att...
ISBN: (Print) 9789819794362; 9789819794379
In recent years, parameter-efficient fine-tuning methods have gained attention as an alternative to full fine-tuning of pre-trained language models for transfer learning. Such methods can alleviate the problem that full fine-tuning requires updating and storing all the model parameters for different tasks, showing comparable performance with much fewer tuned parameters. However, their effectiveness in the context of multilingual natural language processing remains underexplored. This paper evaluates the performance of two representative parameter-efficient fine-tuning methods, prefix-tuning and LoRA, on multilingual abstractive summarization, comparing them with the traditional full fine-tuning approach. Our comprehensive analysis highlights the trade-offs between efficiency and performance, providing benchmarks to standardize evaluation in this domain. Additionally, we delve into an in-depth examination of prefix-tuning, particularly under few-shot conditions, uncovering insights into its efficacy and offering guidance for optimizing its performance in multilingual environments. Our findings contribute to a deeper understanding of the applicability and benefits of parameter-efficient fine-tuning methods in multilingual natural language processing. This research aims to inform future developments and encourage the adoption of parameter-efficient fine-tuning techniques in multilingual contexts, ultimately enhancing the performance and scalability of pre-trained language models. We have made our source code publicly available at https://***/sgallon-rin/peft-mas.
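For reference, attaching LoRA adapters to a multilingual seq2seq backbone with the Hugging Face `peft` library looks roughly like the sketch below. The backbone, rank, and target modules are assumptions chosen for illustration, not the configuration reported in the paper.

```python
# Minimal LoRA setup with Hugging Face `peft`; hyperparameters are illustrative only.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")  # assumed multilingual backbone

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                         # low-rank dimension of the adapters
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q", "v"],   # attention projections in mT5; module names are model-specific
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are updated during fine-tuning
```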
ISBN: (Print) 9798891760608
Feature attribution scores are used for explaining the prediction of a text classifier to users by highlighting k tokens. In this work, we propose a way to determine the optimal number k of tokens to display, based on sequential properties of the attribution scores. Our approach is dynamic across sentences, method-agnostic, and deals with sentence length bias. We compare agreement between multiple methods and humans on an NLI task, using a fixed k and a dynamic k. We find that perturbation-based methods and Vanilla Gradient exhibit the highest agreement on most method-method and method-human agreement metrics with a static k. Their advantage over other methods disappears with a dynamic k, which mainly improves Integrated Gradient and GradientXInput. To our knowledge, this is the first evidence that sequential properties of attribution scores are informative for consolidating attribution signals for human interpretation.
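One simple way to picture a dynamic k derived from the sequence of attribution scores is to cut the ranked scores at their largest drop, as sketched below. This heuristic is an assumption for illustration only; the paper derives k from sequential properties of the scores, but its actual selection rule is not given in the abstract.

```python
import numpy as np

def dynamic_k(attribution_scores, min_k: int = 1) -> int:
    """Pick how many tokens to highlight from the shape of the score sequence,
    using a 'largest drop in the ranked scores' rule as an illustrative stand-in."""
    scores = np.sort(np.asarray(attribution_scores, dtype=float))[::-1]
    if scores.size <= min_k:
        return int(scores.size)
    drops = scores[:-1] - scores[1:]      # drop between consecutive ranked scores
    k = int(np.argmax(drops)) + 1         # cut just before the largest drop
    return max(k, min_k)

print(dynamic_k([0.42, 0.40, 0.05, 0.04, 0.01]))  # -> 2: two tokens stand out
```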
Cognitive dynamics, which refer to the evolution of human cognitive processes, are pivotal to advancing human understanding of the world. Recent advancements in large language models (LLMs) highlight their potential for...