Proving mathematical theorems using computer-verifiable formal languages (FL) like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using La...
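For reference, the snippet below shows what a machine-verifiable Lean 4 proof looks like; it is a generic illustration of the formal language, not output from the prover described in this abstract.

```lean
-- Toy Lean 4 proofs: the kernel mechanically checks each one.
theorem two_plus_two : 2 + 2 = 4 := rfl          -- proved by computation

theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b                               -- reuse of a library lemma

-- Tactic style, the form that proof-generating models typically emit.
example (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]
```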
Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales. Some methods have proven effective in boosting accuracy by introducing ext...
Abstract classification in systematic reviews (SRs) is a crucial step in evidence synthesis but is often time-consuming and labour-intensive. This study evaluates the effectiveness of various Machine Learning (ML) models and embedding techniques in automating this process. Five diverse datasets are utilized: Aceves-Martins (2021), comprising 1,258 excluded and 230 included abstracts on the utilization of animal models in depressive behaviour studies; Bannach-Brown (2016), with 896 excluded and 73 included abstracts focusing on the methodological rigour of environmental health systematic reviews; Meijboom (2021), containing 599 excluded and 32 included abstracts on the retransitioning of Etanercept in rheumatic disease patients; Menon (2022), with 896 excluded and 73 included abstracts on environmental health reviews; and a custom Clinical Review Paper Abstract (CRPA) dataset, featuring 500 excluded and 50 included abstracts. A significant research gap in abstract classification has been identified in previous literature, particularly in comparing Large Language Models (LLMs) with traditional ML and natural language processing (NLP) techniques regarding scalability, adaptability, computational efficiency, and real-time application. Addressing this gap, this study employs GloVe for word embedding via matrix factorization, FastText for character n-gram representation, and Doc2Vec for capturing paragraph-level semantics. A novel Zero-BertXGB technique is introduced, integrating a transformer-based language model, zero-shot learning, and an ML classifier to enhance abstract screening and classification into "Include" or "Exclude" categories. This approach leverages contextual understanding and precision for efficient abstract processing. The Zero-BertXGB technique is compared against other prominent LLMs, including BERT, PaLM, LLaMA, GPT-3.5, and GPT-4, to validate its effectiveness. The Zero-BertXGB model achieved accuracy values of 99.3% for Aceves-Martins (2021), 92.6% for Bannach-Br
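The abstract does not detail the Zero-BertXGB pipeline, so the sketch below only illustrates the general pattern it names: transformer embeddings of abstracts fed to an XGBoost classifier for Include/Exclude screening. The model choice, the embed helper, and the toy abstracts are illustrative assumptions, not the study's actual components or data.

```python
# Hedged sketch in the spirit of "transformer embeddings + XGBoost" screening.
# Model choice, embed(), and the toy abstracts are assumptions for illustration
# only; they are NOT the Zero-BertXGB pipeline or the study's datasets.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from xgboost import XGBClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts, batch_size=16):
    """Mean-pooled BERT embeddings for a list of abstract strings."""
    chunks = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True, truncation=True,
                          max_length=256, return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**batch).last_hidden_state       # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)          # (B, T, 1)
        chunks.append(((hidden * mask).sum(1) / mask.sum(1)).numpy())
    return np.vstack(chunks)

# Toy labelled abstracts: 1 = Include, 0 = Exclude.
abstracts = ["Randomized trial of drug X in adults ...",
             "Editorial opinion on screening policy ..."]
labels = [1, 0]

clf = XGBClassifier(n_estimators=200, max_depth=4)
clf.fit(embed(abstracts), labels)
print(clf.predict(embed(["Cohort study of drug X in older adults ..."])))
```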
Hate speech is a complex and subjective phenomenon. In this paper, we present a dataset (GAZE4HATE) that provides gaze data collected in a hate speech annotation experiment. We study whether the gaze of an annotator p...
ISBN (Print): 9798350344868; 9798350344851
Large language models augmented with task-relevant documents have demonstrated impressive performance on knowledge-intensive tasks. However, existing methods for obtaining effective documents mainly fall into two categories. One is to retrieve from an external knowledge base, and the other is to utilize large language models to generate documents. We propose an iterative retrieval-generation collaborative framework. It is not only able to leverage both parametric and non-parametric knowledge, but also helps to find the correct reasoning path through retrieval-generation interactions, which is very important for tasks that require multi-step reasoning. We conduct experiments on four question answering datasets, including single-hop QA and multi-hop QA tasks. Empirical results show that our method significantly improves the reasoning ability of large language models and outperforms previous baselines.
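The exact algorithm of the proposed framework is not given in this abstract; the sketch below only illustrates the retrieval-generation interaction it describes, with retrieve and generate as hypothetical placeholders for an external retriever and an LLM, not the paper's implementation.

```python
# Minimal sketch of an iterative retrieval-generation loop. retrieve() and
# generate() are hypothetical placeholders; the loop structure is illustrative.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder: return the k passages most relevant to `query`."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: return an LLM completion for `prompt`."""
    raise NotImplementedError

def iterative_rag(question: str, iterations: int = 3) -> str:
    query = question
    context: list[str] = []
    for _ in range(iterations):
        # Retrieval step: pull non-parametric knowledge for the current query.
        context.extend(retrieve(query))
        # Generation step: let the LLM combine parametric and retrieved knowledge.
        draft = generate(
            f"Question: {question}\nContext:\n" + "\n".join(context) +
            "\nWrite an intermediate reasoning step or partial answer."
        )
        # The draft becomes the next retrieval query, steering retrieval toward
        # the missing hop in multi-step questions.
        query = draft
    return generate(
        f"Question: {question}\nContext:\n" + "\n".join(context) +
        "\nGive the final answer."
    )
```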
With recommender systems broadly deployed in various online platforms, many efforts have been devoted to learning user preferences and building effective sequential recommenders. However, existing work mainly focuses ...
Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectivenes...
In the realm of multi-intent spoken language understanding, recent advancements have leveraged the potential of prompt learning frameworks. However, critical gaps exist in these frameworks: the lack of explicit modeli...
Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, b...
ISBN (Print): 9798891760615
Extractive summarization is a crucial task in natural language processing that aims to condense long documents into shorter versions by directly extracting sentences. The recent introduction of large language models has attracted significant interest in the NLP community due to their remarkable performance on a wide range of downstream tasks. This paper first presents a thorough evaluation of ChatGPT's performance on extractive summarization and compares it with traditional fine-tuning methods on various benchmark datasets. Our experimental analysis reveals that ChatGPT exhibits inferior extractive summarization performance in terms of ROUGE scores compared to existing supervised systems, while achieving higher performance based on LLM-based evaluation metrics. In addition, we explore the effectiveness of in-context learning and chain-of-thought reasoning for enhancing its performance. Furthermore, we find that applying an extract-then-generate pipeline with ChatGPT yields significant performance improvements over abstractive baselines in terms of summary faithfulness. These observations highlight potential directions for enhancing ChatGPT's capabilities in faithful summarization using two-stage approaches.
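As a rough illustration of the extract-then-generate idea evaluated above, the sketch below first asks a chat model to copy salient sentences and then to summarize from those sentences only; chat is a hypothetical wrapper around a chat LLM and the prompts are illustrative, not the paper's.

```python
# Hedged sketch of an extract-then-generate summarization pipeline.
# chat() is a hypothetical placeholder; prompts are illustrative only.

def chat(prompt: str) -> str:
    """Placeholder for a call to a chat LLM (e.g. via an API client)."""
    raise NotImplementedError

def extract_then_generate(document: str, n_sentences: int = 3) -> str:
    # Stage 1 (extract): copy the most important sentences verbatim.
    extracted = chat(
        f"Copy the {n_sentences} most important sentences from the document "
        f"verbatim, one per line.\n\nDocument:\n{document}"
    )
    # Stage 2 (generate): summarize using only the extracted evidence, which
    # ties the final summary to the source and supports faithfulness.
    return chat(
        "Write a concise summary using ONLY the facts in these sentences:\n"
        f"{extracted}"
    )
```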