Proving mathematical theorems using computer-verifiable formal languages (FL) like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using La...
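For reference, the snippet below shows what a machine-verifiable Lean 4 proof looks like; it is a generic illustration of the formal language, not output from the prover described in this abstract.

```lean
-- Toy Lean 4 proofs: the kernel mechanically checks each one.
theorem two_plus_two : 2 + 2 = 4 := rfl          -- proved by computation

theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b                               -- reuse of a library lemma

-- Tactic style, the form that proof-generating models typically emit.
example (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]
```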
Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales. Some methods have proven effective in boosting accuracy by introducing ext...
Abstract classification in systematic reviews (SRs) is a crucial step in evidence synthesis but is often time-consuming and labour-intensive. This study evaluates the effectiveness of various Machine Learning (ML) models and embedding techniques in automating this process. Five diverse datasets are utilized: Aceves-Martins (2021), comprising 1,258 excluded and 230 included abstracts on the utilization of animal models in depressive behaviour studies; Bannach-Brown (2016), with 896 excluded and 73 included abstracts focusing on the methodological rigour of environmental health systematic reviews; Meijboom (2021), containing 599 excluded and 32 included abstracts on the retransitioning of Etanercept in rheumatic disease patients; Menon (2022), with 896 excluded and 73 included abstracts on environmental health reviews; and a custom Clinical Review Paper Abstract (CRPA) dataset, featuring 500 excluded and 50 included abstracts. A significant research gap in abstract classification has been identified in previous literature, particularly in comparing Large Language Models (LLMs) with traditional ML and natural language processing (NLP) techniques regarding scalability, adaptability, computational efficiency, and real-time application. Addressing this gap, this study employs GloVe for word embedding via matrix factorization, FastText for character n-gram representation, and Doc2Vec for capturing paragraph-level semantics. A novel Zero-BertXGB technique is introduced, integrating a transformer-based language model, zero-shot learning, and an ML classifier to enhance abstract screening and classification into "Include" or "Exclude" categories. This approach leverages contextual understanding and precision for efficient abstract processing. The Zero-BertXGB technique is compared against other prominent LLMs, including BERT, PaLM, LLaMA, GPT-3.5, and GPT-4, to validate its effectiveness. The Zero-BertXGB model achieved accuracy values of 99.3% for Aceves-Martins (2021), 92.6% for Bannach-Br
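The abstract does not detail the Zero-BertXGB pipeline, so the sketch below only illustrates the general pattern it names: transformer embeddings of abstracts fed to an XGBoost classifier for Include/Exclude screening. The model choice, the embed helper, and the toy abstracts are illustrative assumptions, not the study's actual components or data.

```python
# Hedged sketch in the spirit of "transformer embeddings + XGBoost" screening.
# Model choice, embed(), and the toy abstracts are assumptions for illustration
# only; they are NOT the Zero-BertXGB pipeline or the study's datasets.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from xgboost import XGBClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts, batch_size=16):
    """Mean-pooled BERT embeddings for a list of abstract strings."""
    chunks = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True, truncation=True,
                          max_length=256, return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**batch).last_hidden_state       # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)          # (B, T, 1)
        chunks.append(((hidden * mask).sum(1) / mask.sum(1)).numpy())
    return np.vstack(chunks)

# Toy labelled abstracts: 1 = Include, 0 = Exclude.
abstracts = ["Randomized trial of drug X in adults ...",
             "Editorial opinion on screening policy ..."]
labels = [1, 0]

clf = XGBClassifier(n_estimators=200, max_depth=4)
clf.fit(embed(abstracts), labels)
print(clf.predict(embed(["Cohort study of drug X in older adults ..."])))
```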
Hate speech is a complex and subjective phenomenon. In this paper, we present a dataset (GAZE4HATE) that provides gaze data collected in a hate speech annotation experiment. We study whether the gaze of an annotator p...
ISBN (Print): 9798350344868; 9798350344851
Large language models augmented with task-relevant documents have demonstrated impressive performance on knowledge-intensive tasks. However, existing methods for obtaining effective documents mainly fall into two categories. One is to retrieve from an external knowledge base, and the other is to utilize large language models to generate documents. We propose an iterative retrieval-generation collaborative framework. It is not only able to leverage both parametric and non-parametric knowledge, but also helps to find the correct reasoning path through retrieval-generation interactions, which is very important for tasks that require multi-step reasoning. We conduct experiments on four question answering datasets, including single-hop QA and multi-hop QA tasks. Empirical results show that our method significantly improves the reasoning ability of large language models and outperforms previous baselines.
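The exact algorithm of the proposed framework is not given in this abstract; the sketch below only illustrates the retrieval-generation interaction it describes, with retrieve and generate as hypothetical placeholders for an external retriever and an LLM, not the paper's implementation.

```python
# Minimal sketch of an iterative retrieval-generation loop. retrieve() and
# generate() are hypothetical placeholders; the loop structure is illustrative.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder: return the k passages most relevant to `query`."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: return an LLM completion for `prompt`."""
    raise NotImplementedError

def iterative_rag(question: str, iterations: int = 3) -> str:
    query = question
    context: list[str] = []
    for _ in range(iterations):
        # Retrieval step: pull non-parametric knowledge for the current query.
        context.extend(retrieve(query))
        # Generation step: let the LLM combine parametric and retrieved knowledge.
        draft = generate(
            f"Question: {question}\nContext:\n" + "\n".join(context) +
            "\nWrite an intermediate reasoning step or partial answer."
        )
        # The draft becomes the next retrieval query, steering retrieval toward
        # the missing hop in multi-step questions.
        query = draft
    return generate(
        f"Question: {question}\nContext:\n" + "\n".join(context) +
        "\nGive the final answer."
    )
```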
With recommender systems broadly deployed in various online platforms, many efforts have been devoted to learning user preferences and building effective sequential recommenders. However, existing work mainly focuses ...
Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectivenes...
In the realm of multi-intent spoken language understanding, recent advancements have leveraged the potential of prompt learning frameworks. However, critical gaps exist in these frameworks: the lack of explicit modeli...
Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, b...
ISBN (Print): 9798891760615
Extractive summarization is a crucial task in natural language processing that aims to condense long documents into shorter versions by directly extracting sentences. The recent introduction of large language models has attracted significant interest in the NLP community due to their remarkable performance on a wide range of downstream tasks. This paper first presents a thorough evaluation of ChatGPT's performance on extractive summarization and compares it with traditional fine-tuning methods on various benchmark datasets. Our experimental analysis reveals that ChatGPT exhibits inferior extractive summarization performance in terms of ROUGE scores compared to existing supervised systems, while achieving higher performance based on LLM-based evaluation metrics. In addition, we explore the effectiveness of in-context learning and chain-of-thought reasoning for enhancing its performance. Furthermore, we find that applying an extract-then-generate pipeline with ChatGPT yields significant performance improvements over abstractive baselines in terms of summary faithfulness. These observations highlight potential directions for enhancing ChatGPT's capabilities in faithful summarization using two-stage approaches.
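As a rough illustration of the extract-then-generate idea evaluated above, the sketch below first asks a chat model to copy salient sentences and then to summarize from those sentences only; chat is a hypothetical wrapper around a chat LLM and the prompts are illustrative, not the paper's.

```python
# Hedged sketch of an extract-then-generate summarization pipeline.
# chat() is a hypothetical placeholder; prompts are illustrative only.

def chat(prompt: str) -> str:
    """Placeholder for a call to a chat LLM (e.g. via an API client)."""
    raise NotImplementedError

def extract_then_generate(document: str, n_sentences: int = 3) -> str:
    # Stage 1 (extract): copy the most important sentences verbatim.
    extracted = chat(
        f"Copy the {n_sentences} most important sentences from the document "
        f"verbatim, one per line.\n\nDocument:\n{document}"
    )
    # Stage 2 (generate): summarize using only the extracted evidence, which
    # ties the final summary to the source and supports faithfulness.
    return chat(
        "Write a concise summary using ONLY the facts in these sentences:\n"
        f"{extracted}"
    )
```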