Effective learning schemes, such as fine-tuning, zero-shot, and few-shot learning, have been widely used to obtain considerable performance with only a handful of annotated training data. In this paper, we presented a unified benchmark to facilitate zero-shot text classification in Turkish. For this purpose, we evaluated three methods, namely Natural Language Inference (NLI), Next Sentence Prediction (NSP), and our proposed model based on masked language modeling and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We used pre-trained Turkish monolingual and multilingual transformer models, namely BERT, ConvBERT, DistilBERT, and mBERT. The results showed that ConvBERT with the NLI method yields the best results, at 79%, and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by applying transformer models not previously attempted for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.
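As a rough illustration of the NLI-based zero-shot setup described above (not the authors' exact pipeline), the sketch below uses the Hugging Face zero-shot-classification pipeline: each candidate label is inserted into a hypothesis sentence and scored as entailment against the input text. The Turkish checkpoint name and the hypothesis template are placeholder assumptions.

```python
# Minimal sketch of NLI-style zero-shot classification with Hugging Face transformers.
# The model path below is a hypothetical placeholder for an NLI-fine-tuned Turkish encoder.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="path/to/turkish-convbert-nli",  # hypothetical checkpoint
)

text = "Galatasaray dün akşamki maçı 3-0 kazandı."  # "Galatasaray won last night's match 3-0."
labels = ["spor", "ekonomi", "siyaset"]              # sport, economy, politics

# Each label is wrapped into a hypothesis ("This text is about {}.") and
# scored as entailment against the premise (the input text).
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="Bu metin {} ile ilgilidir.",
)
print(result["labels"][0], result["scores"][0])
```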
While encoder-only models such as BERT and ModernBERT are ubiquitous in real-world NLP applications, their conventional reliance on task-specific classification heads can limit their applicability compared to decoder-based large language models (LLMs). In this work, we introduce ModernBERT-Large-Instruct, a 0.4B-parameter encoder model that leverages its masked language modeling (MLM) head for generative classification. We design a simple approach, extracting all single-token answers from the FLAN dataset collection and re-purposing standard MLM pre-training to mask only this single-token answer. Our approach employs an intentionally simple training loop and inference mechanism that requires no heavy pre-processing, heavily engineered prompting, or architectural modifications. ModernBERT-Large-Instruct exhibits strong zero-shot performance on both classification and knowledge-based tasks, outperforming similarly sized LLMs on MMLU and achieving 93% of Llama3-1B’s MMLU performance with 60% fewer parameters. We also demonstrate that, when fine-tuned, the generative approach using the MLM head matches or even surpasses traditional classification-head methods across diverse NLU tasks. This capability emerges specifically in models trained on contemporary, diverse data mixes; models trained on lower-volume, less-diverse data yield considerably weaker performance. Although preliminary, these results demonstrate the potential of using the original generative masked language modeling head over traditional task-specific heads for downstream tasks. Our work suggests that further exploration into this area is warranted, highlighting many avenues for future improvements.
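A minimal sketch of the single-[MASK] generative-classification idea, assuming an instruction-style prompt whose answer slot is one masked token and a small set of single-token label words. The prompt template, label words, and the base checkpoint loaded here are illustrative assumptions, not the released ModernBERT-Large-Instruct recipe.

```python
# Hedged sketch: score candidate single-token answers with an MLM head at a [MASK] slot.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "answerdotai/ModernBERT-large"  # base encoder used here for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

prompt = (
    "Review: The plot was dull and the acting was worse.\n"
    "Question: Is this review positive or negative?\n"
    f"Answer: {tokenizer.mask_token}"
)
candidates = ["positive", "negative"]  # assumed to map to single tokens

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    mask_logits = model(**inputs).logits[0, mask_pos]  # [1, vocab] distribution at the masked slot

# Compare the logits of the candidate answer tokens and pick the highest-scoring one.
cand_ids = [tokenizer.encode(" " + c, add_special_tokens=False)[0] for c in candidates]
scores = mask_logits[0, cand_ids]
print(candidates[scores.argmax().item()])
```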
This paper introduces Cobweb/4L, a novel approach for efficient language model learning that supports masked word prediction. The approach builds on Cobweb, an incremental system that learns a hierarchy of probabilistic concepts. Each concept stores the frequencies of words that appear in instances tagged with the concept label. The system utilizes an attribute-value representation to encode words and their context into instances. Cobweb/4L uses an information-theoretic variant of category utility as well as a new performance mechanism that leverages multiple concepts to generate predictions. We demonstrate that this new performance mechanism substantially outperforms prior Cobweb performance mechanisms that use only a single node to generate predictions. Further, we demonstrate that Cobweb/4L outperforms transformer-based language models in a low-data setting by learning more rapidly and achieving better final performance. Lastly, we show that Cobweb/4L, which is hyperparameter-free, is robust across varying scales of training data and does not require any manual tuning. In contrast, Word2Vec performs best with a number of hidden nodes that depends on the total amount of training data, so its hyperparameters must be tuned manually for each data scale. We conclude by discussing future directions for Cobweb/4L.
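The following is an illustrative sketch, not the authors' implementation, of two ideas named in the abstract: encoding a word and its context window as an attribute-value instance, and predicting a masked word by mixing the word counts stored in several concepts rather than a single node. The attribute names, concept data structure, and weighting scheme are all assumptions.

```python
# Hypothetical sketch of attribute-value instances and multi-concept prediction.
from collections import Counter

def make_instance(tokens, anchor_idx, window=2):
    """Encode the anchor word and its surrounding context as attribute-value pairs."""
    instance = {"anchor": tokens[anchor_idx]}
    for offset in range(-window, window + 1):
        j = anchor_idx + offset
        if offset != 0 and 0 <= j < len(tokens):
            instance[f"ctx{offset:+d}"] = tokens[j]
    return instance

def predict_masked(concepts, instance):
    """Mix each concept's stored 'anchor' word counts, weighted by how well its
    stored context counts match the instance's context attributes."""
    mixture = Counter()
    for concept in concepts:  # each concept: {"counts": {attr: Counter(value -> frequency)}}
        weight = sum(
            concept["counts"].get(attr, Counter())[val]
            for attr, val in instance.items() if attr != "anchor"
        )
        anchor_counts = concept["counts"].get("anchor", Counter())
        total = sum(anchor_counts.values())
        if weight and total:
            for word, count in anchor_counts.items():
                mixture[word] += weight * count / total
    return mixture.most_common(1)[0][0] if mixture else None
```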