Search Results - Inner Mongolia University Library

arXiv, 2025

Authors: Bigoulaeva, Irina; Madabushi, Harish Tayyar; Gurevych, Iryna. Affiliations: Ubiquitous Knowledge Processing Lab, Technical University of Darmstadt, Germany; Department of Computer Science, The University of Bath, United Kingdom

Large Language Models (LLMs), trained on extensive web-scale corpora, have demonstrated remarkable abilities across diverse tasks, especially as they are scaled up. Nevertheless, even state-of-the-art models struggle in certain cases, sometimes failing at problems solvable by young children, indicating that traditional notions of task complexity are insufficient for explaining LLM capabilities. However, exploring LLM capabilities is complicated by the fact that most widely-used models are also 'instruction-tuned' to respond appropriately to prompts. With the goal of disentangling the factors influencing LLM performance, we investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples. Through extensive experiments across various model families, scales and task types, which included instruction tuning 90 different LLMs, we demonstrate that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts. By clarifying what instruction-tuning contributes, we extend prior research into in-context learning, which suggests that base models use priors from pretraining data to solve tasks. Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve, with the added influence of the instruction-tuning dataset. © 2025, CC BY.

Keywords: Contrastive Learning

ArgumenText: Argument Classification and Clustering in a Generalized Search Scenario

Datenbank-Spektrum, 2020, Vol. 20, No. 2, pp. 115-121

Authors: Daxenberger, Johannes; Schiller, Benjamin; Stahlhut, Chris; Kaiser, Erik; Gurevych, Iryna. Affiliation: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany

The ArgumenText project creates argument mining technology for big and heterogeneous data and aims to evaluate its use in real-world applications. The technology mines and clusters arguments from a variety of textual sources for a large range of topics and in multiple languages. Its main strength is its generalization to very different textual sources including web crawls, news data, or customer reviews. We validated the technology with a focus on supporting decisions in innovation management as well as customer feedback analysis. Along with its public argument search engine and API, ArgumenText has released multiple datasets for argument classification and clustering. This contribution outlines the major technology-related challenges and proposed solutions for the tasks of argument extraction from heterogeneous sources and argument clustering. It also lays out exemplary industry applications and remaining challenges. © 2020, The Author(s).

Keywords: Search engines

Can we hide in the web? Large scale simultaneous age and gender author profiling in social media: Notebook for PAN at CLEF 2013

2013 Cross Language Evaluation Forum Conference, CLEF 2013

Authors: Flekova, Lucie; Gurevych, Iryna. Affiliations: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Germany; Ubiquitous Knowledge Processing Lab, German Institute for Educational Research and Educational Information, Germany

Would you target your audience differently, knowing the real age and gender of the text authors on your website forum? This paper examines hundreds of thousands of online documents, e.g. chat lines or blog posts, showing that computers are capable to address this task better than humans, without relying on content stereotypes. Pointing out that age and gender profiling are not independent problems, we approach the task as a multiclass classification problem, combining the age and gender information to define six classes. Utilizing a wide range of stylistic and content features and a large number of readability measures we demonstrate the high predictive abilities of the parts of speech, the punctuation and the amount of emotions and slang used in the text, independently of the topic discussed.

Keywords: Social networking (online)

2nd Joint Conference on Lexical and Computational Semantics, *SEM 2013

Authors: Zesch, Torsten; Levy, Omer; Gurevych, Iryna; Dagan, Ido. Affiliations: Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, Germany; Natural Language Processing Lab, Computer Science Department, Bar-Ilan University, Israel

ISBN: (print) 9781937284497

Our system combines text similarity measures with a textual entailment system. In the main task, we focused on the influence of lexicalized versus unlexicalized features, and how they affect performance on unseen questions and domains. We also participated in the pilot partial entailment task, where our system significantly outperforms a strong baseline. © 2013 Association for Computational Linguistics.

Modeling extractive sentence intersection via subtree entailment

26th International Conference on Computational Linguistics, COLING 2016

Authors: Levy, Omer; Dagan, Ido; Stanovsky, Gabriel; Eckle-Kohler, Judith; Gurevych, Iryna. Affiliations: Computer Science Department, Bar-Ilan University, Israel; Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt, Germany; Ubiquitous Knowledge Processing Lab, German Institute for Educational Research, Germany

ISBN: (print) 9784879747020

Sentence intersection captures the semantic overlap of two texts, generalizing over paradigms such as textual entailment and semantic text similarity. Despite its modeling power, it has received little attention because it is difficult for non-experts to annotate. We analyze 200 pairs of similar sentences and identify several underlying properties of sentence intersection. We leverage these insights to design an algorithm that decomposes the sentence intersection task into several simpler annotation tasks, facilitating the construction of a high quality dataset via crowdsourcing. We implement this approach and provide an annotated dataset of 1,764 sentence intersections. © 1963-2018 ACL.

Keywords: Semantics

DP-Rewrite: Towards Reproducibility and Transparency in Differentially Private Text Rewriting

29th International Conference on Computational Linguistics, COLING 2022

Authors: Igamberdiev, Timour; Arnold, Thomas; Habernal, Ivan. Affiliations: Trustworthy Human Language Technologies; Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technical University of Darmstadt, Germany

Text rewriting with differential privacy (DP) provides concrete theoretical guarantees for protecting the privacy of individuals in textual documents. In practice, existing systems may lack the means to validate their privacy-preserving claims, leading to problems of transparency and reproducibility. We introduce DP-Rewrite, an open-source framework for differentially private text rewriting which aims to solve these problems by being modular, extensible, and highly customizable. Our system incorporates a variety of downstream datasets, models, pre-training procedures, and evaluation metrics to provide a flexible way to lead and validate private text rewriting research. To demonstrate our software in practice, we provide a set of experiments as a case study on the ADePT DP text rewriting system, detecting a privacy leak in its pre-training approach. Our system is publicly available, and we hope that it will help the community to make DP text rewriting research more accessible and transparent. © 2022 Proceedings - International Conference on Computational Linguistics, COLING. All rights reserved.

Keywords: Transparency

Recognizing partial textual entailment

51st Annual Meeting of the Association for Computational Linguistics, ACL 2013

Authors: Levy, Omer; Zesch, Torsten; Dagan, Ido; Gurevych, Iryna. Affiliations: Natural Language Processing Lab, Computer Science Department, Bar-Ilan University, Israel; Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, Germany

ISBN: (print) 9781937284510

Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is "almost entailed" by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment. © 2013 Association for Computational Linguistics.

Keywords: Text processing

Approximate matching for evaluating keyphrase extraction

International Conference on Recent Advances in Natural Language Processing, RANLP-2009

Authors: Zesch, Torsten; Gurevych, Iryna. Affiliation: Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, D-64289 Darmstadt, Germany

We propose a new evaluation strategy for keyphrase extraction based on approximate keyphrase matching. It corresponds well with human judgments and is better suited to assess the performance of keyphrase extraction approaches. Additionally, we propose a generalized framework for comprehensive analysis of keyphrase extraction that subsumes most existing approaches, which allows for fair testing conditions. For the first time, we compare the results of state-of-the-art unsupervised and supervised keyphrase extraction approaches on three evaluation datasets and show that the relative performance of the approaches heavily depends on the evaluation metric as well as on the properties of the evaluation dataset.

Keywords: Extraction

The People’s Web Meets NLP

Series: Theory and Applications of Natural Language Processing

Authors: Iryna Gurevych; Jungi Kim

A tool for extracting sense-disambiguated example sentences through user feedback

Software Demonstrations at the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017

Authors: Boullosa, Beto; De Castilho, Richard Eckart; Geyken, Alexander; Lemnitzer, Lothar; Gurevych, Iryna. Affiliations: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Germany; Berlin-Brandenburg Academy of Sciences, Germany

ISBN: (print) 9781510838604

This paper describes an application system aimed to help lexicographers in the extraction of example sentences for a given headword based on its different senses. The tool uses classification and clustering methods and incorporates user feedback to refine its results. © 2017 Association for Computational Linguistics.
