We present LOCOVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision-language models (VLMs). LOCOVQA augments test examples for mathematical reasoning, VQA, and character recogn...
This paper introduces a novel generalized self-imitation learning (GSIL) framework, which effectively and efficiently aligns large language models with offline demonstration data. We develop GSIL by deriving a surroga...
Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model g...
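For context, such scaling laws are conventionally written in the parametric form below (the Chinchilla form of Hoffmann et al., 2022); the data-quality extension this abstract alludes to is not visible in the truncated listing.

```latex
% Canonical compute-optimal scaling law: expected loss as a function of
% parameter count N and training tokens D; E, A, B, \alpha, \beta are
% constants fit to empirical training runs.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```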
In recent studies, researchers have used large language models (LLMs) to explore semantic representations in the brain; however, they have typically assessed different levels of semantic content, such as speech, object...
Vision-language models (VLMs) have gained widespread adoption in both industry and academia. In this study, we propose a unified framework for systematically evaluating gender, race, and age biases in VLMs with respec...
Sign words are the building blocks of any sign language. In this work, we present wSignGen, a word-conditioned 3D American Sign Language (ASL) generation model dedicated to synthesizing realistic and grammatically accurate mot...
ISBN (Print): 9798891760608
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets. However, existing summarization benchmarks overlap in time with the standard pre-training corpora and fine-tuning datasets. Hence, the strong performance of PLMs may rely on the parametric knowledge that is memorized during pre-training and fine-tuning. Moreover, the knowledge memorized by PLMs may quickly become outdated, which affects the generalization performance of PLMs on future data. In this work, we propose TEMPOSUM, a novel benchmark that contains data samples from 2010 to 2022, to understand the temporal generalization ability of abstractive summarization models. Through extensive human evaluation, we show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data. Moreover, existing faithfulness enhancement methods cannot reliably improve the faithfulness of summarization models on future data. Finally, we discuss several recommendations to the research community on how to evaluate and improve the temporal generalization capability of text summarization models.
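The protocol behind such a benchmark is easy to picture: fine-tune on articles from before a model's pre-training cut-off, then test on strictly later articles, so memorized parametric knowledge cannot stand in for faithful extraction. Below is a minimal sketch of that temporal split; the record format, field names, and cut-off date are hypothetical, not TEMPOSUM's actual schema.

```python
# Hypothetical sketch of a temporal-generalization split in the spirit of
# TEMPOSUM; the benchmark's real schema and cut-offs may differ.
from datetime import date

def temporal_split(samples, cutoff=date(2020, 1, 1)):
    """Train on articles published before the assumed pre-training
    cut-off; test on strictly later ones, where memorization cannot help."""
    train = [s for s in samples if s["published"] < cutoff]
    future = [s for s in samples if s["published"] >= cutoff]
    return train, future

samples = [
    {"published": date(2012, 5, 1), "article": "...", "summary": "..."},
    {"published": date(2022, 3, 9), "article": "...", "summary": "..."},
]
train, future = temporal_split(samples)  # one past sample, one future sample
```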
The role of the Arabic language in today's global affairs calls for sophisticated natural language processing techniques, especially in text classification. This paper presents Tasneef, a novel hybrid approach that tackles the computational challenges of practical Arabic text classification (ATC) by reducing memory usage and runtime overhead. Tasneef integrates a distance-based meta-feature (DBMF) representation with word embeddings. This integration is useful because a single text representation technique can fail to capture the full range of features necessary for effective classification, especially in a complex language like Arabic. By addressing the high dimensionality and sparsity inherent in the Term Frequency-Inverse Document Frequency (TF-IDF) representation, DBMFs offer a promising solution: they rely on document labels and statistical features to establish meaningful distance relationships between documents, thereby enabling effective dimensionality reduction, while word embeddings encapsulate semantic attributes. Empirical assessments reveal a reduction of two orders of magnitude in both memory usage and runtime: memory savings range from 158x to 361x and runtime reductions from 120x to 524x across three popular datasets, while MicroF1 and MacroF1 values remain comparable and learning time is notably reduced. Moreover, Tasneef outperforms ten state-of-the-art deep learning models and seven dimension reduction methods in accuracy, with improvements ranging from 0.3% to 39.6%, and in F-Measure, with improvements from 4.6% to 26.8%, across four additional datasets. These findings highlight Tasneef as a promising solution for diverse real-world ATC applications, offering concise and rapid classification with reduced computational learning costs.
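The core of the reduction is easiest to see in miniature: instead of a vocabulary-sized TF-IDF vector, each document is described by a handful of distances to labeled training neighbors. The sketch below is a hypothetical Python illustration of a kNN-per-class DBMF construction, not the paper's exact formulation; all names and parameters are illustrative.

```python
# Illustrative distance-based meta-features (DBMFs): per class, keep the
# cosine distances to the k nearest labeled training documents.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

def dbmf_features(train_docs, train_labels, docs, k=3):
    vec = TfidfVectorizer()
    X_train = vec.fit_transform(train_docs)   # sparse, vocabulary-sized
    X = vec.transform(docs)
    labels = np.asarray(train_labels)
    feats = []
    for c in np.unique(labels):
        # assumes every class has at least k training documents
        D = cosine_distances(X, X_train[labels == c])
        D.sort(axis=1)            # nearest neighbours of class c first
        feats.append(D[:, :k])    # keep k distances per class
    return np.hstack(feats)       # dense, shape (n_docs, k * n_classes)
```

With k = 3 and ten classes, for example, every document collapses to 30 dense features regardless of vocabulary size, which is the kind of label-aware compression that makes two-orders-of-magnitude memory and runtime savings plausible.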
Instruction tuning aims to align large language models (LLMs) with open-domain instructions and human-preferred responses. While several studies have explored autonomous approaches to distilling and annotating instruc...
ISBN (Print): 9783031519789; 9783031519796
Natural language processing (NLP) is a common application of Artificial Intelligence. The goal of this project is to provide language teachers with a simple-to-apply tool for topic model analyses that they can integrate into their classrooms. The project also involves project-based learning for the students who program the actual AI web application. The original notion is to give language teachers access to AI methodology without requiring any technical knowledge of AI or any programming skills. Natural language processing provides various tools for word frequencies, but also for topic modelling, which allows tracking the relevance of topics over time in the media or in literature. In collaboration with linguists at the University of Technology, we intend to provide a corpus of classical English and German literature, as well as the option of uploading one's own corpus, which can be obtained from web scraping or other sources. A team of students of the vocational high school TGM Wien, specialised in IT and Software Development, is working on the design of the interactive GUI for this NLP application, learning the methods of natural language processing and Artificial Intelligence in a project-based setting. For this, the statistical programming language R is used, since it already provides packages implementing natural language processing, as well as the shiny package, which allows developing interactive web apps without additional web and app programming. A team of teachers supervises and supports the students during the development process, providing expertise in AI and NLP, in web and app programming, and in server management. Two intended outcomes exist. On the one hand, we want our students to learn natural language processing first-hand through the development of this application. On the other hand, we intend to obtain an interactive AI tool which can assist language teachers and their students in the classroom in the long term. In times of GPT3 and GPT4 dominating the media and per
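The topic-model core such a tool exposes fits in a few lines. The project itself uses R and shiny; the Python/scikit-learn sketch below is an equivalent, hypothetical pipeline, with the corpus data and parameters invented for illustration.

```python
# Minimal topic-modelling pipeline: bag-of-words counts -> LDA topics,
# then the top words per topic, roughly what a teacher would see in a GUI.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the whale and the sea and the ship",
    "the ship sailed the stormy sea",
    "love and loss in the old garden",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)   # per-document topic weights

terms = vec.get_feature_names_out()
for t, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print(f"topic {t}: {', '.join(top)}")
```

Aggregating the per-document topic weights by publication year is what would let such a tool track the relevance of topics over time, as described above.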