Increasing the number of parameters in language models is a common strategy to enhance their performance. However, smaller language models remain valuable due to their lower operational costs. Despite their advantages...
ISBN (print): 9798891760608
Visual text evokes an image in a person's mind, while non-visual text fails to do so. A method to automatically detect visualness in text will enable text-to-image retrieval and generation models to augment text with relevant images. This is particularly challenging with long-form text as text-to-image generation and retrieval models are often triggered for text that is designed to be explicitly visual in nature, whereas long-form text could contain many non-visual sentences. To this end, we curate a dataset of 3,620 English sentences and their visualness scores provided by multiple human annotators. We also propose a fine-tuning strategy that adapts large vision-language models like CLIP by modifying the model's contrastive learning objective to map text identified as non-visual to a common NULL image while matching visual text to their corresponding images in the document. We evaluate the proposed approach on its ability to (i) classify visual and non-visual text accurately, and (ii) attend over words that are identified as visual in psycholinguistic studies. Empirical evaluation indicates that our approach performs better than several heuristics and baseline models for the proposed task. Furthermore, to highlight the importance of modeling the visualness of text, we conduct qualitative analyses of text-to-image generation systems like DALL-E.
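The modified contrastive objective described above can be sketched in a few lines. The following is a simplified NumPy illustration under assumed details (a symmetric InfoNCE form, a single learned NULL vector, and the function and parameter names are hypothetical), not the paper's actual implementation:

```python
import numpy as np

def _log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    m = x.max(axis=axis, keepdims=True)
    z = x - m
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def contrastive_loss_with_null(text_emb, image_emb, is_visual, null_emb,
                               temperature=0.07):
    """Sketch of the modified CLIP objective: non-visual texts are paired
    with a shared NULL image embedding, visual texts with their own
    document images. (Hypothetical implementation for illustration.)"""
    # Swap in the NULL embedding as the target for non-visual texts.
    targets = np.where(is_visual[:, None], image_emb, null_emb[None, :])
    # L2-normalise embeddings, as CLIP does before computing similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = (t @ v.T) / temperature   # pairwise cosine similarities
    n = len(t)
    idx = np.arange(n)                 # diagonal entries are matching pairs
    # Symmetric InfoNCE: text-to-image and image-to-text directions.
    loss_t = -_log_softmax(logits, axis=1)[idx, idx].mean()
    loss_v = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_t + loss_v)
```

Note that with several non-visual texts in one batch, multiple columns share the identical NULL target, so the negatives for those rows are ambiguous; a full implementation would need to account for this.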
Large Language Models (LLMs) have shown remarkable capabilities in various natural language processing tasks. However, LLMs may rely on dataset biases as shortcuts for prediction, which can significantly impair their ...
The inability to utilise future contexts and the pre-determined left-to-right generation order are major limitations of unidirectional language models. Bidirectionality has been introduced to address those deficiencie...
Tool learning aims to enhance and expand large language models' (LLMs) capabilities with external tools, which has gained significant attention ... methods have shown that LLMs can effectively handle a certain amo...
The popularity of Large Language Models (LLMs) has unleashed a new age of language agents for solving a diverse range of tasks. While contemporary frontier LLMs are capable enough to power reasonably good language ag...
Large Language Models (LLMs) have demonstrated remarkable performance in solving math problems, a hallmark of human intelligence. Despite high success rates on current benchmarks, however, these often feature simple pr...
Aligning Large Language Models (LLMs) traditionally relies on costly training and human preference annotations. Self-alignment aims to reduce these expenses by having models align themselves. To further minimize the co...
In the Retrieval-Augmented Generation (RAG) system, advanced Large Language Models (LLMs) have emerged as effective Query Likelihood Models (QLMs) in an unsupervised way, which re-rank documents based on the probabili...
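The query-likelihood re-ranking idea can be illustrated with a toy stand-in: here a smoothed unigram model built from each document plays the role of the LLM, scoring log P(query | document), and documents are sorted by that score. The names `query_likelihood_score`, `rerank`, `alpha`, and `vocab_size` are hypothetical, chosen for illustration only:

```python
import math
from collections import Counter

def query_likelihood_score(query_tokens, doc_tokens, alpha=0.1, vocab_size=10_000):
    """Toy QLM: log P(query | doc) under an additively smoothed unigram
    model of the document. An LLM-based QLM would replace this with the
    model's token-level probabilities."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    score = 0.0
    for tok in query_tokens:
        # Additive (Lidstone) smoothing so unseen query terms get p > 0.
        p = (counts[tok] + alpha) / (total + alpha * vocab_size)
        score += math.log(p)
    return score

def rerank(query, docs):
    # Re-rank candidate documents by the likelihood of generating the query.
    q = query.split()
    return sorted(docs, key=lambda d: query_likelihood_score(q, d.split()),
                  reverse=True)
```

For example, `rerank("cats on mats", ["dogs chase balls", "cats sit on mats"])` places the document containing the query terms first.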
Despite recent advancements in vision-language models, their performance remains suboptimal on images from non-Western cultures, due to underrepresentation in training datasets. Various benchmarks have been proposed t...