ISBN (print): 9781956792034
Vision Outlooker improves the performance of vision transformers, which implement a self-attention mechanism, by adding outlook attention, a form of local attention. In natural language processing, as has been the case in computer vision and other domains, transformer-based models constitute the state of the art for most processing tasks. In this domain, too, many authors have argued for and demonstrated the importance of local context. We present an outlook attention mechanism, COOL, for natural language processing. COOL, added on top of the self-attention layers of a transformer-based model, encodes local syntactic context, considering word proximity and more pairwise constraints than the dynamic convolution used by existing approaches. A comparative empirical performance evaluation of an implementation of COOL with different transformer-based models confirms the opportunity for improvement over a baseline using the original models alone on various natural language processing tasks, including question answering. The proposed approach achieves performance competitive with existing state-of-the-art methods on some tasks.
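For illustration, below is a minimal sketch of what a 1-D outlook-style local attention layer over a token sequence might look like, assuming a PyTorch implementation; the module name, window size, and projection layout are illustrative assumptions and not taken from the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutlookAttention1D(nn.Module):
    """Each token predicts attention weights over a small local window of
    value vectors directly from its own embedding (no query-key dot product)."""

    def __init__(self, dim: int, window: int = 3):
        super().__init__()
        assert window % 2 == 1, "use an odd window so it can be centred on each token"
        self.window = window
        self.value = nn.Linear(dim, dim)
        self.attn = nn.Linear(dim, window)   # per-token weights over the window slots
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        pad = self.window // 2
        v = F.pad(self.value(x), (0, 0, pad, pad))      # pad along the sequence axis
        windows = v.unfold(1, self.window, 1)           # (batch, seq_len, dim, window)
        weights = self.attn(x).softmax(dim=-1)          # (batch, seq_len, window)
        out = torch.einsum("bnw,bndw->bnd", weights, windows)
        return self.proj(out)

# Example: apply it to the output of a transformer's self-attention block.
layer = OutlookAttention1D(dim=768)
tokens = torch.randn(2, 16, 768)
print(layer(tokens).shape)   # torch.Size([2, 16, 768])
```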
Aligning large language models (LLMs) traditionally relies on costly training and human preference annotations. Self-alignment aims to reduce these expenses by having models align themselves. To further minimize the co...
In Retrieval-Augmented Generation (RAG) systems, advanced large language models (LLMs) have emerged as effective Query Likelihood Models (QLMs) in an unsupervised way, which re-rank documents based on the probabili...
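For illustration, below is a minimal sketch of unsupervised query-likelihood re-ranking with a causal LLM, assuming the Hugging Face transformers API; the prompt template, the "gpt2" checkpoint, and the function names are placeholder assumptions, not the paper's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def query_log_likelihood(document: str, query: str) -> float:
    """Score a document by the log-probability the LLM assigns to the query
    when conditioned on a prompt containing that document."""
    prompt = f"Passage: {document}\nPlease write a question based on this passage.\nQuestion:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    query_ids = tokenizer(" " + query, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, query_ids], dim=1)
    with torch.no_grad():
        log_probs = model(input_ids).logits.log_softmax(dim=-1)
    # The token at position p is predicted by the logits at position p - 1.
    return sum(
        log_probs[0, p - 1, input_ids[0, p]].item()
        for p in range(prompt_ids.size(1), input_ids.size(1))
    )

def rerank(query: str, documents: list[str]) -> list[str]:
    # Higher query likelihood first.
    return sorted(documents, key=lambda d: query_log_likelihood(d, query), reverse=True)
```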
Large language models (LLMs) and their fine-tuning techniques have demonstrated superior performance in various language understanding and generation tasks. This paper explores fine-tuning LLMs for predicting stock re...
ISBN (print): 9798891760608
Pretraining has proven to be a powerful technique in natural language processing (NLP), exhibiting remarkable success in various NLP downstream tasks. However, in the medical domain, existing models pretrained on electronic health records (EHR) fail to capture the hierarchical nature of EHR data, limiting their generalization capability across diverse downstream tasks with a single pretrained model. To tackle this challenge, this paper introduces a novel, general, and unified pretraining framework called MEDHMP, specifically designed for hierarchically multimodal EHR data. The effectiveness of the proposed MEDHMP is demonstrated through experimental results on eight downstream tasks spanning three levels. Comparisons against eighteen baselines further highlight the efficacy of our approach.
Language models (LMs) have shown promising performance in natural language generation. However, as LMs often generate incorrect or hallucinated responses, it is crucial to correctly quantify their uncertainty in respo...
As long-context large language models (LLMs) gain increasing attention for their ability to handle extensive inputs, the demand for effective evaluation methods has become critical. Existing evaluation methods, howeve...
Sentiment classification (SC) often suffers from low-resource challenges such as domain-specific contexts, imbalanced label distributions, and few-shot scenarios. The potential of the diffusion language model (LM) for...
The evaluation of natural language generation (NLG) tasks is a significant and longstanding research area. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based auto...
Most efforts in interpreting neural relevance models have focused on local explanations, which explain the relevance of a document to a query but are not useful in predicting the model's behavior on unseen query-d...