Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture

Authors: Burns, Thomas F.; Fukai, Tomoki; Earls, Christopher J.

Author affiliations: SciAI Center, Cornell University, United States; Neural Coding and Brain Computing Unit, OIST, Japan; SciAI Center, Center for Applied Mathematics, School of Civil and Environmental Engineering, Cornell University, United States

Publication: arXiv

Year/Volume/Issue: 2024


Subject: Associative storage

Abstract: Large language models (LLMs) demonstrate an impressive ability to utilise information within the context of their input sequences to appropriately respond to data unseen by the LLM during its training procedure. This ability is known as in-context learning (ICL). Humans and non-human animals demonstrate similar abilities; however, their neural architectures differ substantially from LLMs. Despite this, a critical component within LLMs, the attention mechanism, resembles modern associative memory models, widely used in and influenced by the computational neuroscience community to model biological memory systems. Using this connection, we introduce an associative memory model capable of performing ICL. We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads. We test this architecture during training within a two-layer Transformer and show its ICL abilities manifest more quickly than without this modification. We then apply our architecture in small language models with 8 million parameters, focusing on attention head values, with results also indicating improved ICL performance at this larger and more naturalistic scale. MSC Codes: 92B20, 68T01, 68T37, 68T50. © 2024, CC BY.
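
The abstract describes the inter-head residual path only at a high level. Below is a minimal illustrative sketch in PyTorch, assuming the connection is an additive skip from each head's value output in one layer onto the corresponding head's value vectors in the next layer; the class names, exact wiring, and hyperparameters are assumptions for illustration and are not taken from the authors' code (causal masking, MLP blocks, and normalisation are omitted).

# Sketch only (not the authors' implementation): a toy two-layer attention stack
# in which layer 2's per-head value vectors receive an additive residual from
# layer 1's per-head attention outputs, letting information flow directly
# between attention heads rather than only through the shared residual stream.
import torch
import torch.nn as nn

class HeadwiseAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, head_residual=None):
        # x: (batch, seq, d_model); head_residual: (batch, n_heads, seq, d_head) or None
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(z):
            return z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        if head_residual is not None:
            # Assumed inter-head residual: add the previous layer's per-head
            # attention outputs directly onto this layer's value vectors.
            v = v + head_residual
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        head_out = att @ v                               # (b, n_heads, t, d_head)
        merged = head_out.transpose(1, 2).reshape(b, t, -1)
        return self.out(merged), head_out

class TwoLayerToyModel(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.layer1 = HeadwiseAttention(d_model, n_heads)
        self.layer2 = HeadwiseAttention(d_model, n_heads)

    def forward(self, x):
        h1, head_out1 = self.layer1(x)
        x = x + h1                                       # standard residual stream
        h2, _ = self.layer2(x, head_residual=head_out1)  # assumed head-to-head path
        return x + h2

if __name__ == "__main__":
    model = TwoLayerToyModel()
    y = model(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])

The sketch is meant only to make the architectural idea concrete: the per-head outputs of one layer are routed to matched heads in the next layer in addition to the usual summed residual stream.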
