Less is More: Efficient Model Merging with Binary Task Switch

Authors: Qi, Biqing; Li, Fangyuan; Wang, Zhen; Gao, Junqi; Li, Dong; Ye, Peng; Zhou, Bowen

Affiliations: Shanghai Artificial Intelligence Laboratory, China; Department of Control Science and Engineering, Harbin Institute of Technology, China; School of Mathematics, Harbin Institute of Technology, China; Department of Electronic Engineering, Tsinghua University, China

Published in: arXiv

Year: 2024


Subject: Vectors

Abstract: As an effective approach to equip models with multi-task capabilities without additional training, model merging has garnered significant attention. However, existing methods face the challenges of redundant parameter conflicts and the excessive storage burden of parameters. In this work, through controlled experiments, we reveal that for task vectors, only parameters with magnitudes above a certain threshold contribute positively to the task, exhibiting a pulse-like characteristic. We then leverage this characteristic to binarize the task vectors and reduce storage overhead. Further controlled experiments show that the binarized task vectors incur almost no decrease in fine-tuning and merging performance, and even exhibit stronger performance improvements as the proportion of redundant parameters increases. Based on these insights, we propose Task Switch (T-Switch), which decomposes task vectors into three components: 1) an activation switch instantiated by a binarized mask vector, 2) a polarity switch instantiated by a binarized sign vector, and 3) a scaling knob instantiated by a scalar coefficient. By storing task vectors in binarized form, T-Switch alleviates parameter conflicts while ensuring efficient storage of task parameters. Furthermore, to enable automated switch combination in T-Switch, we introduce Auto-Switch, which achieves training-free switch combination via retrieval from a small query set. Experiments indicate that our methods achieve significant performance improvements over existing baselines while requiring only 1-3% of the storage space of full-precision parameters. © 2024, CC BY.
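To make the decomposition concrete, below is a minimal sketch (not the authors' released code) of how a task vector could be split into the three components named in the abstract: a binarized activation mask, a binarized sign vector, and a single scalar. The quantile-based threshold and the mean-magnitude choice of the scalar are illustrative assumptions, not details taken from the paper.

    # Sketch of the T-Switch idea described in the abstract, under assumed details.
    import numpy as np

    def binarize_task_vector(theta_ft, theta_pre, keep_ratio=0.1):
        """Decompose a task vector into (mask, sign, scale).

        keep_ratio and the mean-magnitude scale are illustrative assumptions.
        """
        tau = theta_ft - theta_pre                       # full-precision task vector
        threshold = np.quantile(np.abs(tau), 1.0 - keep_ratio)
        mask = np.abs(tau) >= threshold                  # activation switch (binary mask)
        sign = np.sign(tau).astype(np.int8)              # polarity switch (binary sign)
        scale = np.abs(tau[mask]).mean() if mask.any() else 0.0  # scaling knob (scalar)
        return mask, sign, scale

    def merge(theta_pre, switches):
        """Add several binarized task vectors back onto the pretrained weights."""
        merged = theta_pre.copy()
        for mask, sign, scale in switches:
            merged += scale * mask * sign                # scale_t * (mask_t * sign_t)
        return merged

    # Toy usage: two synthetic "fine-tuned" weight vectors merged onto one base.
    rng = np.random.default_rng(0)
    theta_pre = rng.normal(size=1000)
    switches = [
        binarize_task_vector(theta_pre + rng.normal(scale=0.05, size=1000), theta_pre)
        for _ in range(2)
    ]
    theta_merged = merge(theta_pre, switches)

Storing only the boolean mask, the sign bits, and one scalar per task is what yields the 1-3% storage figure reported in the abstract relative to full-precision task vectors.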
