Author affiliations: Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA, United States; College of Computing and Informatics, Drexel University, Philadelphia, PA, United States; Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, United States; School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
Publication: arXiv
Year/Volume/Issue: 2022
Core indexing:
Subject: Semantics
Abstract: Classification on data with a long-tailed distribution is a challenging problem: serious class imbalance leads to poor performance on tail classes that have only a few samples. Owing to this paucity of samples, learning the tail classes is especially difficult during fine-tuning, when a pretrained model is transferred to a downstream task. In this work, we present a simple modification of standard fine-tuning to cope with these challenges. Specifically, we propose a two-stage fine-tuning: we first fine-tune the final layer of the pretrained model with a class-balanced reweighting loss, and then we perform standard fine-tuning. Our modification has several benefits: (1) it leverages pretrained representations by fine-tuning only a small portion of the model parameters while keeping the rest untouched; (2) it allows the model to learn an initial representation of the specific task; and, importantly, (3) it protects the learning of tail classes from being at a disadvantage during model updating. We conduct extensive experiments on synthetic datasets for both two-class and multi-class text classification tasks, as well as on a real-world application to ADME (i.e., absorption, distribution, metabolism, and excretion) semantic labeling. The experimental results show that the proposed two-stage fine-tuning outperforms both fine-tuning with a conventional loss and fine-tuning with a reweighting loss on the above datasets. © 2022, CC BY.
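The two-stage schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy one-feature "encoder", the parameter names (`enc_w`, `enc_b`, `head_w`, `head_b`), the learning rate, the epoch count, and the use of inverse-frequency weights as the class-balanced reweighting are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency class weights (an assumed form of class-balanced
    reweighting), scaled so the weights average to 1 over the samples."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def forward(params, x):
    # Toy stand-in for a pretrained encoder followed by a final layer.
    h = math.tanh(params["enc_w"] * x + params["enc_b"])
    z = params["head_w"] * h + params["head_b"]
    p = 1.0 / (1.0 + math.exp(-z))
    return h, p


def train(params, data, weights, trainable, lr=0.1, epochs=50):
    """Full-batch gradient descent on a weighted binary cross-entropy.
    Only the parameters named in `trainable` are updated."""
    for _ in range(epochs):
        grads = dict.fromkeys(params, 0.0)
        for x, y in data:
            h, p = forward(params, x)
            dz = weights[y] * (p - y)          # d(loss)/d(logit)
            grads["head_w"] += dz * h
            grads["head_b"] += dz
            dpre = dz * params["head_w"] * (1.0 - h * h)
            grads["enc_w"] += dpre * x
            grads["enc_b"] += dpre
        for name in trainable:
            params[name] -= lr * grads[name] / len(data)
    return params


def two_stage_finetune(params, data):
    params = dict(params)                      # leave the caller's dict intact
    labels = [y for _, y in data]
    balanced = class_weights(labels)
    uniform = {c: 1.0 for c in balanced}

    # Stage 1: update only the final layer, using the reweighted loss.
    train(params, data, balanced, trainable=["head_w", "head_b"])
    after_stage1 = dict(params)

    # Stage 2: standard fine-tuning of all parameters, unweighted loss.
    train(params, data, uniform, trainable=list(params))
    return after_stage1, params
```

For example, calling `two_stage_finetune` on nine samples of class 0 and one sample of class 1 leaves the encoder parameters untouched after stage 1 and updates them only in stage 2. In a real setting the encoder would be a pretrained language model and the head its classification layer.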