Create Your World: Lifelong Text-to-Image Diffusion

Authors: Sun, Gan; Liang, Wenqi; Dong, Jiahua; Li, Jun; Ding, Zhengming; Cong, Yang

Affiliations: State Key Laboratory of Robotics, Shenyang Institute of Automation, Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Jiangsu 210094, China; Department of Computer Science, Tulane University, New Orleans, LA 70118, United States; College of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China; University of Chinese Academy of Sciences, Beijing 100049, China

Publication: arXiv

Year: 2023

Topics: Semantics

Abstract: Text-to-image generative models can produce diverse, high-quality images of concepts from a text prompt and have demonstrated excellent ability in image generation, image translation, and related tasks. In this work, we study the problem of synthesizing instantiations of a user's own concepts in a never-ending manner, i.e., create your world, where new concepts from the user are quickly learned from a few examples. To achieve this goal, we propose a Lifelong text-to-image Diffusion Model (L2DM), which aims to overcome knowledge catastrophic forgetting of previously encountered concepts and semantic catastrophic neglecting of one or more concepts in the text prompt. To address knowledge catastrophic forgetting, the L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module, which safeguard the knowledge of prior concepts and of each past personalized concept, respectively. When generating images from a user text prompt, semantic catastrophic neglecting is handled by a concept attention artist module, which alleviates semantic neglecting at the concept level, and an orthogonal attention module, which reduces semantic binding at the attribute level. As a result, our model generates more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics, compared with related state-of-the-art models. The code will be released at https://***/. Copyright © 2023, The Authors. All rights reserved.
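
Note: the abstract gives no implementation details, so the following is only a hypothetical, minimal sketch (not the authors' released code) of how a distillation-style objective for lifelong fine-tuning might be set up in the spirit of the described elastic-concept distillation: a denoising loss on the new concept is combined with a term that keeps the updated model close to a frozen copy of the previous model on prior-concept inputs. All module names, tensor shapes, and the weighting lambda_distill are invented stand-ins.

```python
# Hypothetical sketch only; tiny MLPs stand in for a real diffusion denoiser.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

student = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
teacher = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():          # frozen copy of the previous model
    p.requires_grad_(False)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
lambda_distill = 1.0                    # hypothetical weight of the distillation term

for step in range(100):
    noisy_new = torch.randn(8, 16)      # stand-in inputs for the new personalized concept
    target_noise = torch.randn(8, 16)
    noisy_prior = torch.randn(8, 16)    # stand-in inputs for previously learned concepts

    # 1) Learn the new concept: standard noise-prediction (denoising) loss.
    loss_new = F.mse_loss(student(noisy_new), target_noise)

    # 2) Preserve past concepts: match the frozen teacher on prior-concept inputs.
    with torch.no_grad():
        teacher_pred = teacher(noisy_prior)
    loss_distill = F.mse_loss(student(noisy_prior), teacher_pred)

    loss = loss_new + lambda_distill * loss_distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```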
