Details
ISBN (digital): 9783031585357
ISBN (print): 9783031585340; 9783031585357
Text-to-image generation models synthesize photo-realistic images from textual descriptions, typically using generative adversarial networks (GANs) and bidirectional LSTM (BiLSTM) networks. However, as the input text sequence grows longer, these models lose information, which leads to missed keywords and unsatisfactory results. To address this, we propose an attentional GAN (AttnGAN) model with a text attention mechanism. We evaluate AttnGAN variants on the MS-COCO dataset both qualitatively and quantitatively. To assess image quality, we use the Fréchet Inception Distance (FID), R-precision, and the Inception Score (IS). Our results show that the proposed model outperforms existing approaches, producing more realistic images by preserving vital information in the input sequence.
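Of the measures named above, R-precision is the one that most directly checks whether keywords in the caption survive into the image. The abstract does not describe the evaluation protocol, so the following is only a minimal sketch under assumptions: each generated image and each caption is already embedded in a shared vector space, similarity is cosine similarity, and R = 1 (the ground-truth caption must rank first among the candidates). The function names `cosine` and `r_precision` and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def r_precision(image_embs, candidate_text_embs, true_index=0):
    """Fraction of images whose ground-truth caption ranks first (R = 1).

    image_embs: one embedding per generated image.
    candidate_text_embs: for each image, a list of caption embeddings in
        which the ground-truth caption sits at position `true_index`
        (an assumed convention) and the rest are mismatched captions.
    """
    hits = 0
    for img, candidates in zip(image_embs, candidate_text_embs):
        scores = [cosine(img, txt) for txt in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += best == true_index
    return hits / len(image_embs)


# Toy data (assumed, for illustration only): two images, three captions each.
images = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]],  # true caption (index 0) ranks first
    [[1.0, 0.0], [0.1, 0.9], [0.5, 0.5]],   # true caption is beaten by index 1
]
score = r_precision(images, candidates)
print(score)  # 0.5
```

In this toy case the first image retrieves its own caption and the second does not, so the score is 0.5. A model that drops keywords from long captions would be expected to score lower on such a measure, which is the failure the text attention mechanism is meant to reduce.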