ISBN (print): 9781450386517
Vector Quantized Variational AutoEncoder (VQ-VAE) models achieve fast image generation by encoding and quantizing the raw input into a single-level or hierarchical compressed latent space. However, the learned representations are poor at capturing the complex relations that exist in the data, and a domain-specific autoregressive model is usually adopted to fit a prior distribution in a second stage of learning. In this work, we propose VQMG, a novel and unified framework for multi-hop relational reasoning and explicit representation learning. By introducing multi-hop graph convolutional networks (MGCN), complicated relations in the hierarchical latent space are effectively captured by an Inner graph, while the fitting of the autoregressive prior is performed coherently by an Outer graph to improve performance. Experiments on multimedia tasks including point cloud segmentation, stroke-level text detection, and image generation verify the efficiency and applicability of our approach.
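The abstract builds on two mechanisms that are worth making concrete. The first is the standard VQ-VAE quantization step: the encoder output is snapped to the nearest entry of a learned codebook, with a straight-through estimator passing gradients back to the encoder. The PyTorch sketch below shows that generic step only; the hyperparameters (num_codes, code_dim) are illustrative placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup, VQ-VAE style (generic sketch)."""

    def __init__(self, num_codes: int = 512, code_dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, ..., code_dim) continuous encoder output.
        flat = z.reshape(-1, z.shape[-1])
        # Squared Euclidean distance to every codebook entry.
        dists = (
            flat.pow(2).sum(1, keepdim=True)
            - 2 * flat @ self.codebook.weight.t()
            + self.codebook.weight.pow(2).sum(1)
        )
        indices = dists.argmin(dim=1)
        quantized = self.codebook(indices).view_as(z)
        # Straight-through estimator: gradients flow from decoder to encoder.
        return z + (quantized - z).detach()
```

The second mechanism is the multi-hop graph convolution. The paper does not spell out the MGCN architecture in this abstract, so the following is only one plausible reading: features are propagated over successive applications of a normalized adjacency matrix (one application per hop) and the per-hop outputs are summed. The class name and the choice of ReLU and sum-aggregation are assumptions for illustration, not the paper's Inner/Outer graph design.

```python
class MultiHopGCN(nn.Module):
    """Aggregates node features over several hops of graph propagation.

    A hypothetical multi-hop GCN sketch; the paper's MGCN may differ.
    """

    def __init__(self, dim: int, hops: int = 3):
        super().__init__()
        self.hops = hops
        self.linears = nn.ModuleList(nn.Linear(dim, dim) for _ in range(hops))

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim); adj: (num_nodes, num_nodes), row-normalized.
        out = torch.zeros_like(x)
        h = x
        for k in range(self.hops):
            h = adj @ h  # propagate one more hop along the graph
            out = out + torch.relu(self.linears[k](h))
        return out
```

In this reading, hop k sees information from nodes up to k edges away, so stacking hops widens the receptive field over the latent graph without deepening the network; how VQMG couples this with the autoregressive prior via the Outer graph is detailed in the paper itself.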