Few-shot font generation (FFG), which aims to generate font images from only a few samples, has emerged as an active topic in recent years due to its academic and commercial value. Typically, FFG approaches follow the style-content disentanglement paradigm, transferring target font styles to characters by combining the content representations of source characters with the style codes of reference samples. Most existing methods attempt to improve font generation by exploring more powerful style representations, which may be a sub-optimal solution for the FFG task because it does not model the spatial transformations involved in transferring font styles. In this paper, we model font generation as a continuous transformation process from the source character image to the target font image via the creation and dissipation of font pixels, and embed the corresponding transformations into a neural transformation field. Given the estimated transformation path, the neural transformation field generates a set of intermediate transformation results via a sampling process, and a font rendering formula accumulates them into the target font image. Extensive experiments show that our method achieves state-of-the-art performance on the few-shot font generation task, demonstrating the effectiveness of the proposed model. Our implementation is available at: https://***/fubinfb/NTF.
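To make the sample-then-accumulate idea concrete, here is a minimal sketch of querying a field at sampled positions along a path parameter t and compositing the intermediate results, in the spirit of volume rendering. The class and function names (TransformationField, render_font) and the creation/dissipation compositing rule are illustrative assumptions, not the paper's actual formula or API.

```python
# Hypothetical sketch: sample a neural field along a transformation path
# t in [0, 1] and accumulate per-step "creation" and "dissipation" maps.
import torch
import torch.nn as nn

class TransformationField(nn.Module):
    """Maps (source features, path position t) to per-pixel creation and
    dissipation maps. The tiny backbone here is a deliberate stand-in."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch + 1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2, 3, padding=1),  # channel 0: creation, 1: dissipation
        )

    def forward(self, feat, t):
        b, _, h, w = feat.shape
        t_map = torch.full((b, 1, h, w), float(t), device=feat.device)
        out = self.net(torch.cat([feat, t_map], dim=1))
        return out[:, :1], torch.sigmoid(out[:, 1:])

def render_font(field, src_feat, src_img, num_samples=8):
    """Accumulate intermediate results along the path: at each sampled t,
    old pixels are partially dissipated and new ones created, analogous
    to alpha compositing in volume rendering."""
    img = src_img
    for i in range(num_samples):
        t = (i + 0.5) / num_samples
        creation, dissipation = field(src_feat, t)
        img = img * (1.0 - dissipation) + creation
    return img

# Toy usage with random stand-ins for the source feature map and image.
field = TransformationField()
out = render_font(field, torch.randn(1, 64, 32, 32), torch.rand(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 1, 32, 32])
```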
Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in the 3D world. To tackle this limitation, we introduce...
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Few-shot font generation (FFG) produces stylized font images from a limited number of reference samples, which can significantly reduce the labor cost of manual font design. Most existing FFG methods follow the style-content disentanglement paradigm and employ a Generative Adversarial Network (GAN) to generate target fonts by combining the decoupled content and style representations. In those methods, the complicated structure and detailed style are generated simultaneously, which may be a sub-optimal solution for the FFG task. Inspired by the manual font design process of expert designers, in this paper we model font generation as a multi-stage generative process. Specifically, since the injected noise and the data distribution in diffusion models can be well separated into different sub-spaces, we are able to incorporate the font transfer process into these models. Based on this observation, we generalize diffusion methods to model the font generative process by separating the reverse diffusion process into three stages with different functions: the structure construction stage first generates the structure information of the target character based on the source image; the font transfer stage subsequently transforms the source font into the target font; finally, the font refinement stage enhances the appearance and local details of the target font image. Based on this multi-stage generative process, we construct our font generation framework, named MSD-Font, with a dual-network approach to generate font images. Its superior performance demonstrates the effectiveness of our model. The code is available at: https://***/fubinfb/MSD-Font.
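The following is a minimal sketch of how a reverse diffusion trajectory could be split into the three stages named above and routed between two denoisers. The stage boundaries (t_struct, t_transfer), the conditioning choices, and the generic scheduler.step update are all assumptions for exposition; the paper's actual schedule and dual-network wiring may differ.

```python
# Hypothetical three-stage reverse diffusion loop. net_struct and net_style
# are any callables (x, t, cond) -> predicted noise; scheduler exposes a
# generic step(eps, t, x) -> x_{t-1} update (DDPM/DDIM-style).
import torch

def reverse_diffusion(x_T, source_img, net_struct, net_style, scheduler,
                      T=1000, t_struct=700, t_transfer=300):
    """Run the reverse process from t = T-1 down to 0 in three stages:
      t in [t_struct, T)       structure construction (conditioned on source)
      t in [t_transfer, t_struct) font transfer (source style -> target style)
      t in [0, t_transfer)     font refinement (appearance and local details)
    The two denoisers form the dual-network design: one for structure,
    one for style transfer and refinement."""
    x = x_T
    for t in reversed(range(T)):
        if t >= t_struct:
            eps = net_struct(x, t, cond=source_img)  # build global structure
        elif t >= t_transfer:
            eps = net_style(x, t, cond=source_img)   # transfer the font style
        else:
            eps = net_style(x, t, cond=None)         # refine local details
        x = scheduler.step(eps, t, x)                # standard reverse update
    return x
```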
Object detection is one of the most fundamental but important computer vision tasks. However, small object detection remains an unsolved challenge due to insufficient detailed appearances and additional noises. Meanwh...
Adversarial examples are well known as a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of a DNN model for both adversarial and benign examples follow the generalized Gaussian distribution (GGD), but with different parameters (i.e., shape factor, mean, and variance). GGD is a general distribution family covering many popular distributions (e.g., Laplacian, Gaussian, and uniform), so it is more likely to approximate the intrinsic distributions of internal responses than any specific distribution. Moreover, since the shape factor is more robust across different databases than the other two parameters, we propose to construct discriminative features for adversarial detection from the shape factor, using the magnitude of Benford-Fourier (MBF) coefficients, which can be easily estimated from the responses. Finally, a support vector machine is trained as an adversarial detector on the MBF features. Extensive experiments on image classification demonstrate that the proposed detector is much more effective and robust at detecting adversarial examples from different crafting methods and sources than state-of-the-art adversarial detection methods.
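A minimal sketch of this pipeline follows, assuming the MBF feature of order n is the magnitude of the empirical Fourier coefficient of the fractional part of log10 of the response magnitudes (the standard Benford-Fourier construction); the paper's exact estimator may differ, and the random arrays are stand-ins for real benign/adversarial responses.

```python
# Hypothetical MBF feature extraction + SVM detector plumbing.
import numpy as np
from sklearn.svm import SVC

def mbf_features(responses, orders=(1, 2, 3)):
    """responses: 1-D array of a layer's activations for one input.
    Returns |E[exp(-2*pi*i*n*m)]| for each order n, where m is the
    fractional (mantissa) part of log10 of the nonzero magnitudes."""
    mags = np.abs(responses[responses != 0])
    m = np.log10(mags) % 1.0
    return np.array([np.abs(np.mean(np.exp(-2j * np.pi * n * m)))
                     for n in orders])

# Toy usage: two synthetic "classes" of responses just to show the flow;
# in practice these would be DNN responses for benign vs. adversarial inputs.
rng = np.random.default_rng(0)
X = np.stack([mbf_features(rng.laplace(size=4096)) for _ in range(100)] +
             [mbf_features(rng.normal(size=4096)) for _ in range(100)])
y = np.array([0] * 100 + [1] * 100)  # 0: benign, 1: adversarial
detector = SVC(kernel="rbf").fit(X, y)
print(detector.score(X, y))
```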
Most landslide detection methods for remote sensing images are based on traditional convolutional neural network models with high depth and complexity. This paper proposes a lightweight method based on Depthwise Separable Convolution and a Dual Self-Attention Mechanism (DSC-DSAM) for detecting landslides in remote sensing images, aiming to reduce storage requirements and improve detection speed while maintaining accuracy. The model starts from a lightweight convolutional neural network and then applies a dual self-attention mechanism to improve accuracy. Compared with other existing classification models, the proposed method shows advantages in memory footprint and detection speed while maintaining accuracy.
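Below is a sketch of the two ingredients named above: a depthwise separable convolution (a depthwise convolution followed by a 1x1 pointwise convolution, which cuts parameters roughly k^2-fold) and a simple dual attention combining spatial and channel self-attention. The attention design is assumed here in the DANet style; the paper's exact variant may differ.

```python
# Hypothetical building blocks for a DSC + dual-attention model.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)  # cheap channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class DualSelfAttention(nn.Module):
    """Spatial (position) attention + channel attention, added residually."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)            # (b, hw, c/8)
        k = self.k(x).flatten(2)                            # (b, c/8, hw)
        v = self.v(x).flatten(2)                            # (b, c, hw)
        spatial = torch.softmax(q @ k, dim=-1)              # (b, hw, hw)
        pos = (v @ spatial.transpose(1, 2)).view(b, c, h, w)
        xf = x.flatten(2)                                   # (b, c, hw)
        chan = torch.softmax(xf @ xf.transpose(1, 2), dim=-1)  # (b, c, c)
        cha = (chan @ xf).view(b, c, h, w)
        return x + pos + cha

block = nn.Sequential(DepthwiseSeparableConv(3, 32), DualSelfAttention(32))
print(block(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```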
Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision and categorize current low-level vision tasks into two groups based on the difficulty of data acquisition: low-cost and high-cost tasks. The existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (DegAE). DegAE follows the philosophy of designing a pretext task for self-supervised pretraining and is elaborately tailored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88 dB performance gain on the image dehazing task, while Uformer obtains 3.22 dB and 0.54 dB improvements on dehazing and deraining tasks, respectively.
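For intuition, here is a minimal sketch of a degradation-based pretext task in the spirit of the DegAE idea: synthesize degradations on clean images and train an encoder-decoder to map one degraded view to another. The toy degrade function, the tiny networks, and the choice of training target are assumptions that only illustrate the loop; the paper's actual degradation set and objective may differ.

```python
# Hypothetical degradation-autoencoder pretext loop.
import torch
import torch.nn as nn
import torch.nn.functional as F

def degrade(x, sigma=0.1, blur=True):
    """Toy degradation: optional box blur followed by Gaussian noise."""
    if blur:
        x = F.avg_pool2d(x, 3, stride=1, padding=1)
    return x + sigma * torch.randn_like(x)

encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(32, 3, 3, padding=1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-4)

clean = torch.rand(4, 3, 64, 64)                 # stand-in for a real batch
inp = degrade(clean, sigma=0.1)                  # input degradation
target = degrade(clean, sigma=0.05, blur=False)  # different target degradation
loss = F.l1_loss(decoder(encoder(inp)), target)  # pretext reconstruction loss
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```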