Few-shot font generation (FFG), which aims to generate font images from only a few samples, has emerged as an active topic in recent years due to its academic and commercial value. Typically, FFG approaches follow the style-content disentanglement paradigm, transferring target font styles to characters by combining the content representations of source characters with the style codes of reference samples. Most existing methods attempt to improve font generation by exploring more powerful style representations, which may be a sub-optimal solution for the FFG task because it does not model the spatial transformations involved in transferring font styles. In this paper, we model font generation as a continuous transformation process from the source character image to the target font image via the creation and dissipation of font pixels, and embed the corresponding transformations into a neural transformation field. Given the estimated transformation path, the neural transformation field generates a set of intermediate transformation results via a sampling process, and a font rendering formula accumulates them into the target font image. Extensive experiments show that our method achieves state-of-the-art performance on the few-shot font generation task, demonstrating the effectiveness of the proposed model. Our implementation is available at: https://***/fubinfb/NTF.
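To make the sample-then-accumulate idea concrete, here is a minimal sketch of querying a field at sampled positions along a path parameter t and compositing the intermediate results, in the spirit of volume rendering. The class and function names (TransformationField, render_font) and the creation/dissipation compositing rule are illustrative assumptions, not the paper's actual formula or API.

```python
# Hypothetical sketch: sample a neural field along a transformation path
# t in [0, 1] and accumulate per-step "creation" and "dissipation" maps.
import torch
import torch.nn as nn

class TransformationField(nn.Module):
    """Maps (source features, path position t) to per-pixel creation and
    dissipation maps. The tiny backbone here is a deliberate stand-in."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch + 1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2, 3, padding=1),  # channel 0: creation, 1: dissipation
        )

    def forward(self, feat, t):
        b, _, h, w = feat.shape
        t_map = torch.full((b, 1, h, w), float(t), device=feat.device)
        out = self.net(torch.cat([feat, t_map], dim=1))
        return out[:, :1], torch.sigmoid(out[:, 1:])

def render_font(field, src_feat, src_img, num_samples=8):
    """Accumulate intermediate results along the path: at each sampled t,
    old pixels are partially dissipated and new ones created, analogous
    to alpha compositing in volume rendering."""
    img = src_img
    for i in range(num_samples):
        t = (i + 0.5) / num_samples
        creation, dissipation = field(src_feat, t)
        img = img * (1.0 - dissipation) + creation
    return img

# Toy usage with random stand-ins for the source feature map and image.
field = TransformationField()
out = render_font(field, torch.randn(1, 64, 32, 32), torch.rand(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 1, 32, 32])
```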
Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in the 3D world. To tackle this limitation, we introduce...
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Few-shot font generation (FFG) produces stylized font images from a limited number of reference samples, which can significantly reduce the labor cost of manual font design. Most existing FFG methods follow the style-content disentanglement paradigm and employ a Generative Adversarial Network (GAN) to generate target fonts by combining the decoupled content and style representations. In those methods, the complicated structure and detailed style are generated simultaneously, which may be a sub-optimal solution for the FFG task. Inspired by the manual font design process of expert designers, in this paper we model font generation as a multi-stage generative process. Specifically, since the injected noise and the data distribution in diffusion models can be well separated into different sub-spaces, we are able to incorporate the font transfer process into these models. Based on this observation, we generalize diffusion methods to model the font generative process by separating the reverse diffusion process into three stages with different functions: the structure construction stage first generates the structure information of the target character based on the source image; the font transfer stage subsequently transforms the source font into the target font; finally, the font refinement stage enhances the appearance and local details of the target font image. Based on this multi-stage generative process, we construct our font generation framework, named MSD-Font, with a dual-network approach to generate font images. Its superior performance demonstrates the effectiveness of our model. The code is available at: https://***/fubinfb/MSD-Font.
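The following is a minimal sketch of how a reverse diffusion trajectory could be split into the three stages named above and routed between two denoisers. The stage boundaries (t_struct, t_transfer), the conditioning choices, and the generic scheduler.step update are all assumptions for exposition; the paper's actual schedule and dual-network wiring may differ.

```python
# Hypothetical three-stage reverse diffusion loop. net_struct and net_style
# are any callables (x, t, cond) -> predicted noise; scheduler exposes a
# generic step(eps, t, x) -> x_{t-1} update (DDPM/DDIM-style).
import torch

def reverse_diffusion(x_T, source_img, net_struct, net_style, scheduler,
                      T=1000, t_struct=700, t_transfer=300):
    """Run the reverse process from t = T-1 down to 0 in three stages:
      t in [t_struct, T)       structure construction (conditioned on source)
      t in [t_transfer, t_struct) font transfer (source style -> target style)
      t in [0, t_transfer)     font refinement (appearance and local details)
    The two denoisers form the dual-network design: one for structure,
    one for style transfer and refinement."""
    x = x_T
    for t in reversed(range(T)):
        if t >= t_struct:
            eps = net_struct(x, t, cond=source_img)  # build global structure
        elif t >= t_transfer:
            eps = net_style(x, t, cond=source_img)   # transfer the font style
        else:
            eps = net_style(x, t, cond=None)         # refine local details
        x = scheduler.step(eps, t, x)                # standard reverse update
    return x
```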
Object detection is one of the most fundamental but important computer vision tasks. However, small object detection remains an unsolved challenge due to insufficient detailed appearances and additional noises. Meanwh...
Adversarial examples are well known as a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of a DNN model for both adversarial and benign examples follow the generalized Gaussian distribution (GGD), but with different parameters (i.e., shape factor, mean, and variance). GGD is a general distribution family covering many popular distributions (e.g., Laplacian, Gaussian, and uniform), so it is more likely to approximate the intrinsic distributions of internal responses than any specific distribution. Moreover, since the shape factor is more robust across different databases than the other two parameters, we propose to construct discriminative features for adversarial detection from the shape factor, using the magnitude of Benford-Fourier (MBF) coefficients, which can be easily estimated from the responses. Finally, a support vector machine is trained as an adversarial detector on the MBF features. Extensive experiments on image classification demonstrate that the proposed detector is much more effective and robust at detecting adversarial examples from different crafting methods and sources than state-of-the-art adversarial detection methods.
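A minimal sketch of this pipeline follows, assuming the MBF feature of order n is the magnitude of the empirical Fourier coefficient of the fractional part of log10 of the response magnitudes (the standard Benford-Fourier construction); the paper's exact estimator may differ, and the random arrays are stand-ins for real benign/adversarial responses.

```python
# Hypothetical MBF feature extraction + SVM detector plumbing.
import numpy as np
from sklearn.svm import SVC

def mbf_features(responses, orders=(1, 2, 3)):
    """responses: 1-D array of a layer's activations for one input.
    Returns |E[exp(-2*pi*i*n*m)]| for each order n, where m is the
    fractional (mantissa) part of log10 of the nonzero magnitudes."""
    mags = np.abs(responses[responses != 0])
    m = np.log10(mags) % 1.0
    return np.array([np.abs(np.mean(np.exp(-2j * np.pi * n * m)))
                     for n in orders])

# Toy usage: two synthetic "classes" of responses just to show the flow;
# in practice these would be DNN responses for benign vs. adversarial inputs.
rng = np.random.default_rng(0)
X = np.stack([mbf_features(rng.laplace(size=4096)) for _ in range(100)] +
             [mbf_features(rng.normal(size=4096)) for _ in range(100)])
y = np.array([0] * 100 + [1] * 100)  # 0: benign, 1: adversarial
detector = SVC(kernel="rbf").fit(X, y)
print(detector.score(X, y))
```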
Most landslide detection methods for remote sensing images are based on traditional convolutional neural network models with high depth and complexity. This paper proposes a lightweight method based on Depthwise Separable Convolution and a Dual Self-Attention Mechanism (DSC-DSAM) for detecting landslides in remote sensing images, aiming to reduce storage requirements and improve detection speed while maintaining accuracy. The model starts from a lightweight convolutional neural network and then applies a dual self-attention mechanism to improve accuracy. Compared with other existing classification models, the proposed method shows advantages in memory footprint and detection speed while maintaining accuracy.
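Below is a sketch of the two ingredients named above: a depthwise separable convolution (a depthwise convolution followed by a 1x1 pointwise convolution, which cuts parameters roughly k^2-fold) and a simple dual attention combining spatial and channel self-attention. The attention design is assumed here in the DANet style; the paper's exact variant may differ.

```python
# Hypothetical building blocks for a DSC + dual-attention model.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)  # cheap channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class DualSelfAttention(nn.Module):
    """Spatial (position) attention + channel attention, added residually."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)            # (b, hw, c/8)
        k = self.k(x).flatten(2)                            # (b, c/8, hw)
        v = self.v(x).flatten(2)                            # (b, c, hw)
        spatial = torch.softmax(q @ k, dim=-1)              # (b, hw, hw)
        pos = (v @ spatial.transpose(1, 2)).view(b, c, h, w)
        xf = x.flatten(2)                                   # (b, c, hw)
        chan = torch.softmax(xf @ xf.transpose(1, 2), dim=-1)  # (b, c, c)
        cha = (chan @ xf).view(b, c, h, w)
        return x + pos + cha

block = nn.Sequential(DepthwiseSeparableConv(3, 32), DualSelfAttention(32))
print(block(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```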
Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision and categorize current low-level vision tasks into two groups based on the difficulty of data acquisition: low-cost and high-cost tasks. The existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (DegAE). DegAE follows the philosophy of designing a pretext task for self-supervised pretraining and is elaborately tailored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88 dB performance gain on the image dehazing task, while Uformer obtains 3.22 dB and 0.54 dB improvements on dehazing and deraining tasks, respectively.
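For intuition, here is a minimal sketch of a degradation-based pretext task in the spirit of the DegAE idea: synthesize degradations on clean images and train an encoder-decoder to map one degraded view to another. The toy degrade function, the tiny networks, and the choice of training target are assumptions that only illustrate the loop; the paper's actual degradation set and objective may differ.

```python
# Hypothetical degradation-autoencoder pretext loop.
import torch
import torch.nn as nn
import torch.nn.functional as F

def degrade(x, sigma=0.1, blur=True):
    """Toy degradation: optional box blur followed by Gaussian noise."""
    if blur:
        x = F.avg_pool2d(x, 3, stride=1, padding=1)
    return x + sigma * torch.randn_like(x)

encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(32, 3, 3, padding=1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-4)

clean = torch.rand(4, 3, 64, 64)                 # stand-in for a real batch
inp = degrade(clean, sigma=0.1)                  # input degradation
target = degrade(clean, sigma=0.05, blur=False)  # different target degradation
loss = F.l1_loss(decoder(encoder(inp)), target)  # pretext reconstruction loss
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```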