The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks. In this work, we design an efficient SR network by improving the attention mechanism. We start from a simple pixel att...
Under-Display Camera (UDC) has been widely exploited to help smartphones realize full-screen displays. However, as the screen inevitably affects the light propagation process, the images captured by the UDC syste...
ISBN (print): 9781713845393
Recent blind super-resolution (SR) methods typically consist of two branches, one for degradation prediction and the other for conditional restoration. However, our experiments show that a one-branch network can achieve comparable performance to the two-branch scheme. Then we wonder: how can one-branch networks automatically learn to distinguish degradations? To find the answer, we propose a new diagnostic tool – Filter Attribution method based on Integral Gradient (FAIG). Unlike previous integral gradient methods, our FAIG aims at finding the most discriminative filters instead of input pixels/features for degradation removal in blind SR networks. With the discovered filters, we further develop a simple yet effective method to predict the degradation of an input image. Based on FAIG, we show that, in one-branch blind SR networks, 1) we are able to find a very small number (1%) of discriminative filters for each specific degradation; 2) the weights, locations and connections of the discovered filters are all important to determine the specific network function; and 3) the task of degradation prediction can be implicitly realized by these discriminative filters without explicit supervised learning. Our findings can not only help us better understand network behaviors inside one-branch blind SR networks, but also provide guidance on designing more efficient architectures and diagnosing networks for blind SR.
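FAIG's core idea, attributing a network's degradation-removal behavior to individual filters by integrating gradients along a path in parameter space, can be illustrated with a short sketch. The snippet below is a minimal reading of that idea, not the authors' implementation: `baseline_model` and `target_model` are assumed to share one architecture, the path is a straight line between their weights, and an L1 loss to the ground truth stands in for whatever objective the attribution is measured against.

```python
# A minimal, hedged sketch of filter-wise integrated gradients in the
# spirit of FAIG; names and the loss are illustrative assumptions.
import copy
import torch
import torch.nn.functional as F

def faig_style_attribution(baseline_model, target_model,
                           lr_input, hr_target, steps=100):
    """Attribute restoration behavior to individual conv filters by
    integrating gradients along the straight path between the two
    models' parameters (a parameter-space analogue of integrated
    gradients)."""
    base = {n: p.detach() for n, p in baseline_model.named_parameters()}
    tgt = {n: p.detach() for n, p in target_model.named_parameters()}
    probe = copy.deepcopy(target_model)
    scores = {n: torch.zeros_like(p) for n, p in tgt.items()}

    for k in range(steps):
        alpha = k / steps
        # Move the probe model to an interpolated point on the path.
        with torch.no_grad():
            for n, p in probe.named_parameters():
                p.copy_((1 - alpha) * base[n] + alpha * tgt[n])
        probe.zero_grad()
        loss = F.l1_loss(probe(lr_input), hr_target)
        loss.backward()
        with torch.no_grad():
            for n, p in probe.named_parameters():
                if p.grad is None:
                    continue
                # Gradient times the path segment, accumulated over the
                # Riemann sum that approximates the integral.
                scores[n] += (p.grad * (tgt[n] - base[n]) / steps).abs()

    # Collapse per-weight scores to one score per conv filter
    # (i.e., per output channel of each 4-D conv weight tensor).
    filter_scores = {n: s.flatten(1).sum(dim=1)
                     for n, s in scores.items() if s.dim() == 4}
    return filter_scores
```

Sorting `filter_scores` within each layer and keeping the top 1% would correspond to the small set of discriminative filters the abstract reports.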
Few-shot font generation (FFG), aiming at generating font images with only a few samples, has become an emerging topic in recent years due to its academic and commercial value. Typically, FFG approaches follow the style-content disentanglement paradigm, which transfers the target font styles to characters by combining the content representations of source characters and the style codes of reference samples. Most existing methods attempt to increase font generation ability by exploring powerful style representations, which may be sub-optimal for the FFG task because it does not model the spatial transformations involved in transferring font styles. In this paper, we model font generation as a continuous transformation process from the source character image to the target font image via the creation and dissipation of font pixels, and embed the corresponding transformations into a neural transformation field. With the estimated transformation path, the neural transformation field generates a set of intermediate transformation results via the sampling process, and a font rendering formula is developed to accumulate them into the target font image. Extensive experiments show that our method achieves state-of-the-art performance on the few-shot font generation task, which demonstrates the effectiveness of our proposed model. Our implementation is available at: https://***/fubinfb/NTF.
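The abstract does not give the rendering formula itself, but the accumulation idea, composing sampled intermediate creation and dissipation of font pixels into one output image, can be sketched roughly as follows. The per-step additive semantics and the clamping below are assumptions for illustration only, not the paper's formulation.

```python
# A hedged illustration of accumulating intermediate transformation
# results into a final image, loosely analogous to the "font rendering
# formula" described above.
import torch

def accumulate_transformations(source_img, creation, dissipation):
    """source_img: (H, W) tensor in [0, 1]; creation/dissipation:
    (T, H, W) per-step pixel creation and dissipation rates sampled
    from a transformation field along the estimated path."""
    img = source_img.clone()
    for t in range(creation.shape[0]):
        # Each step adds newly created font pixels and removes
        # dissipated ones, then stays within valid intensity range.
        img = (img + creation[t] - dissipation[t]).clamp(0.0, 1.0)
    return img
```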
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Few-shot font generation (FFG) produces stylized font images with a limited number of reference samples, which can significantly reduce labor costs in manual font design. Most existing FFG methods follow the style-content disentanglement paradigm and employ a Generative Adversarial Network (GAN) to generate target fonts by combining the decoupled content and style representations. The complicated structure and detailed style are generated simultaneously in those methods, which may be a sub-optimal solution for the FFG task. Inspired by the manual font design process of expert designers, in this paper we model font generation as a multi-stage generative process. Specifically, as the injected noise and the data distribution in diffusion models can be well separated into different sub-spaces, we are able to incorporate the font transfer process into these models. Based on this observation, we generalize diffusion methods to model the font generative process by separating the reverse diffusion process into three stages with different functions: the structure construction stage first generates the structure information for the target character based on the source image, the font transfer stage subsequently transforms the source font to the target font, and finally the font refinement stage enhances the appearance and local details of the target font image. Based on the above multi-stage generative process, we construct our font generation framework, named MSD-Font, with a dual-network approach to generate font images. The superior performance demonstrates the effectiveness of our model. The code is available at: https://***/fubinfb/MSD-Font.
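A schematic of the three-stage reverse process might look like the sketch below, where a standard DDPM mean update is shared across stages and a different denoiser is consulted in each timestep range. The stage boundaries and the denoiser names (`structure_net`, `transfer_net`, `refine_net`) are hypothetical placeholders, not the paper's actual schedule or conditioning.

```python
# Sketch of a staged reverse diffusion loop; boundaries and denoisers
# are illustrative assumptions.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

def ddpm_mean_step(x, eps, t):
    """Standard DDPM posterior mean update (the stochastic noise term
    is omitted here for brevity)."""
    return (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bar[t]) * eps) \
        / torch.sqrt(alphas[t])

def staged_reverse_diffusion(x_T, source_img, structure_net,
                             transfer_net, refine_net,
                             boundaries=(700, 300)):
    x = x_T
    for t in reversed(range(T)):
        if t >= boundaries[0]:       # stage 1: structure construction
            eps = structure_net(x, t, source_img)
        elif t >= boundaries[1]:     # stage 2: font transfer
            eps = transfer_net(x, t, source_img)
        else:                        # stage 3: font refinement
            eps = refine_net(x, t, source_img)
        x = ddpm_mean_step(x, eps, t)
    return x
```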
Photomosaic images are composite images composed of many small images called tiles. In its overall visual effect, a photomosaic image is similar to the target image, and photomosaics are also called "montage art". Noisy blocks and the loss of local information are the major obstacles in most methods or programs that create photomosaic images. To solve these problems and generate a photomosaic image in this study, we propose a tile selection method based on error minimization. A photomosaic image can be generated by partitioning the target image in a rectangular pattern, selecting appropriate tile images, and then adding them with a weight coefficient. Based on the principles of montage art, the quality of the generated photomosaic image can be evaluated by both global and local errors. In the proposed framework, via an error function analysis, the results show that selecting a tile image using a global minimum distance minimizes both the global error and the local error. Moreover, the weight coefficient of the image superposition can be used to adjust the ratio of the global and local errors. Finally, to verify the proposed method, we built a new photomosaic creation dataset during this study. The experimental results show that the proposed method achieves a low mean absolute error and that the generated photomosaic images have a more artistic effect than do the existing approaches.
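The described pipeline, rectangular partitioning, tile selection by global minimum distance, and weighted superposition, is compact enough to sketch directly. The distance metric (sum of squared pixel errors) and the example weight value below are illustrative assumptions, not the paper's exact choices.

```python
# A compact sketch of the photomosaic pipeline described above.
import numpy as np

def build_photomosaic(target, tiles, block=32, w=0.7):
    """target: (H, W, 3) float array; tiles: (N, block, block, 3)
    float array. Each block of the target is replaced by its
    best-matching tile blended with the original block by weight w."""
    H = target.shape[0] // block * block   # crop to block multiples
    W = target.shape[1] // block * block
    out = target[:H, :W].copy()
    for y in range(0, H, block):
        for x in range(0, W, block):
            patch = target[y:y + block, x:x + block]
            # Global minimum distance: squared pixel error per tile.
            d = ((tiles - patch) ** 2).sum(axis=(1, 2, 3))
            best = tiles[d.argmin()]
            # Weighted superposition of tile and original block.
            out[y:y + block, x:x + block] = w * best + (1 - w) * patch
    return out
```

Raising `w` pushes the result toward the pure tile mosaic, while lowering it preserves more local detail of the target image, which is the global/local trade-off the weight coefficient controls.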
Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well-established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision, and categorize current low-level vision tasks into two groups based on the difficulty of data acquisition: low-cost and high-cost tasks. Existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (DegAE). DegAE follows the philosophy of designing pretext tasks for self-supervised pretraining and is elaborately tailored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88 dB performance gain on the image dehazing task, while Uformer obtains 3.22 dB and 0.54 dB improvements on dehazing and deraining tasks, respectively.
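The abstract leaves the exact pretext objective unspecified; one plausible shape of a degradation-autoencoder training step is sketched below. Everything here, the toy degradation pipeline, the `degrade_embed` conditioning, and the L1 objective, is an assumption for illustration, not the paper's formulation.

```python
# One plausible degradation-autoencoder pretext step: the encoder sees
# content under one synthetic degradation, and the decoder must
# re-render it under a second degradation whose embedding it receives.
import random
import torch
import torch.nn.functional as F

def random_degrade(img):
    """Apply one simple synthetic degradation; a stand-in for a richer
    degradation pipeline. img: (B, C, H, W) in [0, 1]."""
    choice = random.choice(["blur", "noise", "resize"])
    if choice == "blur":
        return F.avg_pool2d(img, 3, stride=1, padding=1)
    if choice == "noise":
        return (img + 0.05 * torch.randn_like(img)).clamp(0, 1)
    small = F.interpolate(img, scale_factor=0.5, mode="bilinear")
    return F.interpolate(small, size=img.shape[-2:], mode="bilinear")

def pretext_step(encoder, decoder, degrade_embed, clean_batch, optimizer):
    inp = random_degrade(clean_batch)      # degradation seen by encoder
    tgt = random_degrade(clean_batch)      # degradation to reproduce
    z = encoder(inp)
    pred = decoder(z, degrade_embed(tgt))  # condition on target degradation
    loss = F.l1_loss(pred, tgt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```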
It is a challenging task to learn rich and multi-scale spatiotemporal semantics from high-dimensional videos, due to large local redundancy and complex global dependency between video frames. The recent advances in th...
Transformer-based methods have shown impressive performance in low-level vision tasks such as image super-resolution. However, through attribution analysis we find that these networks can only utilize a limited spatial range of the input information. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus making use of their complementary strengths: the ability to utilize global statistics and strong local fitting capability. Moreover, to better aggregate the cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that the performance of this task can be greatly improved. Our overall method significantly outperforms the state-of-the-art methods by more than 1 dB.
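To make the complementary strengths concrete, here is a minimal sketch of a block that runs window-based self-attention alongside channel attention and fuses both residually. Layer sizes, the fusion rule, and the omission of the paper's overlapping cross-attention and pre-training strategy are all simplifications; it also assumes spatial dimensions divisible by the window size.

```python
# A minimal hybrid attention block: window self-attention (local
# fitting) plus channel attention (global statistics), fused residually.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, dim, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # global statistics
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)                         # channel reweighting

class HybridAttentionBlock(nn.Module):
    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ca = ChannelAttention(dim)

    def forward(self, x):                             # x: (B, C, H, W)
        b, c, h, w = x.shape
        ws = self.window                              # H, W divisible by ws
        # Partition into non-overlapping windows, then self-attention.
        win = x.reshape(b, c, h // ws, ws, w // ws, ws)
        win = win.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        win = self.norm(win)
        out, _ = self.attn(win, win, win)
        out = out.reshape(b, h // ws, w // ws, ws, ws, c)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        # Fuse the two complementary branches with a residual connection.
        return x + out + self.ca(x)

block = HybridAttentionBlock(dim=64)
y = block(torch.randn(1, 64, 32, 32))                 # -> (1, 64, 32, 32)
```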
Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in the 3D world. To tackle this limitation, we introduce...