检索结果-内蒙古大学图书馆

arXiv 2023年

作者： Huang, Jiancheng Liu, Yifan Qin, Jin Chen, Shifeng ShenZhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences Shenzhen China University of Chinese Academy of Sciences Beijing China

Text-conditioned image editing is a recently emerged and highly practical task, and its potential is immeasurable. However, most of the concurrent methods are unable to perform action editing, i.e. they can not produce results that conform to the action semantics of the editing prompt and preserve the content of the original image. To solve the problem of action editing, we propose KV Inversion, a method that can achieve satisfactory reconstruction performance and action editing, which can solve two major problems: 1) the edited result can match the corresponding action, and 2) the edited object can retain the texture and identity of the original real image. In addition, our method does not require training the Stable Diffusion model itself, nor does it require scanning a large-scale dataset to perform time-consuming training. Copyright © 2023, The Authors. All rights reserved.

关键词： Textures

来源：评论

学校读者我要写书评

暂无评论

Bootstrap Diffusion Model Curve Estimation for High Resolution Low-Light Image Enhancement

arXiv

引用

arXiv 2023年

作者： Huang, Jiancheng Liu, Yifan Chen, Shifeng ShenZhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences Shenzhen China University of Chinese Academy of Sciences Beijing China

Learning-based methods have attracted a lot of research attention and led to significant improvements in low-light image enhancement. However, most of them still suffer from two main problems: expensive computational cost in high resolution images and unsatisfactory performance in simultaneous enhancement and denoising. To address these problems, we propose BDCE, a bootstrap diffusion model that exploits the learning of the distribution of the curve parameters instead of the normal-light image itself. Specifically, we adopt the curve estimation method to handle the high-resolution images, where the curve parameters are estimated by our bootstrap diffusion model. In addition, a denoise module is applied in each iteration of curve adjustment to denoise the intermediate enhanced result of each iteration. We evaluate BDCE on commonly used benchmark datasets, and extensive experiments show that it achieves state-of-the-art qualitative and quantitative performance. Copyright © 2023, The Authors. All rights reserved.

关键词： Image enhancement

来源：评论

学校读者我要写书评

暂无评论

Dynamic Feature Queue for Surveillance Face Anti-spoofing via Progressive Training

Dynamic Feature Queue for Surveillance Face Anti-spoofing vi...

引用

IEEE computer Society Conference on computer vision and pattern recognition Workshops (CVPRW)

作者： keyao Wang Mouxiao Huang Guosheng Zhang Haixiao Yue Gang Zhang Yu Qiao Department of Computer Vision Technology (VIS) Baidu Inc. ShenZhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences University of Chinese Academy of Sciences

In recent years, face recognition systems have faced increasingly security threats, making it essential to employ Face Anti-spoofing (FAS) to protect against various types of attacks in traditional scenarios like phone unlocking, face payment and self-service security inspection. However, further exploration is required to fully secure FAS in long-distance settings. In this paper, we propose two contributions to enhance the security of face recognition systems: Dynamic Feature Queue (DFQ) and Progressive Training Strategy (PTS). DFQ converts the conventional binary classification task into a multi-classification task. It treats live samples as a closed set and attack samples as an open set by using a dynamic queue that stores the features of spoofing samples and updates them. On the other hand, PTS targets difficult samples and iteratively adds them in batches for training. The proposed PTS divides the entire training set into blocks, trains only a small portion of the data, and gradually increases the training data with each stage while also incorporating low-scoring positive samples and high-scoring spoof samples from the test set. These two contributions complement each other by enhancing the model’s ability to generalize and defend against various types of attacks, making the face recognition system more secure and reliable. Our proposed methods have achieved top performance on ACER metric with 4.73% on the SuHiFiMask dataset [11] and won the first prize in Surveillance Face Anti-spoofing track of the Challenge@CVPR 2023.

关键词：

来源：评论

学校读者我要写书评

暂无评论

UNIFORMER: UNIFIED TRANSFORMER FOR EFFICIENT SPATIOTEMPORAL REPRESENTATION LEARNING 10

UNIFORMER: UNIFIED TRANSFORMER FOR EFFICIENT SPATIOTEMPORAL ...

引用

10th International Conference on Learning Representations, ICLR 2022

作者： Li, Kunchang Wang, Yali Gao, Peng Song, Guanglu Liu, Yu Li, Hongsheng Qiao, Yu ShenZhen Key Lab of Computer Vision and Pattern Recognition SIAT-SenseTime Joint Lab Shenzhen Institute of Advanced Technology Chinese Academy of Sciences China University of Chinese Academy of Sciences China Shanghai AI Laboratory Shanghai China SenseTime Research The Chinese University of Hong Kong Hong Kong

It is a challenging task to learn rich and multi-scale spatiotemporal semantics from high-dimensional videos, due to large local redundancy and complex global dependency between video frames. The recent advances in this research have been mainly driven by 3D convolutional neural networks and vision transformers. Although 3D convolution can efficiently aggregate local context to suppress local redundancy from a small 3D neighborhood, it lacks the capability to capture global dependency because of the limited receptive field. Alternatively, vision transformers can effectively capture long-range dependency by self-attention mechanism, while having the limitation on reducing local redundancy with blind similarity comparison among all the tokens in each layer. Based on these observations, we propose a novel Unified transFormer (UniFormer) which seamlessly integrates merits of 3D convolution and spatiotemporal self-attention in a concise transformer format, and achieves a preferable balance between computation and accuracy. Different from traditional transformers, our relation aggregator can tackle both spatiotemporal redundancy and dependency, by learning local and global token affinity respectively in shallow and deep layers. We conduct extensive experiments on the popular video benchmarks, e.g., Kinetics-400, Kinetics-600, and Something-Something V1&V2. With only ImageNet-1K pretraining, our UniFormer achieves 82.9%/84.8% top-1 accuracy on Kinetics-400/Kinetics-600, while requiring 10× fewer GFLOPs than other state-of-the-art methods. For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy respectively. Code is available at https://***/Sense-X/UniFormer. © 2022 ICLR 2022 - 10th International Conference on Learning Representationss. All rights reserved.

关键词： Redundancy

来源：评论

学校读者我要写书评

暂无评论

Tile selection method based on error minimization for photomosaic image creation

引用

Frontiers of computer Science 2021年第3期15卷 165-172页

作者： Hongbo ZHANG Xin GAO Jixiang DU Qing LEI Lijie YANG Department of Computer Science and Technology Huaqiao UniversityXiamen 361021China Fujian Key Laboratory of Big Data Intelligence and Security Huaqiao UniversityXiamen 361021China Xiamen Key Laboratory of Computer Vision and Pattern Recognition Huaqiao UniversityXiamen 361021China School of Computer Science and Technology Harbin Institute of TechnologyShenzhen 518055China

Photomosaic images are composite images composed of many small images called *** its overall visual effect,a photomosaic image is similar to the target image,and photomosaics are also called“montage art”.Noisy blocks and the loss of local information are the major obstacles in most methods or programs that create photomosaic *** solve these problems and generate a photomosaic image in this study,we propose a tile selection method based on error minimization.A photomosaic image can be generated by partitioning the target image in a rectangular pattern,selecting appropriate tile images,and then adding them with a weight *** on the principles of montage art,the quality of the generated photomosaic image can be evaluated by both global and local *** the proposed framework,via an error function analysis,the results show that selecting a tile image using a global minimum distance minimizes both the global error and the local error ***,the weight coefficient of the image superposition can be used to adjust the ratio of the global and local ***,to verify the proposed method,we built a new photomosaic creation dataset during this *** experimental results show that the proposed method achieves a low mean absolute error and that the generated photomosaic images have a more artistic effect than do the existing approaches.

关键词： photomosaic image tile image target image error minimization mean absolute error

来源：评论

学校读者我要写书评

暂无评论

DegAE: A New Pretraining Paradigm for Low-Level vision

DegAE: A New Pretraining Paradigm for Low-Level Vision

引用

Conference on computer vision and pattern recognition (CVPR)

作者： Yihao Liu Jingwen He Jinjin Gu Xiangtao Kong Yu Qiao Chao Dong Shanghai Artificial Intelligence Laboratory ShenZhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences University of Chinese Academy of Sciences The University of Sydney

Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well-established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision, and categorize current low-level vision tasks into two groups based on the difficulty of data acqui-sition: low-cost and high-cost tasks. Existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (De-gAE). DegAE follows the philosophy of designing pretext task for self-supervised pretraining and is elaborately tai-lored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88dB performance gain on image dehaze task, while Uformer obtains 3.22dB and 0.54dB improvement on dehaze and derain tasks, respectively.

关键词：

来源：评论

学校读者我要写书评

暂无评论

CodePhys: Robust Video-Based Remote Physiological Measurement Through Latent Codebook Querying

引用

IEEE Journal of Biomedical and Health Informatics 2025年 PP卷 PP页

作者： Chu, Shuyang Xia, Menghan Yuan, Mengyao Liu, Xin Seppanen, Tapio Zhao, Guoying Shi, Jingang Xi'an Jiaotong University School of Software Engineering Xi'an China Tencent Ai Lab Shenzhen China Lappeenranta-Lahti University of Technology Lut Computer Vision and Pattern Recognition Laboratory Lappeenranta53850 Finland University of Oulu Center for Machine Vision and Signal Analysis Finland

Remote photoplethysmography (rPPG) aims to measure non-contact physiological signals from facial videos, which has shown great potential in many applications. Most existing methods directly extract video-based rPPG features by designing neural networks for heart rate estimation. Although they can achieve acceptable results, the recovery of rPPG signal faces intractable challenges when interference from real-world scenarios takes place on facial video. Specifically, facial videos are inevitably affected by non-physiological factors (e.g., camera device noise, defocus, and motion blur), leading to the distortion of extracted rPPG signals. Recent rPPG extraction methods are easily affected by interference and degradation, resulting in noisy rPPG signals. In this paper, we propose a novel method named CodePhys, which innovatively treats rPPG measurement as a code query task in a noise-free proxy space (i.e., codebook) constructed by ground-truth PPG signals. We consider noisy rPPG features as queries and generate high-fidelity rPPG features by matching them with noise-free PPG features from the codebook. Our approach also incorporates a spatial-aware encoder network with a spatial attention mechanism to highlight physiologically active areas and uses a distillation loss to reduce the influence of non-periodic visual interference. Experimental results on four benchmark datasets demonstrate that CodePhys outperforms state-of-the-art methods in both intra-dataset and cross-dataset settings. © 2025 IEEE.

关键词： Heart

来源：评论

学校读者我要写书评

暂无评论

Activating More Pixels in Image Super-Resolution Transformer

Activating More Pixels in Image Super-Resolution Transformer

引用

Conference on computer vision and pattern recognition (CVPR)

作者： Xiangyu Chen Xintao Wang Jiantao Zhou Yu Qiao Chao Dong State Key Laboratory of Internet of Things for Smart City University of Macau Shenzhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences Shanghai Artificial Intelligence Laboratory ARC Lab Tencent PCG

Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution. However, we find that these networks can only utilize a limited spatial range of input information through attribution analysis. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus making use of their complementary advantages of being able to utilize global statistics and strong local fitting capability. Moreover, to better aggregate the cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that the performance of this task can be greatly improved. Our overall method significantly outperforms the state-of-the-art methods by more than 1dB.

关键词：

来源：评论

学校读者我要写书评

暂无评论

WaveDM: Wavelet-Based Diffusion Models for Image Restoration

arXiv

引用

arXiv 2023年

作者： Huang, Yi Huang, Jiancheng Liu, Jianzhuang Yan, Mingfu Dong, Yu Lyu, Jiaxi Chen, Chaoqi Chen, Shifeng Shenzhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences Shenzhen518055 China University of Chinese Academy of Sciences Beijing100039 China The University of Hong Kong Hong Kong

Latest diffusion-based methods for many image restoration tasks outperform traditional models, but they encounter the long-time inference problem. To tackle it, this paper proposes a Wavelet-Based Diffusion Model (WaveDM). WaveDM learns the distribution of clean images in the wavelet domain conditioned on the wavelet spectrum of degraded images after wavelet transform, which is more time-saving in each step of sampling than modeling in the spatial domain. To ensure restoration performance, a unique training strategy is proposed where the low-frequency and high-frequency spectrums are learned using distinct modules. In addition, an Efficient Conditional Sampling (ECS) strategy is developed from experiments, which reduces the number of total sampling steps to around 5. Evaluations on twelve benchmark datasets including image raindrop removal, rain steaks removal, dehazing, defocus deblurring, demoiréing, and denoising demonstrate that WaveDM achieves state-of-the-art performance with the efficiency that is comparable to traditional one-pass methods and over 100× faster than existing image restoration methods using vanilla diffusion models. The code is available at https://***/stayalive16/WaveDM. Copyright © 2023, The Authors. All rights reserved.

关键词： Image reconstruction

来源：评论

学校读者我要写书评

暂无评论

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

arXiv

引用

arXiv 2024年

作者： Ding, Yanbo Zhuang, Shaobin Li, Kunchang Yue, Zhengrong Qiao, Yu Wang, Yali Shenzhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences China School of Artificial Intelligence University of Chinese Academy of Sciences China Shanghai Artificial Intelligence Laboratory China Shanghai Jiao Tong University China

Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in the 3D world. To tackle this limitation, we introduce a generic AI system, namely MUSES, for 3D-controllable image generation from user queries. Specifically, our MUSES develops a progressive workflow with three key components, including (1) Layout Manager for 2D-to-3D layout lifting, (2) Model Engineer for 3D object acquisition and calibration, (3) Image Artist for 3D-to-2D image rendering. By mimicking the collaboration of human professionals, this multi-modal agent pipeline facilitates the effective and automatic creation of images with 3D-controllable objects, through an explainable integration of top-down planning and bottom-up generation. Additionally, existing benchmarks lack detailed descriptions of complex Copyright © 2024, The Authors. All rights reserved.

关键词： 3D modeling

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：