Search Results - Inner Mongolia University Library

arXiv, 2020

Authors: Chen, Wanli; Zhu, Xinge; Sun, Ruoqi; He, Junjun; Li, Ruiyu; Shen, Xiaoyong; Yu, Bei

Affiliations: Chinese University of Hong Kong, Hong Kong; Shanghai Jiao Tong University, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; SmartMore, United States

Context information plays an indispensable role in the success of semantic segmentation. Recently, non-local self-attention based methods have proven effective for collecting context information. Since the desired context consists of both spatial-wise and channel-wise attention, a 3D representation is an appropriate formulation. However, these non-local methods describe 3D context information with a 2D similarity matrix, and this space compression may cause channel-wise attention to be lost. An alternative is to model the contextual information directly without compression, but this confronts a fundamental difficulty: the high-rank property of context information. In this paper, we propose a new approach to modeling 3D context representations that both avoids space compression and tackles the high-rank difficulty. Inspired by tensor canonical-polyadic (CP) decomposition theory (i.e., a high-rank tensor can be expressed as a combination of rank-1 tensors), we design a low-rank-to-high-rank context reconstruction framework, RecoNet. Specifically, we first introduce the tensor generation module (TGM), which generates a number of rank-1 tensors to capture fragments of the context feature. We then use these rank-1 tensors to recover the high-rank context features through the proposed tensor reconstruction module (TRM). Extensive experiments show that our method achieves state-of-the-art performance on various public datasets. Additionally, our method requires more than 100 times less computation than conventional non-local-based methods. Copyright © 2020, The Authors. All rights reserved.

Keywords: Tensors
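RecoNet rests on the CP view that a high-rank tensor can be written as a weighted sum of rank-1 tensors, each the outer product of one vector per mode. The sketch below is a minimal pure-Python illustration of that reconstruction step only, assuming a C×H×W feature tensor; the factor vectors and the weights stand in for what the TGM would generate and are hypothetical. It is not the authors' TRM implementation.

```python
from typing import List, Sequence, Tuple

Tensor3 = List[List[List[float]]]  # indexed as [channel][height][width]
Factor = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def rank1_tensor(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> Tensor3:
    """Outer product a ∘ b ∘ c: entry [i][j][k] = a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


def cp_reconstruct(factors: List[Factor], weights: List[float]) -> Tensor3:
    """Weighted sum of rank-1 tensors: X = sum_r weights[r] * a_r ∘ b_r ∘ c_r."""
    if not factors or len(factors) != len(weights):
        raise ValueError("need one weight per rank-1 factor triple")
    a0, b0, c0 = factors[0]
    C, H, W = len(a0), len(b0), len(c0)
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for lam, (a, b, c) in zip(weights, factors):
        t = rank1_tensor(a, b, c)
        for i in range(C):
            for j in range(H):
                for k in range(W):
                    out[i][j][k] += lam * t[i][j][k]
    return out


# Two rank-1 terms combined into a 2 x 2 x 1 tensor.
X = cp_reconstruct(
    factors=[([1.0, 0.0], [1.0, 2.0], [3.0]), ([0.0, 1.0], [1.0, 1.0], [2.0])],
    weights=[1.0, 0.5],
)
print(X)  # → [[[3.0], [6.0]], [[1.0], [1.0]]]
```

Each added rank-1 term can contribute structure the others lack, which is why summing enough of them can approximate a high-rank context tensor.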

Low-Resolution Action recognition for Tiny Actions Challenge

arXiv, 2022

Authors: Chen, Boyu; Qiao, Yu; Wang, Yali

Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Shanghai AI Laboratory, Shanghai, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

The Tiny Actions Challenge focuses on understanding human activities in real-world surveillance. There are two main difficulties for activity recognition in this scenario. First, human activities are often recorded at a distance and appear at low resolution with few discriminative clues. Second, these activities naturally follow a long-tailed distribution, and such heavy category imbalance makes data bias hard to alleviate. To tackle these problems, we propose a comprehensive recognition solution. First, we train video backbones with data balancing to alleviate overfitting on the challenge benchmark. Second, we design a dual-resolution distillation framework, which effectively guides low-resolution action recognition with super-resolution knowledge. Finally, we apply model ensembling with post-processing, which further boosts performance on the long-tailed categories. Our solution ranks Top-1 on the leaderboard. Copyright © 2022, The Authors. All rights reserved.

Keywords: Distillation
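Distillation in which a teacher guides a low-resolution student is often implemented with a temperature-softened KL divergence between the two models' class distributions. The sketch below shows that standard Hinton-style soft-target loss, assuming the teacher scores super-resolved clips and the student scores the raw low-resolution clips. The loss form, the temperature of 4.0, the function names, and the example logits are all assumptions, not the paper's confirmed recipe.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """T^2 * KL(p_teacher || p_student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    return temperature ** 2 * kl


# Hypothetical logits: the teacher scores a super-resolved clip, the student the low-res clip.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 0.8, -0.5]
print(distillation_loss(student, teacher))
```

In such setups this term is typically added to the ordinary cross-entropy on the ground-truth label with a weighting factor.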

Reflash Dropout in Image Super-Resolution

arXiv, 2021

Authors: Kong, Xiangtao; Liu, Xina; Gu, Jinjin; Qiao, Yu; Dong, Chao

Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; The University of Sydney, Australia; Shanghai AI Laboratory, Shanghai, China

Dropout is designed to relieve overfitting in high-level vision tasks but is rarely applied in low-level vision tasks such as image super-resolution (SR). As a classic regression problem, SR behaves differently from high-level tasks and is sensitive to the dropout operation. In this paper, however, we show that appropriate use of dropout benefits SR networks and improves their generalization ability. Specifically, dropout is best embedded at the end of the network and is significantly helpful in multi-degradation settings. This finding runs counter to common intuition and motivates us to explore its working mechanism. We further use two analysis tools: one from a recent network interpretation work, and the other specially designed for this task. The analysis results provide supporting evidence for our experimental findings and offer a new perspective for understanding SR networks. Copyright © 2021, The Authors. All rights reserved.

Keywords: Optical resolving power
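To make "dropout at the end of the network" concrete, here is a minimal sketch of inverted channel-wise dropout applied to the last feature maps just before an SR network's output layer. Dropping whole channels rather than single pixels, the list-of-channels layout, and the helper name `channel_dropout` are assumptions for illustration; at inference the layer is the identity.

```python
import random
from typing import List, Optional

FeatureMaps = List[List[float]]  # one flattened spatial map per channel


def channel_dropout(features: FeatureMaps, p: float, training: bool,
                    rng: Optional[random.Random] = None) -> FeatureMaps:
    """Inverted channel-wise dropout: zero each whole channel with probability p
    and rescale surviving channels by 1 / (1 - p) so the expected value is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return features  # identity at inference time
    rng = rng or random.Random()
    keep = 1.0 - p
    return [[0.0] * len(ch) if rng.random() < p else [v / keep for v in ch]
            for ch in features]


# Hypothetical last-layer features of an SR network (3 channels, 2 pixels each),
# dropped just before the final reconstruction layer.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(channel_dropout(feats, p=0.5, training=True, rng=random.Random(0)))
```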

PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration

arXiv, 2020

Authors: Gu, Jinjin; Cai, Haoming; Chen, Haoyu; Ye, Xiaoxing; Ren, Jimmy S.; Dong, Chao

Affiliations: School of Data Science, Chinese University of Hong Kong, Shenzhen, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; SenseTime Research; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

Image quality assessment (IQA) is key to the rapid development of image restoration (IR) algorithms. The most recent IR methods based on Generative Adversarial Networks (GANs) have achieved significant improvements in visual performance, but they also pose great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and evaluation results. We therefore raise two questions: (1) Can existing IQA methods objectively evaluate recent IR algorithms? (2) When we focus on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset called the Perceptual Image Processing Algorithms (PIPAL) dataset. In particular, this dataset includes the results of GAN-based methods, which are missing from previous datasets. We collect more than 1.13 million human judgments to assign subjective scores to PIPAL images using the more reliable "Elo system". Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. Finally, we improve the performance of IQA networks on GAN-based distortions by introducing anti-aliasing pooling. Experiments show the effectiveness of the proposed method. Copyright © 2020, The Authors. All rights reserved.

Keywords: Image reconstruction
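PIPAL assigns subjective scores with the Elo system, in which each pairwise human judgment acts like a game between two images. The sketch below shows the standard Elo update. The K-factor of 32 and the starting rating of 1500 are conventional chess values used here as assumptions, not PIPAL's documented settings.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """score_a is 1.0 if raters preferred image A, 0.0 if they preferred B, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


# Two distorted images start at 1500; one judgment prefers image A.
print(elo_update(1500.0, 1500.0, score_a=1.0))  # → (1516.0, 1484.0)
```

Because the update is zero-sum, an upset (a low-rated image preferred over a high-rated one) moves both ratings more than an expected outcome does.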

UDC-UNet: Under-Display Camera Image Restoration via U-shape Dynamic Network

arXiv, 2022

Authors: Liu, Xina; Hu, Jinfan; Chen, Xiangyu; Dong, Chao

Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; University of Macau, China; Shanghai AI Laboratory, Shanghai, China

Under-Display Camera (UDC) technology has been widely exploited to help smartphones realize full-screen displays. However, because the screen inevitably affects light propagation, images captured by a UDC system usually contain flare, haze, blur, and noise. In particular, flare and blur in UDC images can severely degrade the user experience in high dynamic range (HDR) scenes. In this paper, we propose a new deep model, UDC-UNet, to address UDC image restoration with an estimated point spread function (PSF) in HDR scenes. Our network consists of three parts: a U-shape base network that utilizes multi-scale information, a condition branch that performs spatially variant modulation, and a kernel branch that leverages prior knowledge of the PSF. Based on the characteristics of HDR data, we additionally design a tone mapping loss to stabilize network optimization and achieve better visual quality. Experimental results show that the proposed UDC-UNet outperforms state-of-the-art methods in both quantitative and qualitative comparisons. Our approach won second place in the UDC image restoration track of the MIPI challenge. Code and models are available at https://***/J-FHu/UDCUNet. © 2022, CC BY.

Keywords: Image reconstruction
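The abstract does not spell out the tone mapping loss. One common way to build such a loss for HDR data is to compress intensities with the μ-law curve log(1 + μx) / log(1 + μ) before taking an L1 distance, so that errors in very bright regions do not dominate optimization. The sketch below assumes that μ-law form, μ = 5000, and non-negative linear intensities; UDC-UNet's actual mapping may differ.

```python
import math
from typing import Sequence


def mu_law(x: float, mu: float = 5000.0) -> float:
    """μ-law compression log(1 + mu*x) / log(1 + mu); maps 0 -> 0 and 1 -> 1."""
    if x < 0.0:
        raise ValueError("expects non-negative linear intensities")
    return math.log1p(mu * x) / math.log1p(mu)


def tone_mapped_l1(pred: Sequence[float], target: Sequence[float], mu: float = 5000.0) -> float:
    """Mean absolute error after μ-law tone mapping both prediction and target."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal length")
    return sum(abs(mu_law(p, mu) - mu_law(t, mu)) for p, t in zip(pred, target)) / len(pred)


# The same absolute error costs far less in a bright region than in a dark one.
print(tone_mapped_l1([0.01], [0.02]), tone_mapped_l1([0.91], [0.92]))
```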

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning

arXiv, 2021

Authors: Zhang, David Junhao; Li, Kunchang; Wang, Yali; Chen, Yunpeng; Chandra, Shashwat; Qiao, Yu; Liu, Luoqi; Shou, Mike Zheng

Affiliations: National University of Singapore, Singapore; Meitu Inc., China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Shanghai AI Laboratory, China

Recently, MLP-Like networks have been revived for image recognition. However, whether a generic MLP-Like architecture can be built for the video domain has not been explored, owing to the heavy computational burden of complex spatial-temporal modeling. To fill this gap, we present an efficient self-attention-free backbone, MorphMLP, which flexibly leverages the concise Fully-Connected (FC) layer for video representation learning. Specifically, a MorphMLP block consists of two key layers in sequence, MorphFCs and MorphFCt, for spatial and temporal modeling respectively. MorphFCs effectively captures core semantics in each frame through progressive token interaction along both the height and width dimensions. MorphFCt, in turn, adaptively learns long-term dependencies across frames by aggregating temporal tokens at each spatial location. With this multi-dimension and multi-scale factorization, the MorphMLP block achieves a strong accuracy-computation balance. Finally, we evaluate MorphMLP on a number of popular video benchmarks. Compared with recent state-of-the-art models, MorphMLP significantly reduces computation while achieving better accuracy; e.g., MorphMLP-S uses only 50% of the GFLOPs of VideoSwin-T but achieves a 0.9% top-1 improvement on Kinetics400 under ImageNet1K pretraining. MorphMLP-B uses only 43% of the GFLOPs of MViT-B but achieves a 2.4% top-1 improvement on SSV2, even though MorphMLP-B is pretrained on ImageNet1K while MViT-B is pretrained on Kinetics400. Moreover, our method adapted to the image domain outperforms previous SOTA MLP-Like architectures. Code is available at https://***/MTlab/MorphMLP. Copyright © 2021, The Authors. All rights reserved.

Keywords: Image recognition
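MorphFC layers mix tokens with plain fully-connected weights along one spatial axis at a time. As a simplified sketch, the function below mixes a 1D sequence of tokens (for example, one channel along the width) inside non-overlapping chunks of length L using one shared L×L weight matrix. Reading "progressive token interaction" as the chunk length growing in deeper layers is an assumption, and the real MorphFCs also mixes channels and handles both height and width, which this sketch omits.

```python
from typing import List, Sequence


def chunked_fc(tokens: Sequence[float], weight: Sequence[Sequence[float]]) -> List[float]:
    """Mix tokens inside non-overlapping chunks of length L = len(weight),
    applying the same L x L weight matrix to every chunk."""
    L = len(weight)
    if L == 0 or any(len(row) != L for row in weight):
        raise ValueError("weight must be a non-empty square matrix")
    if len(tokens) % L != 0:
        raise ValueError("sequence length must be divisible by the chunk length")
    out: List[float] = []
    for start in range(0, len(tokens), L):
        chunk = tokens[start:start + L]
        out.extend(sum(w * t for w, t in zip(row, chunk)) for row in weight)
    return out


# A width-4 row of one channel with chunk length 2: tokens interact only within [t0, t1] and [t2, t3].
row = [1.0, 2.0, 3.0, 4.0]
print(chunked_fc(row, [[1.0, 1.0], [0.0, 1.0]]))  # → [3.0, 2.0, 7.0, 4.0]
```

Calling it with a larger L in a deeper layer lets each token see a wider neighborhood at the cost of a larger weight matrix.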

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

arXiv, 2024

Authors: Ding, Yanbo; Zhuang, Shaobin; Li, Kunchang; Yue, Zhengrong; Qiao, Yu; Wang, Yali

Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China; Shanghai Artificial Intelligence Laboratory, China; Shanghai Jiao Tong University, China

Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in the 3D world. To tackle this limitation, we introduce a generic AI system, MUSES, for 3D-controllable image generation from user queries. Specifically, MUSES develops a progressive workflow with three key components: (1) Layout Manager for 2D-to-3D layout lifting, (2) Model Engineer for 3D object acquisition and calibration, and (3) Image Artist for 3D-to-2D image rendering. By mimicking the collaboration of human professionals, this multi-modal agent pipeline facilitates the effective and automatic creation of images with 3D-controllable objects, through an explainable integration of top-down planning and bottom-up generation. Additionally, existing benchmarks lack detailed descriptions of complex … Copyright © 2024, The Authors. All rights reserved.

Keywords: 3D modeling

DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models

arXiv, 2023

Authors: Xie, Liangbin; Wang, Xintao; Chen, Xiangyu; Li, Gen; Shan, Ying; Zhou, Jiantao; Dong, Chao

Affiliations: State Key Laboratory of Internet of Things for Smart City, University of Macau, China; Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; ARC Lab, Tencent PCG, China; Shanghai Artificial Intelligence Laboratory, China; Platform Technologies, China

Image super-resolution (SR) with generative adversarial networks (GANs) has achieved great success in restoring realistic details. However, GAN-based SR models are notorious for inevitably producing unpleasant and undesirable artifacts, especially in practical scenarios. Previous works typically suppress artifacts with an extra loss penalty in the training phase, but such penalties only work for in-distribution artifact types generated during training. When these improved methods are applied in real-world scenarios, we observe that they still generate obviously annoying artifacts during inference. In this paper, we analyze the cause and characteristics of the GAN artifacts produced on unseen test data without ground truths. We then develop a novel method, DeSRA, to Detect and then "Delete" those SR Artifacts in practice. Specifically, we propose to measure a relative local variance distance between MSE-SR results and GAN-SR results, and we locate the problematic areas based on this distance and semantic-aware thresholds. After detecting the artifact regions, we develop a fine-tuning procedure that improves GAN-based SR models with a few samples, so that they can handle similar types of artifacts in more unseen real data. Equipped with DeSRA, we can successfully eliminate artifacts from inference and improve the applicability of SR models in real-world scenarios. The code will be available at https://***/TencentARC/DeSRA. Copyright © 2023, The Authors. All rights reserved.

Keywords: Generative adversarial networks
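DeSRA flags artifact regions by comparing local variance in the MSE-SR and GAN-SR outputs. The sketch below assumes one plausible form of the relative distance, |v_gan - v_mse| / max(v_gan, v_mse), computed over a k×k window on single-channel images, and it replaces the paper's semantic-aware thresholds with a single hypothetical global threshold. The exact normalization DeSRA uses may differ.

```python
from typing import List

Image = List[List[float]]  # single-channel image, indexed [row][col]


def local_variance(img: Image, i: int, j: int, k: int = 3) -> float:
    """Variance of the k x k window centered on (i, j), clipped at the image border."""
    r = k // 2
    vals = [img[y][x]
            for y in range(max(0, i - r), min(len(img), i + r + 1))
            for x in range(max(0, j - r), min(len(img[0]), j + r + 1))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def relative_lv_distance(mse_sr: Image, gan_sr: Image, k: int = 3, eps: float = 1e-8) -> Image:
    """Per-pixel |v_gan - v_mse| / max(v_gan, v_mse), in [0, 1) (assumed form)."""
    h, w = len(mse_sr), len(mse_sr[0])
    dist = []
    for i in range(h):
        row = []
        for j in range(w):
            v_m = local_variance(mse_sr, i, j, k)
            v_g = local_variance(gan_sr, i, j, k)
            row.append(abs(v_g - v_m) / (max(v_g, v_m) + eps))
        dist.append(row)
    return dist


def artifact_mask(dist: Image, threshold: float) -> List[List[bool]]:
    """Flag pixels whose distance exceeds a single (hypothetical) global threshold."""
    return [[d > threshold for d in row] for row in dist]


# A smooth MSE-SR patch versus a GAN-SR patch with high-frequency checkerboard texture.
mse = [[0.5] * 4 for _ in range(4)]
gan = [[float((y + x) % 2) for x in range(4)] for y in range(4)]
print(artifact_mask(relative_lv_distance(mse, gan), threshold=0.5))
```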

Masked Image Training for Generalizable Deep Image Denoising

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Haoyu Chen; Jinjin Gu; Yihao Liu; Salma Abdel Magid; Chao Dong; Qiong Wang; Hanspeter Pfister; Lei Zhu

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Shanghai AI Lab; The University of Sydney; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Harvard University; Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; The Hong Kong University of Science and Technology

When capturing and storing images, devices inevitably introduce noise. Reducing this noise is a critical task called image denoising. Deep learning has become the de facto method for image denoising, especially with the emergence of Transformer-based models that have achieved notable state-of-the-art results on various image tasks. However, deep learning-based methods often suffer from a lack of generalization ability. For example, deep models trained on Gaussian noise may perform poorly when tested on other noise distributions. To address this issue, we present a novel approach to enhance the generalization performance of denoising networks, known as masked training. Our method involves masking random pixels of the input image and reconstructing the missing information during training. We also mask out the features in the self-attention layers to avoid the impact of training-testing inconsistency. Our approach exhibits better generalization ability than other deep learning models and is directly applicable to real-world scenarios. Additionally, our interpretability analysis demonstrates the superiority of our method.

Keywords: (none listed)
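The input-masking half of masked training can be sketched as below: each pixel of a noisy training image is zeroed independently with probability `mask_ratio`, and the network is then trained to reconstruct the full clean image. The helper name, the zero fill value, and per-pixel independence are assumptions; the self-attention feature masking described in the abstract is not covered here.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[float]]


def mask_pixels(img: Image, mask_ratio: float,
                rng: Optional[random.Random] = None) -> Tuple[Image, List[List[bool]]]:
    """Zero each pixel independently with probability mask_ratio.

    Returns the masked image and a keep-mask (True where the pixel survived).
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = rng or random.Random()
    masked: Image = []
    keep: List[List[bool]] = []
    for row in img:
        keep_row = [rng.random() >= mask_ratio for _ in row]
        keep.append(keep_row)
        masked.append([v if k else 0.0 for v, k in zip(row, keep_row)])
    return masked, keep


# During training, the network input is the masked noisy image and the
# reconstruction loss is computed against the full clean image.
noisy = [[0.2, 0.9], [0.4, 0.7]]
net_input, kept = mask_pixels(noisy, mask_ratio=0.5, rng=random.Random(1))
print(net_input, kept)
```

Because the network cannot rely on every input pixel, it is pushed to model image content rather than memorize one noise distribution, which is the generalization argument the abstract makes.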

CRNN based jersey-bib number/text recognition in sports and marathon images

15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019

Authors: Nag, Sauradip; Ramachandra, Raghavendra; Shivakumara, Palaiahnakote; Pal, Umapada; Lu, Tong; Kankanhalli, Mohan

Affiliations: Department of Computer Science & Engineering, Kalyani Government Engineering College, Kalyani, India; Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, Norway; Faculty of Computer System and Information Technology, University of Malaya, Malaysia; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; National Key Lab for Novel Software Technology, Nanjing University, China; Department of Computer Science, School of Computing, National University of Singapore, Singapore

ISBN (print): 9781728128610

The primary challenge in tracking participants in sports and marathon videos or images is to detect and localize the jersey/bib number, which may be present in different regions of their outfit and is captured in cluttered environments. In this work, we propose a new framework based on detecting human body parts so that both jersey/bib numbers and text are localized reliably. To achieve this, the proposed method first detects and localizes the human in a given image using the Single Shot MultiBox Detector (SSD). Next, the body parts that generally contain a bib number or text region, namely the torso, left thigh, and right thigh, are automatically extracted. Each detected part is processed individually to detect the jersey/bib number or text using a deep CNN with a 2-channel architecture and a novel adaptive weighting loss function. Finally, the detected text is cropped out and fed to a CNN-RNN based deep model, abbreviated as CRNN, to recognize the jersey/bib number or text. Extensive experiments are carried out on four datasets, including benchmark datasets and a new dataset. Comparisons with state-of-the-art methods on all four datasets show that the proposed method achieves improved performance. © 2019 IEEE.

Keywords: Sports
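CRNN recognizers are commonly trained with a CTC loss and decoded by taking the best class per frame, merging consecutive repeats, and dropping the blank symbol. The abstract does not state which decoder is used, so the greedy CTC decoding sketch below is an assumption; the blank-at-index-0 convention and the digit alphabet are illustrative choices for bib numbers.

```python
from typing import Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str, blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks.

    alphabet[i] is the symbol for class i; alphabet[blank] is only a placeholder.
    """
    best = [max(range(len(p)), key=lambda i: p[i]) for p in frame_probs]
    out = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)


ALPHABET = "-0123456789"  # index 0 is the CTC blank
# One-hot per-frame outputs for class indices 2, 2, blank, 2, 3, 3.
frames = [[1.0 if j == i else 0.0 for j in range(len(ALPHABET))] for i in [2, 2, 0, 2, 3, 3]]
print(ctc_greedy_decode(frames, ALPHABET))  # → 112
```

The blank frame between the two runs of class 2 is what lets a doubled digit such as "11" survive the repeat-merging step.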
