Search results - Inner Mongolia University Library

Interactive Multi-dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration

16th European Conference on Computer Vision, ECCV 2020

Authors: He, Jingwen; Dong, Chao; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China

ISBN (digital): 9783030585655

ISBN (print): 9783030585648

Interactive image restoration aims to generate restored images by adjusting a controlling coefficient which determines the restoration level. Previous works are restricted to modulating the image with a single coefficient. However, real images always contain multiple types of degradation, which cannot be well determined by one coefficient. To make a step forward, this paper presents a new problem setup, called multi-dimension (MD) modulation, which aims at modulating output effects across multiple degradation types and levels. Compared with the previous single-dimension (SD) modulation, MD modulation is set up to handle multiple degradations adaptively and to relieve the unbalanced learning problem across different degradations. We also propose a deep architecture, CResMD, with newly introduced controllable residual connections for multi-dimension modulation. Specifically, we add a controlling variable on the conventional residual connection to allow a weighted summation of input and residual. The values of these weights are generated by another condition network. We further propose a new data sampling strategy based on the beta distribution to balance different degradation types and levels. With the corrupted image and degradation information as inputs, the network can output the corresponding restored image. By tweaking the condition vector, users can control the output effects in MD space at test time. Extensive experiments demonstrate that the proposed CResMD achieves excellent performance on both SD and MD modulation tasks. Code is available at https://***/hejingwenhejingwen/CResMD. © 2020, Springer Nature Switzerland AG.

Keywords: Modulation
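
The weighted residual connection and the beta-distributed sampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-layer `condition_to_alphas` function is a hypothetical stand-in for the condition network, and the Beta shape parameters are illustrative assumptions rather than values taken from the paper.

```python
import random


def controllable_residual(x, residual, alpha):
    """Weighted sum of input and residual branch: y = x + alpha * residual."""
    return [xi + alpha * ri for xi, ri in zip(x, residual)]


def condition_to_alphas(condition, weights, biases):
    """Hypothetical stand-in for the condition network.

    A single linear layer maps the condition vector (one entry per
    degradation type) to one alpha per residual block.
    """
    return [
        sum(w * c for w, c in zip(row, condition)) + b
        for row, b in zip(weights, biases)
    ]


def sample_degradation_levels(num_types, a=0.5, b=1.0, rng=random):
    """Draw one level in [0, 1] per degradation type from Beta(a, b).

    The shape parameters are assumptions chosen only for illustration.
    """
    return [rng.betavariate(a, b) for _ in range(num_types)]


# Two residual blocks, two degradation types (all numbers are made up).
alphas = condition_to_alphas([1.0, 0.0], [[0.5, 0.2], [0.1, 0.9]], [0.0, 0.1])
y = controllable_residual([1.0, 2.0], [0.5, -0.5], alphas[0])
levels = sample_degradation_levels(3, rng=random.Random(0))
```

Setting `alpha` to zero reduces the block to an identity mapping, which is one way to see how a condition vector can switch restoration strength on and off per block.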

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition

arXiv, 2022

Authors: Han, Mingfei; Zhang, David Junhao; Wang, Yali; Yan, Rui; Yao, Lina; Chang, Xiaojun; Qiao, Yu. Affiliations: ReLER, AAII, UTS, United States; National University of Singapore, Singapore; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; RMIT University, Australia; University of New South Wales, Australia; Shanghai AI Laboratory, Shanghai, China

Learning spatial-temporal relations among multiple actors is crucial for group activity recognition. Different group activities often show diversified interactions between actors in the video. Hence, it is often difficult to model complex group activities from a single view of spatial-temporal actor evolution. To tackle this problem, we propose a distinct Dual-path Actor Interaction (Dual-AI) framework, which flexibly arranges spatial and temporal transformers in two complementary orders, enhancing actor relations by integrating merits from different spatiotemporal paths. Moreover, we introduce a novel Multi-scale Actor Contrastive Loss (MAC-Loss) between two interactive paths of Dual-AI. Via self-supervised actor consistency at both frame and video levels, MAC-Loss can effectively distinguish individual actor representations to reduce action confusion among different actors. Consequently, our Dual-AI can boost group activity recognition by fusing such discriminative features of different actors. To evaluate the proposed approach, we conduct extensive experiments on the widely used benchmarks, including Volleyball [24], Collective Activity [12], and NBA datasets [56]. The proposed Dual-AI achieves state-of-the-art performance on all these datasets. It is worth noting that the proposed Dual-AI with 50% training data outperforms a number of recent approaches with 100% training data. This confirms the generalization power of Dual-AI for group activity recognition, even under the challenging scenarios of limited supervision. Copyright © 2022, The Authors. All rights reserved.

Keywords: Machine learning

Learning dynamical human-joint affinity for 3D pose estimation in videos

arXiv, 2021

Authors: Zhang, Junhao; Wang, Yali; Zhou, Zhipeng; Luan, Tianyu; Wang, Zhe; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of California, Irvine, United States; Shanghai AI Laboratory, Shanghai, China

Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos. However, it is often built on a fixed human-joint affinity derived from the human skeleton. This may reduce the adaptation capacity of GCN to tackle complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity, and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos. Different from traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for each video exemplar, depending on spatial distance/temporal movement similarity between human joints in this video. Hence, they can effectively understand which joints are spatially closer and/or have consistent motion, for reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, namely Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size. Copyright © 2021, The Authors. All rights reserved.

Keywords: Convolution
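
The idea of deriving joint affinity from spatial distance, rather than from a fixed skeleton, can be sketched as follows. This is an assumption-laden illustration, not DG-Net itself: the softmax over negative Euclidean distance and the weight-free aggregation step are choices made here for brevity, and both function names are hypothetical.

```python
import math


def dynamic_spatial_affinity(joints):
    """Row-normalised affinity computed from the current pose.

    Assumption: affinity is a softmax over negative pairwise Euclidean
    distance, so spatially closer joints receive larger weights.
    """
    affinity = []
    for ji in joints:
        scores = [math.exp(-math.dist(ji, jj)) for jj in joints]
        total = sum(scores)
        affinity.append([s / total for s in scores])
    return affinity


def aggregate(affinity, features):
    """Graph-convolution-style step without learned weights.

    Each joint's new feature is the affinity-weighted mean of all
    joints' features.
    """
    dim = len(features[0])
    return [
        [sum(a * f[d] for a, f in zip(row, features)) for d in range(dim)]
        for row in affinity
    ]


# Three 2D joints: the first two are close together, the third is far away.
joints = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
A = dynamic_spatial_affinity(joints)
out = aggregate(A, [[1.0], [2.0], [10.0]])
```

Because the affinity is recomputed from each input pose, two videos with different poses yield different graphs, which is the contrast with a skeleton-fixed adjacency matrix.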

Visual Compositional Learning for Human-Object Interaction Detection

16th European Conference on Computer Vision, ECCV 2020

Authors: Hou, Zhi; Peng, Xiaojiang; Qiao, Yu; Tao, Dacheng. Affiliations: UBTECH Sydney AI Centre, School of Computer Science, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia; Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China

ISBN (print): 9783030585549

Human-Object Interaction (HOI) detection aims to localize and infer relationships between humans and objects in an image. It is challenging because the enormous number of possible combinations of object and verb types forms a long-tail distribution. We devise a deep Visual Compositional Learning (VCL) framework, which is a simple yet efficient framework to effectively address this problem. VCL first decomposes an HOI representation into object-specific and verb-specific features, and then composes new interaction samples in the feature space by stitching the decomposed features. The integration of decomposition and composition enables VCL to share object and verb features among different HOI samples and images, and to generate new interaction samples and new types of HOI, and thus largely alleviates the long-tail distribution problem and benefits low-shot or zero-shot HOI detection. Extensive experiments demonstrate that the proposed VCL can effectively improve the generalization of HOI detection on HICO-DET and V-COCO and outperforms the recent state-of-the-art methods on HICO-DET. Code is available at https://***/zhihou7/VCL. © 2020, Springer Nature Switzerland AG.

Keywords: Object detection
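
The decompose-then-stitch step described in this abstract can be sketched with plain lists. The sample layout, the `decompose` and `compose` helpers, and the choice to stitch by concatenation are all assumptions made for illustration; the sketch also pairs every verb with every object, whereas a real system would presumably filter implausible combinations.

```python
def decompose(sample):
    """Split one HOI sample into its (label, feature) verb and object parts."""
    verb_part = (sample["verb"], sample["verb_feat"])
    obj_part = (sample["obj"], sample["obj_feat"])
    return verb_part, obj_part


def compose(samples):
    """Stitch verb and object features taken from different samples.

    Returns new interaction samples whose feature is the concatenation
    of a verb feature and an object feature from two distinct inputs.
    """
    parts = [decompose(s) for s in samples]
    composed = []
    for i, ((verb, verb_feat), _) in enumerate(parts):
        for j, (_, (obj, obj_feat)) in enumerate(parts):
            if i == j:
                continue  # skip the pairing that already exists in the data
            composed.append(
                {"verb": verb, "obj": obj, "feat": verb_feat + obj_feat}
            )
    return composed


# Toy features; the numbers carry no meaning.
samples = [
    {"verb": "ride", "verb_feat": [1.0, 0.0], "obj": "horse", "obj_feat": [0.2, 0.8]},
    {"verb": "feed", "verb_feat": [0.0, 1.0], "obj": "dog", "obj_feat": [0.3, 0.7]},
]
new_samples = compose(samples)
```

In this toy run the composed set contains "ride dog" and "feed horse", combinations absent from the inputs, which mirrors how composition can supply samples for rare or unseen interaction types.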

Attention-Driven Dynamic Graph Convolutional Network for Multi-label Image Recognition

16th European Conference on Computer Vision, ECCV 2020

Authors: Ye, Jin; He, Junjun; Peng, Xiaojiang; Wu, Wenhao; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; School of Biomedical Engineering, the Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China

ISBN (print): 9783030585884

Recent studies often exploit Graph Convolutional Network (GCN) to model label dependencies to improve recognition accuracy for multi-label image recognition. However, constructing a graph by counting the label co-occurrence possibilities of the training data may degrade model generalizability, especially when there exist occasional co-occurrence objects in test images. Our goal is to eliminate such bias and enhance the robustness of the learnt features. To this end, we propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN) to dynamically generate a specific graph for each image. ADD-GCN adopts a Dynamic Graph Convolutional Network (D-GCN) to model the relation of content-aware category representations that are generated by a Semantic Attention Module (SAM). Extensive experiments on public multi-label benchmarks demonstrate the effectiveness of our method, which achieves mAPs of 85.2%, 96.0%, and 95.5% on MS-COCO, VOC2007, and VOC2012, respectively, and outperforms current state-of-the-art methods with a clear margin. © 2020, Springer Nature Switzerland AG.

Keywords: Semantics
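
A rough sketch of the two stages named in this abstract, attention-pooled category representations followed by an image-specific graph, is given below. It is only an illustration under stated assumptions: the attention is a softmax over per-location category scores, and the per-image adjacency is a softmax over dot-product similarity, which is a stand-in chosen here and not necessarily how D-GCN builds its graph.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def category_representations(activation_maps, features):
    """Attention-pool location features into one vector per category.

    activation_maps[c][p] is the score of category c at location p;
    features[p] is the feature vector at location p.
    """
    dim = len(features[0])
    reps = []
    for scores in activation_maps:
        attn = softmax(scores)
        reps.append(
            [sum(a * f[d] for a, f in zip(attn, features)) for d in range(dim)]
        )
    return reps


def dynamic_adjacency(reps):
    """Image-specific graph: softmax over dot-product similarity (assumed)."""
    return [
        softmax([sum(a * b for a, b in zip(ri, rj)) for rj in reps])
        for ri in reps
    ]


# Two locations, two categories; each category attends to one location.
features = [[1.0, 0.0], [0.0, 1.0]]
activation_maps = [[10.0, -10.0], [-10.0, 10.0]]
reps = category_representations(activation_maps, features)
adjacency = dynamic_adjacency(reps)
```

The adjacency here depends only on the current image's features, so no co-occurrence statistics from the training set enter the graph.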

Enhanced Quadratic Video Interpolation

Workshops held at the 16th European Conference on Computer Vision, ECCV 2020

Authors: Liu, Yihao; Xie, Liangbin; Siyao, Li; Sun, Wenxiu; Qiao, Yu; Dong, Chao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China; SenseTime Research, Beijing, China

ISBN (print): 9783030668228

With the prosperity of the digital video industry, video frame interpolation has attracted continuous attention in the computer vision community and become a new upsurge in industry. Many learning-based methods have been proposed and achieved progressive results. Among them, a recent algorithm named quadratic video interpolation (QVI) achieves appealing performance. It exploits higher-order motion information (e.g. acceleration) and successfully models the estimation of interpolated flow. However, its produced intermediate frames still contain some unsatisfactory ghosting, artifacts and inaccurate motion, especially when large and complex motion occurs. In this work, we further improve the performance of QVI from three facets and propose an enhanced quadratic video interpolation (EQVI) model. In particular, we adopt a rectified quadratic flow prediction (RQFP) formulation with the least squares method to estimate the motion more accurately. Complementary to image pixel-level blending, we introduce a residual contextual synthesis network (RCSN) to employ contextual information in high-dimensional feature space, which could help the model handle more complicated scenes and motion patterns. Moreover, to further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process. The proposed EQVI model won first place in the AIM2020 Video Temporal Super-Resolution Challenge. Codes are available at https://***/lyh-18/EQVI. © 2020, Springer Nature Switzerland AG.

Keywords: Computer graphics
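
The least-squares view of quadratic motion mentioned in this abstract can be made concrete with a small sketch. It assumes a constant-acceleration model d(t) = v*t + 0.5*a*t^2 for a single scalar displacement and assumes that displacements from the reference frame are observed at t = -1, 1 and 2; the function names are hypothetical and the actual RQFP formulation may differ in its details.

```python
def fit_quadratic_motion(times, displacements):
    """Least-squares fit of d(t) = v*t + 0.5*a*t**2 to observed displacements.

    Solves the 2x2 normal equations in closed form and returns (v, a).
    """
    s11 = sum(t * t for t in times)
    s12 = sum(0.5 * t ** 3 for t in times)
    s22 = sum(0.25 * t ** 4 for t in times)
    r1 = sum(t * d for t, d in zip(times, displacements))
    r2 = sum(0.5 * t * t * d for t, d in zip(times, displacements))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("time stamps do not determine velocity and acceleration")
    v = (s22 * r1 - s12 * r2) / det
    a = (s11 * r2 - s12 * r1) / det
    return v, a


def predict_displacement(v, a, t):
    """Displacement from the reference frame to time t under the fitted model."""
    return v * t + 0.5 * a * t * t


# Displacements generated by v = 2, a = 1 at the assumed time stamps.
v, a = fit_quadratic_motion([-1.0, 1.0, 2.0], [-1.5, 2.5, 6.0])
mid = predict_displacement(v, a, 0.5)
```

With three observations and two unknowns the system is overdetermined, so noisy flow estimates are averaged out instead of being fitted exactly, which is the usual motivation for a least-squares formulation.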

Digging into Uncertainty in Self-supervised Multi-view Stereo

International Conference on Computer Vision (ICCV)

Authors: Hongbin Xu; Zhipeng Zhou; Yali Wang; Wenxiong Kang; Baigui Sun; Hao Li; Yu Qiao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; South China University of Technology; Alibaba Group; Pazhou Laboratory; Shanghai AI Laboratory

ISBN (print): 9781665428132

Self-supervised Multi-view Stereo (MVS) with a pretext task of image reconstruction has achieved significant progress recently. However, previous methods are built upon intuitions, lacking comprehensive explanations about the effectiveness of the pretext task in self-supervised MVS. To this end, we propose to estimate epistemic uncertainty in self-supervised MVS, accounting for what the model ignores. Specifically, the limitations can be categorized into two types: ambiguous supervision in the foreground and invalid supervision in the background. To address these issues, we propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate ambiguous supervision in the foreground, we involve an extra correspondence prior with a flow-depth consistency loss. The dense 2D correspondence of optical flows is used to regularize the 3D stereo correspondence in MVS. To handle the invalid supervision in the background, we use Monte-Carlo Dropout to acquire the uncertainty map and further filter the unreliable supervision signals on invalid regions. Extensive experiments on the DTU and Tanks&Temples benchmarks show that our U-MVS framework achieves the best performance among unsupervised MVS methods, with performance competitive with its supervised counterparts.

Keywords: Optical losses; Optical filters; Computer vision; Uncertainty; Three-dimensional displays; Monte Carlo methods; Benchmark testing
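
The use of Monte-Carlo Dropout to obtain an uncertainty map and filter unreliable supervision can be sketched on a toy predictor. Everything model-related here is an assumption: `dropout_forward` is a hypothetical one-layer linear predictor with dropout on its inputs, the variance across passes is taken as the uncertainty, and the threshold is arbitrary.

```python
import random
import statistics


def dropout_forward(features, weights, drop_prob, rng):
    """Toy stochastic predictor: a linear layer with inverted dropout on inputs."""
    keep = 1.0 - drop_prob
    total = 0.0
    for f, w in zip(features, weights):
        if rng.random() < keep:
            total += w * f / keep
    return total


def uncertainty_map(pixel_features, weights, passes=20, drop_prob=0.5, seed=0):
    """Per-pixel variance of the prediction over several dropout passes."""
    rng = random.Random(seed)
    variances = []
    for feats in pixel_features:
        preds = [
            dropout_forward(feats, weights, drop_prob, rng) for _ in range(passes)
        ]
        variances.append(statistics.pvariance(preds))
    return variances


def reliable_mask(variances, threshold):
    """Keep supervision only where the estimated uncertainty is low."""
    return [v <= threshold for v in variances]


# Two "pixels": the first has zero features, so every pass predicts 0.0.
variances = uncertainty_map([[0.0, 0.0], [4.0, 4.0]], [1.0, 1.0])
mask = reliable_mask(variances, threshold=1.0)
```

A loss computed only where `mask` is true ignores regions whose predictions disagree across dropout passes, which is the filtering role the abstract assigns to the uncertainty map.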

Self-supervised multi-view stereo via effective co-segmentation and data-augmentation

arXiv, 2021

Authors: Xu, Hongbin; Zhou, Zhipeng; Qiao, Yu; Kang, Wenxiong; Wu, Qiuxia. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shanghai AI Lab, Shanghai, China; South China University of Technology, Guangzhou, China

Recent studies have shown that self-supervised methods based on view synthesis make clear progress on multi-view stereo (MVS). However, existing methods rely on the assumption that the corresponding points among different views share the same color, which may not always be true in practice. This may lead to an unreliable self-supervised signal and harm the final reconstruction performance. To address the issue, we propose a framework integrated with more reliable supervision guided by semantic co-segmentation and data-augmentation. Specifically, we excavate mutual semantics from multi-view images to guide the semantic consistency. We also devise an effective data-augmentation mechanism which ensures the transformation robustness by treating the prediction of regular samples as pseudo ground truth to regularize the prediction of augmented samples. Experimental results on the DTU dataset show that our proposed methods achieve state-of-the-art performance among unsupervised methods, and even compete on par with supervised methods. Furthermore, extensive experiments on the Tanks&Temples dataset demonstrate the effective generalization ability of the proposed method. Copyright © 2021, The Authors. All rights reserved.

Keywords: Semantics
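
The augmentation-consistency idea in this abstract, where the prediction on the regular sample serves as pseudo ground truth for the augmented sample, can be written as a small loss function. This is a sketch under assumptions: depth maps are flattened to lists, the penalty is a mean absolute difference, and `valid_mask` is a hypothetical mask for pixels the augmentation left comparable.

```python
def augmentation_consistency_loss(pred_regular, pred_augmented, valid_mask):
    """Mean absolute difference between the two predictions on valid pixels.

    pred_regular is treated as a fixed pseudo ground truth; in a training
    framework no gradient would flow through it.
    """
    diffs = [
        abs(aug - ref)
        for ref, aug, valid in zip(pred_regular, pred_augmented, valid_mask)
        if valid
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# The third pixel is masked out, for example because it was occluded
# by the augmentation.
loss = augmentation_consistency_loss(
    [1.0, 2.0, 3.0], [1.5, 2.0, 9.0], [True, True, False]
)
```

Masked pixels contribute nothing, so a large disagreement in a region destroyed by the augmentation does not penalise the network.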

Digging into uncertainty in self-supervised multi-view stereo

arXiv, 2021

Authors: Xu, Hongbin; Zhou, Zhipeng; Wang, Yali; Kang, Wenxiong; Sun, Baigui; Li, Hao; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; South China University of Technology; Shanghai AI Laboratory; Alibaba Group; Pazhou Laboratory

Self-supervised Multi-view Stereo (MVS) with a pretext task of image reconstruction has achieved significant progress recently. However, previous methods are built upon intuitions, lacking comprehensive explanations about the effectiveness of the pretext task in self-supervised MVS. To this end, we propose to estimate epistemic uncertainty in self-supervised MVS, accounting for what the model ignores. Specifically, the limitations can be categorized into two types: ambiguous supervision in the foreground and invalid supervision in the background. To address these issues, we propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate ambiguous supervision in the foreground, we involve an extra correspondence prior with a flow-depth consistency loss. The dense 2D correspondence of optical flows is used to regularize the 3D stereo correspondence in MVS. To handle the invalid supervision in the background, we use Monte-Carlo Dropout to acquire the uncertainty map and further filter the unreliable supervision signals on invalid regions. Extensive experiments on the DTU and Tanks&Temples benchmarks show that our U-MVS framework achieves the best performance among unsupervised MVS methods, with performance competitive with its supervised counterparts. © 2021, CC BY.

Keywords: Image reconstruction