Search Results - Inner Mongolia University Library

Proceedings of the 35th International Conference on Neural Information Processing Systems

Authors: Liangbin Xie, Xintao Wang, Chao Dong, Zhongang Qi, Ying Shan. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, and ARC Lab, Tencent PCG; ARC Lab, Tencent PCG; Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and Shanghai AI Laboratory, Shanghai, China

ISBN: (print) 9781713845393

Recent blind super-resolution (SR) methods typically consist of two branches, one for degradation prediction and the other for conditional restoration. However, our experiments show that a one-branch network can achieve comparable performance to the two-branch scheme. Then we wonder: how can one-branch networks automatically learn to distinguish degradations? To find the answer, we propose a new diagnostic tool – Filter Attribution method based on Integral Gradient (FAIG). Unlike previous integral gradient methods, our FAIG aims at finding the most discriminative filters instead of input pixels/features for degradation removal in blind SR networks. With the discovered filters, we further develop a simple yet effective method to predict the degradation of an input image. Based on FAIG, we show that, in one-branch blind SR networks, 1) we are able to find a very small number of (1%) discriminative filters for each specific degradation; 2) The weights, locations and connections of the discovered filters are all important to determine the specific network function. 3) The task of degradation prediction can be implicitly realized by these discriminative filters without explicit supervised learning. Our findings can not only help us better understand network behaviors inside one-branch blind SR networks, but also provide guidance on designing more efficient architectures and diagnosing networks for blind SR.
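
The idea of attributing a change in network output to individual parameters, rather than to input pixels, can be illustrated with a minimal sketch. It assumes a straight-line path between a baseline and a target parameter vector and uses a hypothetical toy function in place of a blind SR network; `integrated_gradients`, `toy_output`, `toy_gradient`, and all values are illustrative and are not taken from the paper.

```python
def integrated_gradients(grad_fn, baseline, target, steps=100):
    """Split f(target) - f(baseline) across parameters along a straight path."""
    attributions = [0.0] * len(baseline)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (t - b) for b, t in zip(baseline, target)]
        grad = grad_fn(point)
        for i, g in enumerate(grad):
            attributions[i] += g * (target[i] - baseline[i]) / steps
    return attributions


# Hypothetical toy "network": f(w) = (w . x)^2 for one fixed input x.
X = [1.0, 2.0, 0.0]


def toy_output(w):
    return sum(wi * xi for wi, xi in zip(w, X)) ** 2


def toy_gradient(w):
    dot = sum(wi * xi for wi, xi in zip(w, X))
    return [2.0 * dot * xi for xi in X]


baseline_weights = [0.0, 0.0, 0.0]  # assumed baseline parameters
target_weights = [1.0, 1.0, 5.0]    # assumed target parameters
scores = integrated_gradients(toy_gradient, baseline_weights, target_weights)

# Rank parameters by the magnitude of their attribution.
ranking = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
```

In this toy case the third parameter changes the most but receives zero attribution because it never affects the output, which is the kind of distinction a parameter-level attribution is meant to expose.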

Special Issue on Face Presentation Attack Detection

IEEE Transactions on Biometrics, Behavior, and Identity Science, 2021, vol. 3, no. 3, pp. 282-284

Authors: Wan, Jun; Escalera, Sergio; Escalante, Hugo Jair; Guo, Guodong; Li, Stan Z. Affiliations: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Computer Vision Center, Universitat de Barcelona, Barcelona 08007, Spain; Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla 72840, Mexico; Institute of Deep Learning, Baidu Research, Beijing 100193, China; Center for AI Research and Innovation, Westlake University, Hangzhou 310024, China

Face presentation attack detection, also termed Face Anti-Spoofing (FAS) (items 1) and 2) in the Appendix), is a hot and challenging research topic that has received much attention from the computer vision and pattern recognition communities. Owing to the development of deep learning and big data, progress in this and related fields has accelerated considerably. However, several challenging tasks still deserve attention from the community, for instance techniques that are robust to unknown spoofing attacks, cross-domain generalization, and multi-modal fusion in images and video sequences. We edited this special issue with the goal of compiling the latest progress in the field and identifying promising research opportunities on FAS. © 2019 IEEE.

Keywords: Special issues and sections; Forgery; Information integrity; Face recognition; Streaming media; Feature extraction

Matching individual Ladoga ringed seals across short-term image sequences (vol 102, pg 957, 2022)

MAMMALIAN BIOLOGY, 2022, vol. 102, no. 3, p. 1045

Authors: Nepovinnykh, E. Affiliations: Computer Vision and Pattern Recognition Laboratory, Department of Computational Engineering, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, P.O. Box 20, 53851 Lappeenranta, Finland; Department of Artificial Intelligence, Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya 29, Saint Petersburg, Russian Federation, 195251; Department of Computer Science and Computational Experiment, Southern Federal University, Rostov-on-Don, Russian Federation, 344006; Interregional Charitable Public Organization “Biologists for Nature Conservation” (BFNC), 24 line 3-7, Saint Petersburg, Russian Federation, 199106

Automated wildlife re-identification has attracted increasing attention in recent years as it provides a non-invasive tool to identify and track individual wild animals over time. In this paper, the first steps are taken towards the automatic photo-identification of the Ladoga ringed seals (Pusa hispida ladogensis). A method is proposed that takes a sequence of images, each containing multiple individuals, as the input and produces cropped images of seals grouped so that each group contains a single individual. The method starts by detecting each seal in the images and proceeds to match the individual seals between the images. It is shown that high grouping accuracy can be obtained with a general-purpose image retrieval method on an image sequence taken from the same location within a relatively short period of time. Each resulting group contains multiple images of one individual with slight variations, for example, in pose and illumination. Utilizing these images simultaneously provides more information for individual re-identification than the traditional approach, which utilizes just one image at a time. It is further demonstrated that a convolutional neural network based method can be used to extract the unique pelage patterns of the seals despite the low contrast. Finally, a method is proposed and experiments with the novel Ladoga ringed seals data are carried out to provide a proof-of-concept for individual re-identification.

Keywords: Animal re-identification; Convolutional neural networks; Instance segmentation; Ladoga ringed seal; Photo-identification

Enhancing Micro Gesture recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning

arXiv, 2024

Authors: Li, Deng; Xing, Bohao; Liu, Xin. Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; Computer Vision and Pattern Recognition Laboratory, Lappeenranta-Lahti University of Technology LUT, Lappeenranta 53850, Finland

Psychological studies have shown that Micro Gestures (MG) are closely linked to human emotions. MG-based emotion understanding has attracted much attention because it allows for emotion understanding through nonverbal body gestures without relying on identity information (e.g., facial and electrocardiogram data). Therefore, it is essential to recognize MG effectively for advanced emotion understanding. However, existing Micro Gesture recognition (MGR) methods utilize only a single modality (e.g., RGB or skeleton) while overlooking crucial textual information. In this letter, we propose a simple but effective visual-text contrastive learning solution that utilizes text information for MGR. In addition, instead of using handcrafted prompts for visual-text contrastive learning, we propose a novel module called Adaptive prompting to generate context-aware prompts. The experimental results show that the proposed method achieves state-of-the-art performance on two public datasets. Furthermore, based on an empirical study utilizing the results of MGR for emotion understanding, we demonstrate that using the textual results of MGR significantly improves performance by 6%+ compared to directly using video as input. Copyright © 2024, The Authors. All rights reserved.
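
A minimal sketch of a visual-text contrastive objective may help make the idea concrete. The abstract does not specify the loss, so this uses a generic InfoNCE-style cross-entropy over cosine similarities as an assumption; the embeddings, the `temperature` value, and the function names are hypothetical and do not reproduce the authors' method or their Adaptive prompting module.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(visual, text, temperature=0.1):
    """Mean cross-entropy of matching each visual embedding to its own text."""
    losses = []
    for i, v in enumerate(visual):
        logits = [cosine(v, t) / temperature for t in text]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings for two gesture clips and their text prompts.
visual = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_loss(visual, aligned_text)
swapped_loss = contrastive_loss(visual, swapped_text)
```

The loss is small when each clip is closest to its own prompt and large when the pairing is swapped, which is the signal that pulls matching visual and text embeddings together during training.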

Keywords: Gesture recognition

Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning

arXiv, 2023

Authors: Cao, Cong; Yue, Huanjing; Liu, Xin; Yang, Jingyu. Affiliations: School of Electrical and Information Engineering, Tianjin University, China; Computer Vision and Pattern Recognition Laboratory, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, Lappeenranta, Finland

Capturing high dynamic range (HDR) images (videos) is attractive because it can reveal the details in both dark and bright regions. Since mainstream screens only support low dynamic range (LDR) content, a tone mapping algorithm is required to compress the dynamic range of HDR images (videos). Although image tone mapping has been widely explored, video tone mapping is lagging behind, especially for deep-learning-based methods, due to the lack of HDR-LDR video pairs. In this work, we propose a unified framework (IVTMNet) for unsupervised image and video tone mapping. To improve unsupervised training, we propose a domain- and instance-based contrastive learning loss. Instead of using a universal feature extractor, such as VGG, to extract the features for similarity measurement, we propose a novel latent code, which is an aggregation of the brightness and contrast of extracted features, to measure the similarity of different pairs. In total, we construct two negative pairs and three positive pairs to constrain the latent codes of tone-mapped results. For the network structure, we propose a spatial-feature-enhanced (SFE) module to enable information exchange and transformation of nonlocal regions. For video tone mapping, we propose a temporal-feature-replaced (TFR) module to efficiently utilize the temporal correlation and improve the temporal consistency of video tone-mapped results. We construct a large-scale unpaired HDR-LDR video dataset to facilitate the unsupervised training process for video tone mapping. Experimental results demonstrate that our method outperforms state-of-the-art image and video tone mapping methods. Our code and dataset are available at https://***/caocong/UnCLTMO. Copyright © 2023, The Authors. All rights reserved.

Keywords: Conformal mapping

Digging into Uncertainty in Self-supervised Multi-view Stereo

International Conference on Computer Vision (ICCV)

Authors: Hongbin Xu, Zhipeng Zhou, Yali Wang, Wenxiong Kang, Baigui Sun, Hao Li, Yu Qiao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; South China University of Technology; Alibaba Group; Pazhou Laboratory; Shanghai AI Laboratory

ISBN: (print) 9781665428132

Self-supervised Multi-view stereo (MVS) with a pretext task of image reconstruction has achieved significant progress recently. However, previous methods are built upon intuitions, lacking comprehensive explanations about the effectiveness of the pretext task in self-supervised MVS. To this end, we propose to estimate epistemic uncertainty in self-supervised MVS, accounting for what the model ignores. Specifically, the limitations can be categorized into two types: ambiguous supervision in foreground and invalid supervision in background. To address these issues, we propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate ambiguous supervision in foreground, we involve extra correspondence prior with a flow-depth consistency loss. The dense 2D correspondence of optical flows is used to regularize the 3D stereo correspondence in MVS. To handle the invalid supervision in background, we use Monte-Carlo Dropout to acquire the uncertainty map and further filter the unreliable supervision signals on invalid regions. Extensive experiments on DTU and Tank&Temples benchmark show that our U-MVS framework achieves the best performance among unsupervised MVS methods, with competitive performance with its supervised opponents.
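
The Monte-Carlo Dropout step can be sketched in a few lines: run several stochastic forward passes with dropout left on, take the per-pixel variance as an uncertainty map, and mask out pixels whose variance is too high. The toy per-pixel linear model, the threshold, and all names below are hypothetical stand-ins, not the U-MVS network or its settings.

```python
import random
from statistics import mean, pvariance


def stochastic_predict(inputs, weights, drop_prob, rng):
    """One forward pass of a toy per-pixel linear model with dropout kept on."""
    keep = 1.0 - drop_prob
    outputs = []
    for features in inputs:
        total = 0.0
        for x, w in zip(features, weights):
            if rng.random() < keep:
                total += x * w / keep  # inverted-dropout rescaling
        outputs.append(total)
    return outputs


def mc_dropout(inputs, weights, drop_prob=0.5, passes=200, seed=0):
    """Return per-pixel mean prediction and variance over stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_predict(inputs, weights, drop_prob, rng)
               for _ in range(passes)]
    per_pixel = list(zip(*samples))  # regroup samples pixel by pixel
    means = [mean(p) for p in per_pixel]
    variances = [pvariance(p) for p in per_pixel]
    return means, variances


def reliable_mask(variances, threshold):
    """Keep only pixels whose predictive variance is at or below the threshold."""
    return [v <= threshold for v in variances]


# Hypothetical "image" of three pixels with two features each.
pixels = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
weights = [1.0, 1.0]
means, variances = mc_dropout(pixels, weights)
mask = reliable_mask(variances, threshold=10.0)
```

A supervision loss would then be averaged only over pixels where `mask` is true, so that high-variance regions do not contribute to training.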

Keywords: Optical losses; Optical filters; Computer vision; Uncertainty; Three-dimensional displays; Monte Carlo methods; Benchmark testing

Towards accurate scene text recognition with semantic reasoning networks

arXiv, 2020

Authors: Yu, Deli; Li, Xuan; Zhang, Chengquan; Liu, Tao; Han, Junyu; Liu, Jingtuo; Ding, Errui. Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences; Department of Computer Vision Technology, Baidu Inc; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

A scene text image contains two levels of content: visual texture and semantic information. Although previous scene text recognition methods have made great progress over the past few years, research on mining semantic information to assist text recognition has attracted less attention; only RNN-like structures have been explored to implicitly model semantic information. However, we observe that RNN-based methods have some obvious shortcomings, such as a time-dependent decoding manner and one-way serial transmission of semantic context, which greatly limit the help of semantic information and the computation efficiency. To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission. The state-of-the-art results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method. In addition, the speed of SRN has significant advantages over the RNN-based methods, demonstrating its value in practical use. Copyright © 2020, The Authors. All rights reserved.

Keywords: Semantic Web

Digging into uncertainty in self-supervised multi-view stereo

arXiv, 2021

Authors: Xu, Hongbin; Zhou, Zhipeng; Wang, Yali; Kang, Wenxiong; Sun, Baigui; Li, Hao; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; South China University of Technology; Shanghai AI Laboratory; Alibaba Group; Pazhou Laboratory

Self-supervised Multi-view stereo (MVS) with a pretext task of image reconstruction has achieved significant progress recently. However, previous methods are built upon intuitions, lacking comprehensive explanations about the effectiveness of the pretext task in self-supervised MVS. To this end, we propose to estimate epistemic uncertainty in self-supervised MVS, accounting for what the model ignores. Specifically, the limitations can be categorized into two types: ambiguous supervision in foreground and invalid supervision in background. To address these issues, we propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate ambiguous supervision in foreground, we involve extra correspondence prior with a flow-depth consistency loss. The dense 2D correspondence of optical flows is used to regularize the 3D stereo correspondence in MVS. To handle the invalid supervision in background, we use Monte-Carlo Dropout to acquire the uncertainty map and further filter the unreliable supervision signals on invalid regions. Extensive experiments on DTU and Tank&Temples benchmark show that our U-MVS framework achieves the best performance among unsupervised MVS methods, with competitive performance with its supervised opponents. © 2021, CC BY.

Keywords: Image reconstruction

Robust recovery of eigenimages in the presence of outliers and occlusions

Journal of Computing and Information Technology, 1996, vol. 4, no. 1, pp. 25-38

Authors: Leonardis, Aleš; Bischof, Horst. Affiliations: Department for Pattern Recognition and Image Processing, Technical University Vienna, Treitlstraße 3/1832, Vienna A-1040, Austria; Computer Vision Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, Ljubljana 1000, Slovenia

The basic limitations of the current appearance-based matching methods using eigenimages are non-robust estimation of coefficients and inability to cope with problems related to occlusions and segmentation. In this paper we present a new approach which successfully solves these problems. The major novelty of our approach lies in the way the coefficients of the eigenimages are determined. Instead of computing the coefficients by a projection of the data onto the eigenimages, we extract them by a hypothesize-and-test paradigm using subsets of image points. Competing hypotheses are then subject to a selection procedure based on the Minimum Description Length principle. The approach enables us not only to reject outliers and to deal with occlusions but also to simultaneously use multiple classes of eigenimages.
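
The contrast between plain projection and a hypothesize-and-test estimate can be shown with a deliberately small sketch. It assumes a single eigenimage and, for brevity, picks the winning hypothesis by inlier count rather than by the Minimum Description Length principle used in the paper; the data, the `tolerance`, and the function names are hypothetical.

```python
def projection_coefficient(image, eigenimage, pixels=None):
    """Least-squares coefficient of one eigenimage over the chosen pixels."""
    if pixels is None:
        pixels = range(len(image))
    numerator = sum(image[i] * eigenimage[i] for i in pixels)
    denominator = sum(eigenimage[i] ** 2 for i in pixels)
    return numerator / denominator


def robust_coefficient(image, eigenimage, tolerance=0.5):
    """Hypothesize a coefficient from each pixel, keep the best-supported one."""
    best_inliers = []
    for i, e in enumerate(eigenimage):
        if e == 0:
            continue  # this pixel carries no information about the coefficient
        hypothesis = image[i] / e
        inliers = [j for j in range(len(image))
                   if abs(image[j] - hypothesis * eigenimage[j]) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit by least squares on the consistent pixels only.
    return projection_coefficient(image, eigenimage, best_inliers), best_inliers


# Hypothetical eigenimage and an image equal to 2 * eigenimage,
# with the last two pixels overwritten by an occluder.
eigenimage = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
image = [2.0, 4.0, 6.0, 8.0, 50.0, 50.0]

naive = projection_coefficient(image, eigenimage)
robust, inliers = robust_coefficient(image, eigenimage)
```

Here the plain projection is pulled far from the true coefficient by the two occluded pixels, while the hypothesis supported by the four clean pixels recovers it.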

Keywords: Principal component analysis

Locating high-density clusters with noisy queries

International Conference on Pattern Recognition

Authors: Chen Cao, Shifeng Chen, Changqing Zou, Jianzhuang Liu. Affiliations: Shenzhen Key Laboratory for Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; Department of Information Engineering, Chinese University of Hong Kong, China

ISBN: (print) 9781467322164

Semi-supervised learning (SSL) relies on a few labeled samples to explore the data's intrinsic structure through pairwise smooth transduction. The performance of SSL depends mainly on two factors: (1) the accuracy of labeled queries, and (2) the integrity of manifolds in the data distribution. Both of these qualities are often poor in real applications, as data often consist of several irrelevant clusters and discrete noise. In this paper we propose a novel framework to simultaneously remove discrete noise and locate the high-density clusters. Experiments demonstrate that our algorithm is effective for solving several problems such as non-feedback image re-ranking and image co-segmentation.

Keywords: Noise; Noise measurement; Clustering algorithms; Databases; Vectors; Manifolds; Semisupervised learning
