This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for hi...
ISBN (print): 9798400716256
The enhancement of historical document images is critical for improving the quality and legibility of scanned or captured document images. Convolution-based techniques have previously produced competitive results for document image binarization; however, due to their inherent locality, these models are often limited in explicitly modeling long-range dependencies. Vision Transformers (ViT) have emerged as an alternative design with a global self-attention mechanism to tackle this issue, but they can suffer from restricted localization capability owing to a lack of low-level detail. To address this problem, we propose TransDocUNet, a CNN-Transformer hybrid U-Net architecture for document image binarization that combines the strengths of attention and convolution within a U-Net design and serves as a strong alternative to existing solutions. Experimental results on the DIBCO/H-DIBCO datasets show that the proposed method outperforms existing competing methods in both objective quality metrics and visual quality assessment, achieving state-of-the-art performance in document image binarization. In addition, we conduct an ablation study to understand the role of dilation in the CNN for capturing feature dependencies while also reducing computational cost. The findings guided the final model and provide valuable insights into the importance of capturing both global and local contextual information for tasks such as document image enhancement.
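The abstract describes the architecture only at a high level; a minimal sketch of how a convolutional encoder, a ViT-style self-attention bottleneck, and a decoder with skip connections can be combined in a U-Net for binarization is given below. Module names, channel sizes, and the dilation setting are illustrative assumptions, not the TransDocUNet reference implementation.

```python
# Hypothetical sketch of a CNN-Transformer hybrid U-Net for document binarization.
# Layer names, channel sizes, and patch handling are assumptions for illustration.
import torch
import torch.nn as nn

def conv_block(c_in, c_out, dilation=1):
    # Dilated convolutions enlarge the receptive field at little extra cost.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class HybridUNet(nn.Module):
    def __init__(self, ch=64, heads=4, depth=2):
        super().__init__()
        self.enc1 = conv_block(1, ch)                  # local, low-level details
        self.enc2 = conv_block(ch, ch * 2, dilation=2)
        self.pool = nn.MaxPool2d(2)
        # ViT-style bottleneck: global self-attention over spatial tokens.
        layer = nn.TransformerEncoderLayer(d_model=ch * 2, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)
        self.dec = conv_block(ch * 2, ch)              # fuse the skip connection
        self.head = nn.Conv2d(ch, 1, 1)                # per-pixel binarization logits

    def forward(self, x):
        s1 = self.enc1(x)                              # B x ch x H x W
        f = self.enc2(self.pool(s1))                   # B x 2ch x H/2 x W/2
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)          # B x (H*W/4) x 2ch
        f = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        d = self.dec(torch.cat([self.up(f), s1], dim=1))
        return torch.sigmoid(self.head(d))             # foreground/background map

# Usage: out = HybridUNet()(torch.rand(1, 1, 128, 128))  # values in [0, 1]
```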
In this work, we focus on a special group of human body language — the micro-gesture (MG), which differs from ordinary illustrative gestures in that it is not an intentional behavior performed to convey...
ISBN (print): 9781665428132
Self-supervised multi-view stereo (MVS) with a pretext task of image reconstruction has achieved significant progress recently. However, previous methods are built upon intuition and lack a comprehensive explanation of why the pretext task is effective in self-supervised MVS. To this end, we propose to estimate epistemic uncertainty in self-supervised MVS, accounting for what the model ignores. Specifically, the limitations can be categorized into two types: ambiguous supervision in the foreground and invalid supervision in the background. To address these issues, we propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate ambiguous supervision in the foreground, we incorporate an extra correspondence prior through a flow-depth consistency loss: the dense 2D correspondences from optical flow are used to regularize the 3D stereo correspondences in MVS. To handle invalid supervision in the background, we use Monte-Carlo Dropout to obtain an uncertainty map and filter out unreliable supervision signals in invalid regions. Extensive experiments on the DTU and Tanks & Temples benchmarks show that our U-MVS framework achieves the best performance among unsupervised MVS methods, with performance competitive with its supervised counterparts.
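The abstract names Monte-Carlo Dropout as the mechanism for filtering unreliable self-supervision; a minimal sketch of that idea is shown below. The function names, sample count, and threshold are assumptions for illustration, not the U-MVS reference code.

```python
# Hypothetical sketch of Monte-Carlo Dropout uncertainty filtering for the
# self-supervised photometric loss. Model, threshold, and loss weighting are
# illustrative assumptions, not the U-MVS implementation.
import torch

def mc_dropout_uncertainty(model, images, n_samples=8):
    """Run the depth network several times with dropout active and use the
    per-pixel variance of the predicted depth as an epistemic-uncertainty map."""
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        depths = torch.stack([model(images) for _ in range(n_samples)], dim=0)
    return depths.mean(dim=0), depths.var(dim=0)

def masked_photometric_loss(photo_error, uncertainty, threshold=0.05):
    """Drop the self-supervised photometric error where the model is uncertain,
    e.g. in invalid background regions."""
    valid = (uncertainty < threshold).float()
    return (photo_error * valid).sum() / valid.sum().clamp(min=1.0)
```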
Person images captured by surveillance cameras are often occluded by various obstacles, which lead to defective feature representation and harm person re-identification (Re-ID) performance. To tackle this challenge, w...
Due to changes in human mindset and lifestyle, the number of diverse marriages is increasing all around the world, irrespective of race, color, religion, and culture. As a result, it is challenging for resea...
This work reviews the results of the NTIRE 2023 Challenge on Image Shadow Removal. The described set of solutions was proposed for a novel dataset that captures a wide range of object-light interactions. It consist...
ISBN (print): 9781450366151
Haze during bad weather drastically degrades the visibility of the scene. The degradation of scene visibility varies with the transmission coefficient/map (Tc) of the scene, so accurate estimation of Tc is the key step in reconstructing the haze-free scene. Previously, both local and global priors were proposed to estimate Tc. We, on the other hand, propose an integration of local and global approaches to learn both point-level and object-level Tc. The proposed local encoder-decoder network (LEDNet) estimates the scene transmission map in two stages. In the first stage, the network estimates the point-level Tc using parallel convolutional filters and spatially invariant filtering. The second stage comprises a two-level encoder-decoder architecture that estimates the object-level Tc. We also propose a local air-light estimation (LAE) algorithm that obtains the air-light component of the outdoor scene. The combination of LEDNet and LAE improves the accuracy of the haze model in recovering scene radiance. Structural similarity index, mean squared error, and peak signal-to-noise ratio are used to evaluate the performance of the proposed approach for single-image haze removal. Experiments on benchmark datasets show that LEDNet outperforms existing state-of-the-art methods for single-image haze removal.
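The recovery step the abstract relies on is the standard atmospheric scattering model, I = J·t + A·(1 − t); a short sketch of how an estimated transmission map and air-light would be combined to recover scene radiance is given below. The function signature and clipping constant are assumptions for illustration, with the transmission and air-light estimators standing in for LEDNet and LAE.

```python
# Hypothetical sketch of scene-radiance recovery from an estimated transmission
# map (Tc) and air-light, following the standard haze model I = J*t + A*(1 - t).
import numpy as np

def recover_radiance(hazy, transmission, airlight, t_min=0.1):
    """hazy: HxWx3 float image in [0, 1]; transmission: HxW map in (0, 1];
    airlight: length-3 vector. Returns the dehazed image J."""
    t = np.clip(transmission, t_min, 1.0)[..., np.newaxis]  # avoid division blow-up
    J = (hazy - airlight) / t + airlight                     # invert the haze model
    return np.clip(J, 0.0, 1.0)
```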