Search Results - Inner Mongolia University Library

arXiv, 2023

Authors: Huang, Yi; Huang, Jiancheng; Liu, Jianzhuang; Yan, Mingfu; Dong, Yu; Lyu, Jiaxi; Chen, Chaoqi; Chen, Shifeng. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100039, China; The University of Hong Kong, Hong Kong

Recent diffusion-based methods outperform traditional models on many image restoration tasks, but they suffer from long inference times. To tackle this, the paper proposes a Wavelet-Based Diffusion Model (WaveDM). WaveDM learns the distribution of clean images in the wavelet domain, conditioned on the wavelet spectrum of degraded images after wavelet transform, which takes less time per sampling step than modeling in the spatial domain. To ensure restoration performance, a unique training strategy is proposed in which the low-frequency and high-frequency spectrums are learned using distinct modules. In addition, an Efficient Conditional Sampling (ECS) strategy is developed from experiments, which reduces the total number of sampling steps to around 5. Evaluations on twelve benchmark datasets covering image raindrop removal, rain streak removal, dehazing, defocus deblurring, demoiréing, and denoising demonstrate that WaveDM achieves state-of-the-art performance with efficiency that is comparable to traditional one-pass methods and over 100× faster than existing image restoration methods using vanilla diffusion models. The code is available at https://***/stayalive16/WaveDM. Copyright © 2023, The Authors. All rights reserved.
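One way to read the per-step saving described in this abstract: a single level of a 2D wavelet transform splits an H×W image into four sub-bands of size H/2 × W/2, so a network that works on the stacked bands visits a quarter of the spatial positions. The sketch below uses a Haar wavelet, which is an assumption made for illustration; the abstract does not state which wavelet or how many decomposition levels WaveDM uses, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four (H/2, W/2) sub-bands: one low-frequency average band and
    three high-frequency detail bands. Illustrative only; not WaveDM's code.
    """
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0          # low-frequency band
    row_detail = (a + b - c - d) / 2.0   # difference between the two rows
    col_detail = (a - b + c - d) / 2.0   # difference between the two columns
    diag_detail = (a - b - c + d) / 2.0  # diagonal difference
    return low, row_detail, col_detail, diag_detail


def haar_idwt2(low, row_detail, col_detail, diag_detail):
    """Inverse of haar_dwt2: rebuild the (H, W) image from the four bands."""
    h, w = low.shape
    out = np.empty((2 * h, 2 * w), dtype=low.dtype)
    out[0::2, 0::2] = (low + row_detail + col_detail + diag_detail) / 2.0
    out[0::2, 1::2] = (low + row_detail - col_detail - diag_detail) / 2.0
    out[1::2, 0::2] = (low - row_detail + col_detail - diag_detail) / 2.0
    out[1::2, 1::2] = (low - row_detail - col_detail + diag_detail) / 2.0
    return out


# Usage: an 8x8 image becomes four 4x4 bands and can be rebuilt exactly.
image = np.arange(64, dtype=float).reshape(8, 8)
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

Because the transform is invertible, a model could in principle predict the four bands and recover the full-resolution image afterwards; whether WaveDM does exactly this is not stated in the abstract.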

Keywords: Image reconstruction

Matching individual Ladoga ringed seals across short-term image sequences (vol 102, pg 957, 2022)

Mammalian Biology, 2022, vol. 102, no. 3, p. 1045

Authors: Nepovinnykh, E. Affiliations: Computer Vision and Pattern Recognition Laboratory, Department of Computational Engineering, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, P.O. Box 20, 53851 Lappeenranta, Finland; Department of Artificial Intelligence, Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya 29, Saint Petersburg, Russian Federation 195251; Department of Computer Science and Computational Experiment, Southern Federal University, Rostov-on-Don, Russian Federation 344006; Interregional Charitable Public Organization “Biologists for Nature Conservation” (BFNC), 24 line 3-7, Saint Petersburg, Russian Federation 199106

Automated wildlife re-identification has attracted increasing attention in recent years as it provides a non-invasive tool to identify and track individual wild animals over time. In this paper, the first steps are taken towards the automatic photo-identification of the Ladoga ringed seals (Pusa hispida ladogensis). A method is proposed that takes as input a sequence of images, each containing multiple individuals, and produces cropped images of seals grouped so that each group contains one individual. The method starts by detecting each seal in the images and proceeds to match the individual seals between the images. It is shown that high grouping accuracy can be obtained with a general-purpose image retrieval method on an image sequence taken from the same location within a relatively short period of time. Each resulting group contains multiple images of one individual with slight variations, for example, in pose and illumination. Utilizing these images simultaneously provides more information for individual re-identification than the traditional approach, which utilizes just one image at a time. It is further demonstrated that a convolutional neural network based method can be used to extract the unique pelage patterns of the seals despite the low contrast. Finally, a method is proposed and experiments with the novel Ladoga ringed seals data are carried out to provide a proof-of-concept for the individual re-identification.

Keywords: Animal re-identification; Convolutional neural networks; Instance segmentation; Ladoga ringed seal; Photo-identification

A New Lightweight Attention-based Model for Emotion recognition on Distorted Social Media Face Images

TechRxiv, 2023

Authors: Roy, Ayush; Shivakumara, Palaiahnakote; Pal, Umapada; Gornale, Shivanand S.; Liu, Cheng-Lin. Affiliations: Jadavpur University, India; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Department of Computer Science, Rani Channamma University, Belagavi, India; Institute of Automation, Chinese Academy of Sciences, China

The recognition of human emotions remains a challenging task for social media images. This is because the distortions introduced by different social media platforms conflict with the minute changes in facial expression. This study presents a new model called the Global Spectral-Spatial Attention Network (GSSAN), which leverages both local and global information simultaneously. The proposed model comprises a shallow Convolutional Neural Network (CNN) with an MBResNext block, which integrates the features extracted from MobileNet, ResNet, and DenseNet for extracting local features. In addition, to strengthen the discriminating power of the features, GSSAN incorporates Fourier features, which provide essential cues for minute changes in the face images. To test the proposed model for emotion recognition using social media images, we conduct experiments on two widely used datasets: FER-2013 and AffectNet. The same benchmark datasets are uploaded to and downloaded from social media to create a distorted social media image dataset for testing the proposed model. Experiments on the distorted social media image dataset show that the model surpasses the accuracy of SOTA models by 0.69% on the FER-2013 and 0.51% on the AffectNet social media datasets. The same inference can be drawn from the experiments on the standard datasets. © 2023, CC BY.

Keywords: Face recognition

TIC: Text-Guided Image Colorization

arXiv, 2022

Authors: Ghosh, Subhankar; Roy, Prasun; Bhattacharya, Saumik; Pal, Umapada; Blumenstein, Michael. Affiliations: Faculty of Engineering and IT, University of Technology Sydney, NSW, Australia; The Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Indian Institute of Technology Kharagpur, India

Image colorization is a well-known problem in computer vision. However, due to the ill-posed nature of the task, image colorization is inherently challenging. Though several attempts have been made by researchers to make the colorization pipeline automatic, these processes often produce unrealistic results due to a lack of conditioning. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of the colorization process. To the best of our knowledge, this is one of the first attempts to incorporate textual conditioning in the colorization pipeline. To do so, we have proposed a novel deep network that takes two inputs (the grayscale image and the respective encoded text description) and tries to predict the relevant color gamut. As the respective textual descriptions contain color information of the objects present in the scene, the text encoding helps to improve the overall quality of the predicted colors. We have evaluated our proposed model using different metrics and found that it outperforms the state-of-the-art colorization algorithms both qualitatively and quantitatively. © 2022, CC BY.

Keywords: Pipelines

ClueNet: A deep framework for occluded pedestrian pose estimation

30th British Machine Vision Conference, BMVC 2019

Authors: Kishore, Perla Sai Raj; Das, Sudip; Mukherjee, Partha Sarathi; Bhattacharya, Ujjwal. Affiliations: Institute of Engineering and Management, Kolkata, India; Indian Statistical Institute, Computer Vision and Pattern Recognition Unit, Kolkata, India

Pose estimation of a pedestrian helps to gather information about the current activity or the instant behaviour of the subject. Such information is useful for autonomous vehicles, augmented reality, video surveillance, etc. Although a large number of pedestrian detection studies are available in the literature, detecting pedestrians under significant occlusion remains a challenging task. In this work, we take a step further and propose a novel deep learning framework, called ClueNet, to detect occluded pedestrians and estimate their entire pose in an unsupervised manner. ClueNet is a two-stage framework in which the first stage generates visual clues for the second stage to accurately estimate the pose of occluded pedestrians. The first stage employs a multi-task network to segment the visible parts and predict a bounding box enclosing the visible and occluded regions for each pedestrian. The second stage uses these predictions from the first stage for pose estimation. Here we propose a novel strategy, called Mask and Predict, to train our ClueNet to estimate the pose even for occluded regions. Additionally, we make use of various other training strategies to further improve our results. The proposed work is the first of its kind, and the experimental results on the CityPersons and MS COCO datasets show the superior performance of our approach over existing methods. © 2019. The copyright of this document resides with its authors.

Keywords: Augmented reality

MIO: Mutual Information Optimization using Self-Supervised Binary Contrastive Learning

arXiv, 2021

Authors: Manna, Siladittya; Pal, Umapada; Bhattacharya, Saumik. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, India

Self-supervised contrastive learning frameworks have progressed rapidly over the last few years. In this paper, we propose a novel mutual information optimization-based loss function for contrastive learning. We model our pre-training task as a binary classification problem to induce an implicit contrastive effect and predict whether a pair is positive or negative. We further improve the naive loss function using the Majorize-Minimizer principle and such improvement helps us to track the problem mathematically. Unlike the existing methods, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also present a closed-form expression for the parameter gradient flow and compare the behavior of the proposed loss function using its Hessian eigen-spectrum to analytically study the convergence of SSL frameworks. The proposed method outperforms the SOTA contrastive self-supervised frameworks on benchmark datasets like CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet. After 200 epochs of pre-training with ResNet-18 as the backbone, the proposed model achieves an accuracy of 86.2%, 58.18%, 77.49%, and 30.87% on CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet datasets, respectively, and surpasses the SOTA contrastive baseline by 1.23%, 3.57%, 2.00%, and 0.33%, respectively. Copyright © 2021, The Authors. All rights reserved.
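The idea of treating pre-training as binary classification over pairs can be illustrated with a plain binary cross-entropy on pair similarities. This is only a generic sketch of that framing, not the MIO loss itself, whose exact form and Majorize-Minimizer refinement are not given in the abstract. The function name `pair_bce_loss`, the use of cosine similarity as the logit, and the `temperature` parameter are all assumptions.

```python
import numpy as np


def pair_bce_loss(z1, z2, labels, temperature=0.5):
    """Binary cross-entropy over embedding pairs (illustrative, not MIO).

    z1, z2 : (N, D) arrays of embeddings; row i of z1 is paired with row i of z2.
    labels : (N,) array, 1.0 for a positive pair and 0.0 for a negative pair.
    """
    # L2-normalise so the dot product is a cosine similarity in [-1, 1].
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = np.sum(z1 * z2, axis=1) / temperature
    # Numerically stable BCE with logits:
    #   max(x, 0) - x * y + log(1 + exp(-|x|))
    per_pair = (
        np.maximum(logits, 0.0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(per_pair.mean())


# Usage: identical embeddings are cheap to label "positive", costly as "negative".
z = np.array([[1.0, 0.0], [0.0, 1.0]])
loss_as_positive = pair_bce_loss(z, z, np.ones(2))
loss_as_negative = pair_bce_loss(z, z, np.zeros(2))
```

Minimising such a loss pulls positive pairs together and pushes negative pairs apart, which is the implicit contrastive effect the abstract refers to.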

Keywords: Supervised learning

Activating More Pixels in Image Super-Resolution Transformer

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xiangyu Chen; Xintao Wang; Jiantao Zhou; Yu Qiao; Chao Dong. Affiliations: State Key Laboratory of Internet of Things for Smart City, University of Macau; Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; Shanghai Artificial Intelligence Laboratory; ARC Lab, Tencent PCG

Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution. However, through attribution analysis, we find that these networks can only utilize a limited spatial range of input information. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus making use of their complementary advantages of being able to utilize global statistics and strong local fitting capability. Moreover, to better aggregate the cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that the performance of this task can be greatly improved. Our overall method significantly outperforms the state-of-the-art methods by more than 1 dB.

TREC 2020 NEWS Track Background Linking Task

29th Text REtrieval Conference, TREC 2020

Authors: Gautam, Rahul; Mitra, Mandar; Roy, Dwaipayan. Affiliation: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700108, India

The Background Linking task is a problem that focuses on providing users with suggestions for articles to read next, when the user is reading a news article. The suggested articles should provide adequate context and background information for the article that the user is currently reading. In this paper, we describe several methods that we explored for this task, and report their results. © 2020 29th Text REtrieval Conference, TREC 2020 - Proceedings. All Rights Reserved.

Reconnoitering the class distinguishing abilities of the features, to know them better

arXiv, 2022

Authors: Sadhukhan, Payel; Palit, Sarbani; Sengupta, Kausik. Affiliations: Institute for Advancing Intelligence, TCG CREST, West Bengal, Kolkata 700091, India; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, West Bengal, Kolkata 700108, India

The relevance of machine learning (ML) in our daily lives is closely intertwined with its explainability. Explainability can allow end-users to have a transparent and humane reckoning of an ML scheme's capability and utility. It will also foster the user's confidence in the automated decisions of a system. Explaining the variables or features behind a model's decision is a need of the present times. We could not find any work that explains the features on the basis of their class-distinguishing abilities (especially given that real-world data are mostly multi-class in nature). In any given dataset, a feature is not equally good at making distinctions between the different possible categorizations (or classes) of the data points. In this work, we explain the features on the basis of their class- or category-distinguishing capabilities. In particular, we estimate the class-distinguishing capabilities (scores) of the variables for pair-wise class combinations. We validate the explainability given by our scheme empirically on several real-world, multi-class datasets. We further utilize the class-distinguishing scores in a latent feature context and propose a novel decision-making protocol. Another novelty of this work lies in a refuse-to-render-decision option when the latent variable (of the test point) has a high class-distinguishing potential for the likely classes. © 2022, CC BY.

Keywords: Decision making

Anomaly Handwritten Text Detection for Automatic Descriptive Answer Evaluation

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Authors: Nilanjana Chatterjee; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Yue Lu. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, India; Faculty of Computer Science and Information Technology, University of Malaya, Malaysia; National Key Lab for Novel Software Technology, Nanjing University, China; Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, China

ISBN: (print) 9781450397056

Although there are advanced technologies for character recognition, automatic descriptive answer evaluation is an open challenge for the document image analysis community because of the wide diversity of handwritten text and of answers to a question. This paper presents a novel method for detecting anomalous handwritten text in the responses written by students to questions. The method is based on the observation that when students are confident in answering questions, they usually write answers legibly and neatly, whereas when they are not confident, their writing is sloppy and may not be easy for the reader to understand. To detect such anomalous handwritten text, we explore a new combination of the Fourier transform and a deep learning model for detecting edges. The resulting edge image preserves the structure of the handwritten text. To extract features for classifying anomalous and normal text, the proposed method studies the behavior of the writing style, especially the variation at ascenders and descenders. To this end, the proposed work draws the principal axis of the edge images, which is invariant to rotation, scaling and, to some extent, distortion. With respect to the principal axis, the proposed method draws the medial axis using the uppermost and lowermost points. The distances between the medial axis and principal axis points are used as the feature vector. The feature vector is then passed to an Artificial Neural Network for classification of anomalous text. The proposed method is evaluated on our own dataset, a standard dataset for gender identification (IAM), and a handwritten forgery detection dataset (ACPR 2019). The results on the different datasets show that the proposed work outperforms the existing methods.
