Search Results - Inner Mongolia University Library

arXiv, 2021

Authors: Liu, Xin; Wang, Xingzhi; Cheung, Yiu-Ming. Affiliations: Department of Computer Science Huaqiao University Xiamen Key Laboratory of Computer Vision and Pattern Recognition Fujian Key Laboratory of Big Data Intelligence and Security Xiamen 361021 China School of Electronics and Information Technology Sun Yat-sen University Guangzhou 510006 China Department of Computer Science Hong Kong Baptist University Hong Kong Hong Kong

Cross-modal hashing, favored for its effectiveness and efficiency, has received wide attention for facilitating efficient retrieval across different modalities. Nevertheless, most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes, and they often involve time-consuming training procedures for handling large-scale datasets. To tackle these issues, we formulate the learning of similarity-preserving hash codes in terms of orthogonally rotating the semantic data so as to minimize the quantization loss of mapping such data to Hamming space, and propose an efficient Fast Discriminative Discrete Hashing (FDDH) approach for large-scale cross-modal retrieval. More specifically, FDDH introduces an orthogonal basis to regress the targeted hash codes of training examples to their corresponding semantic labels, and utilizes the ϵ-dragging technique to provide provable large semantic margins. Accordingly, the discriminative power of semantic information can be explicitly captured and maximized. Moreover, an orthogonal transformation scheme is further proposed to map the nonlinear embedding data into the semantic subspace, which can well guarantee the semantic consistency between the data feature and its semantic representation. Consequently, an efficient closed-form solution is derived for discriminative hash code learning, which is very computationally efficient. In addition, an effective and stable online learning strategy is presented for optimizing modality-specific projection functions, featuring adaptivity to different training sizes and streaming data. The proposed FDDH approach theoretically approximates the bi-Lipschitz continuity, runs sufficiently fast, and also significantly improves the retrieval performance over the state-of-the-art methods. The source code is released at: https://***/starxliu/FDDH. © 2021, CC0.

Keywords: Hash functions
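
The abstract above frames hash-code learning as rotating data orthogonally so that less is lost when the data is quantized to binary codes. The following is a minimal sketch of that general idea only, not the FDDH algorithm: it assumes 2-D toy points, sign-based binarization, and a simple grid search over rotation angles in place of the closed-form solution the abstract mentions. The function names `quantization_loss` and `best_rotation` are hypothetical.

```python
import math

def quantization_loss(points, theta):
    """Sum of squared distances between rotated 2-D points and their sign codes."""
    c, s = math.cos(theta), math.sin(theta)
    loss = 0.0
    for x, y in points:
        rx, ry = c * x - s * y, s * x + c * y  # orthogonal rotation by theta
        bx = 1.0 if rx >= 0 else -1.0          # binary code, first bit
        by = 1.0 if ry >= 0 else -1.0          # binary code, second bit
        loss += (bx - rx) ** 2 + (by - ry) ** 2
    return loss

def best_rotation(points, steps=360):
    """Grid-search the angle in [0, pi/2) that gives the lowest loss (assumed search strategy)."""
    candidates = [k * (math.pi / 2) / steps for k in range(steps)]
    return min(candidates, key=lambda t: quantization_loss(points, t))

# Made-up toy data lying on the coordinate axes, where sign codes fit poorly.
points = [(1.4, 0.0), (-1.4, 0.0), (0.0, 1.4), (0.0, -1.4)]
theta = best_rotation(points)
print(theta, quantization_loss(points, 0.0), quantization_loss(points, theta))
```

For this toy data the search settles near a 45-degree rotation, which moves every point close to a corner of the binary hypercube and so reduces the loss relative to no rotation.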

EDEN: Deep feature distribution pooling for Saimaa ringed seals pattern matching

arXiv, 2021

Authors: Chelak, Ilia; Nepovinnykh, Ekaterina; Eerola, Tuomas; Kälviäinen, Heikki; Belykh, Igor. Affiliations: Peter the Great St. Petersburg Polytechnic University Saint Petersburg Russia Lappeenranta-Lahti University of Technology LUT School of Engineering Science Department of Computational Engineering Computer Vision and Pattern Recognition Laboratory P.O.Box 20 Lappeenranta 53850 Finland

In this paper, pelage pattern matching is considered to solve the individual re-identification of the Saimaa ringed seals. Animal re-identification, together with access to large amounts of image material through camera traps and crowd-sourcing, provides novel possibilities for animal monitoring and conservation. We propose a novel feature pooling approach that allows aggregating the local pattern features to get a fixed-size embedding vector that incorporates global features by taking into account the spatial distribution of features. This is obtained by eigendecomposition of covariances computed for probability mass functions representing feature maps. Embedding vectors can then be used to find the best match in the database of known individuals, allowing animal re-identification. The results show that the proposed pooling method outperforms the existing methods on the challenging Saimaa ringed seal image data. © 2021, CC BY-NC-ND.

Keywords: Pattern matching

A Survey of the Self Supervised Learning Mechanisms for Vision Transformers

arXiv, 2024

Authors: Khan, Asifullah; Sohail, Anabia; Fiaz, Mustansar; Hassan, Mehdi; Afridi, Tariq Habib; Marwat, Sibghat Ullah; Munir, Farzeen; Ali, Safdar; Naseem, Hannan; Zaheer, Muhammad Zaigham; Ali, Kamran; Sultana, Tangina; Tanoli, Ziaurrehman; Akhter, Naeem. Affiliations: Pattern Recognition Lab DCIS PIEAS Nilore Islamabad 45650 Pakistan PIEAS Nilore Islamabad 45650 Pakistan Deep Learning Lab Center for Mathematical Sciences PIEAS Nilore Islamabad 45650 Pakistan Center of Secure Cyber-Physical Security Systems Khalifa University Abu Dhabi United Arab Emirates IBM Research United States Department of Computer Science Air University Islamabad Pakistan Department of Computer Science and Engineering Kyung Hee University Global Campus 1732 Gyeonggi-do Yongin 17104 Korea Republic of Department of Electrical Engineering and Automation Aalto University Finland Finnish Center of Artificial Center Finland Faculty of Engineering and Green Technology Universiti Tunku Abdul Rahman Malaysia Computer Vision Department Mohamed Bin Zayed University of Artificial Intelligence United Arab Emirates Karachi Pakistan Department of Electronics and Communication Engineering Hajee Mohammad Danesh Science and Technology University Bangladesh HiLIFE University of Helsinki Finland

Vision Transformers (ViTs) have recently demonstrated remarkable performance in computer vision tasks. However, their parameter-intensive nature and reliance on large amounts of data for effective performance have shifted the focus from traditional human-annotated labels to unsupervised learning and pretraining strategies that uncover hidden structures within the data. In response to this challenge, self-supervised learning (SSL) has emerged as a promising paradigm. SSL leverages inherent relationships within the data itself as a form of supervision, eliminating the need for manual labeling and offering a more scalable and resource-efficient alternative for model training. Given these advantages, it is imperative to explore the integration of SSL techniques with ViTs, particularly in scenarios with limited labeled data. Inspired by this evolving trend, this survey aims to systematically review SSL mechanisms tailored for ViTs. We propose a comprehensive taxonomy to classify SSL techniques based on their representations and pre-training tasks. Additionally, we discuss the motivations behind SSL, review prominent pre-training tasks, and highlight advancements and challenges in this field. Furthermore, we conduct a comparative analysis of various SSL methods designed for ViTs, evaluating their strengths, limitations, and applicability to different scenarios. Copyright © 2024, The Authors. All rights reserved.

Keywords: Self-supervised learning

68 landmarks are efficient for 3D face alignment: What about more? 3D face alignment method applied to face recognition

TechRxiv, 2021

Authors: Jabberi, Marwa; Wali, Ali; Chaudhuri, Bidyut Baran; Alimi, Adel M. Affiliations: University of Sousse ISITCom Sousse 4011 Tunisia BP 1173 Sfax 3038 Tunisia Computer Vision and Pattern Recognition Unit Indian Statistical Institute Kolkata 700108 India Department of Electrical and Electronic Engineering Science Faculty of Engineering and the Built Environment University of Johannesburg South Africa

This paper proposes a 3D face alignment of 2D face images in the wild with noisy landmarks, where the objective is to recognize individuals from a single profile image. We first extract more than 68 landmarks using a bag of features. This allows us to obtain a bag of visible and invisible facial keypoints. Then, we reconstruct a 3D face model and get a triangular mesh by meshing the obtained keypoints. The number of keypoints differs from face to face, which makes this step very challenging. Later, we process the 3D face using butterfly and BPA algorithms to establish correlation and regularity between 3D face regions. Indeed, 2D-to-3D annotations give much higher quality to the 3D reconstructed face model without the need for any additional 3D Morphable models. Finally, alignment and pose correction steps are carried out to get a frontal pose by fitting the rendered 3D reconstructed face to the 2D face and performing pose normalization to achieve good rates in face recognition. The recognition step is based on deep learning and is performed using DCNNs for feature learning and face identification. To verify the proposed method, two popular benchmarks, the YTF and LFW databases, are tested. Compared to the best recognition results reported on these two benchmarks, our proposed method achieves comparable or even better recognition performance. © 2021, CC BY.

Keywords: Face recognition

Towards accurate scene text recognition with semantic reasoning networks

arXiv, 2020

Authors: Yu, Deli; Li, Xuan; Zhang, Chengquan; Liu, Tao; Han, Junyu; Liu, Jingtuo; Ding, Errui. Affiliations: School of Artificial Intelligence University of Chinese Academy of Sciences Department of Computer Vision Technology Baidu Inc National Laboratory of Pattern Recognition Institute of Automation Chinese Academy of Sciences

Scene text images contain two levels of content: visual texture and semantic information. Although previous scene text recognition methods have made great progress over the past few years, research on mining semantic information to assist text recognition has attracted less attention, and only RNN-like structures have been explored to implicitly model semantic information. However, we observe that RNN-based methods have some obvious shortcomings, such as a time-dependent decoding manner and one-way serial transmission of semantic context, which greatly limit the help of semantic information and the computation efficiency. To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission. The state-of-the-art results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method. In addition, the speed of SRN has significant advantages over the RNN-based methods, demonstrating its value in practical use. Copyright © 2020, The Authors. All rights reserved.

Keywords: Semantic Web

AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

arXiv, 2021

Authors: Roy, Swalpa Kumar; Paoletti, Mercedes E.; Haut, Juan M.; Dubey, Shiv Ram; Kar, Purbayan; Plaza, Antonio; Chaudhuri, Bidyut B. Affiliations: The Computer Science and Engineering Alipurduar Government Engineering and Management College 736206 India The Hyperspectral Computing Laboratory Department of Technology of Computers and Communications University of Extremadura Cáceres 10003 Spain The Computer Vision and Biometrics Lab Indian Institute of Information Technology Prayagraj Uttar Pradesh Allahabad 211015 India The Media Analysis Group Sony Research India Private Limited Karnataka Bangalore 560103 India The Computer Vision and Pattern Recognition Unit Indian Statistical Institute Kolkata 700108 India

Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the direction/angle of consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information apart from its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother as a more accurate step size of immediate past gradients is captured through the angular information. Two variants of AngularGrad are developed based on the use of Tangent or Cosine functions for computing the gradient angular information. Theoretically, AngularGrad exhibits the same regret bound as Adam for convergence purposes. Nevertheless, extensive experiments conducted on benchmark data sets against state-of-the-art methods reveal a superior performance of AngularGrad. Source code: https://***/mhaut/AngularGrad. Copyright © 2021, The Authors. All rights reserved.

Keywords: Optimization
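
The abstract above describes scoring the step size from the angle between consecutive gradients. The sketch below illustrates only that general idea and is not the AngularGrad update rule, whose exact tangent- and cosine-based formulas are not given in the abstract. The `step_scale` mapping to the range [0.5, 1] is a hypothetical choice made for this example.

```python
import math

def angle_between(g_prev, g_curr):
    """Angle in radians between two gradient vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(g_prev, g_curr))
    norm = math.sqrt(sum(a * a for a in g_prev)) * math.sqrt(sum(b * b for b in g_curr))
    if norm == 0.0:
        return 0.0
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def step_scale(g_prev, g_curr):
    """Hypothetical score: 1.0 when directions agree, 0.5 when they are opposed."""
    return 0.75 + 0.25 * math.cos(angle_between(g_prev, g_curr))

# Made-up gradients: a steady direction keeps the full step, a reversal halves it.
print(step_scale([1.0, 0.0], [2.0, 0.0]))
print(step_scale([1.0, 0.0], [-1.0, 0.0]))
```

An optimizer could multiply its learning rate by such a score so that oscillating gradients take smaller steps. How AngularGrad actually combines the score with Adam-style moments is left to the paper and its source code.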

Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Deli Yu; Xuan Li; Chengquan Zhang; Tao Liu; Junyu Han; Jingtuo Liu; Errui Ding. Affiliations: School of Artificial Intelligence University of Chinese Academy of Sciences National Laboratory of Pattern Recognition Institute of Automation Chinese Academy of Sciences Department of Computer Vision Technology (VIS) Baidu Inc.

ISBN (digital): 9781728171685

ISBN (print): 9781728171692

Keywords: Semantics; Visualization; Text recognition; Cognition; Feature extraction; Decoding; Robustness

Split-Net: Dual Transformer Encoder with Splitting Scene Text Image for Script Identification

Pattern Recognition Letters, 2025

Authors: Ayush Roy; Shivakumara Palaiahnakote; Umapada Pal; Cheng-Lin Liu, PhD. Affiliations: Department of Computer Science and Engineering State University of New York Buffalo USA School of Science Engineering and Environment University of Salford Manchester UK Computer Vision and Pattern Recognition Indian Statistical Institute Kolkata India State Key Laboratory of Multimodal Artificial Intelligence Systems Institute of Automation of the Chinese Academy of Sciences Beijing China School of Artificial Intelligence University of Chinese Academy of Sciences Beijing China

Script identification is vital for understanding scenes and video images. It is challenging due to high variations in physical appearance, typeface design, complex background, distortion, and significant overlap in the characteristics of different scripts. Unlike existing models, which aim to tackle the script images utilizing the scene text image as a whole, we propose to split the image into upper and lower halves to capture the intricate differences in stroke and style of various scripts. Motivated by the accomplishments of the transformer, a modified script-style-aware Mobile-Vision Transformer (M-ViT) is explored for encoding visual features of the images. To enrich the features of the transformer blocks, a novel Edge Enhanced Style Aware Channel Attention Module (EESA-CAM) has been integrated with M-ViT. Furthermore, the model fuses the features of the dual encoders (extracting features from the upper and the lower half of the images) by a dynamic weighted average procedure utilizing the gradient information of the encoders as the weights. In experiments on three standard datasets, MLe2e, CVSI2015, and SIW-13, the proposed model yielded superior performance compared to state-of-the-art models.

Face recognition - A one-shot learning perspective

15th International Conference on Signal Image Technology and Internet Based Systems, SISITS 2019

Authors: Chanda, Sukalpa; Gv, Asish Chakrapani; Brun, Anders; Hast, Anders; Pal, Umapada; Doermann, David. Affiliations: Department of Information Technology Østfold University College Norway Computer Vision and Pattern Recognition Unit Indian Statistical Institute India Centre for Image Analysis Uppsala University Sweden Computer Science and Engineering University at Buffalo United States

ISBN (print): 9781728156866

The ability to learn from a single instance is something unique to the human species, and One-shot learning algorithms try to mimic this special capability. On the other hand, despite the fantastic performance of Deep Learning-based methods on various image classification problems, performance often depends on having a huge number of annotated training samples per class. This fact is certainly a hindrance in deploying deep neural network-based systems in many real-life applications like face recognition. Furthermore, adding a new class to the system requires re-training the whole system from scratch. Nevertheless, the prowess of deep learned features cannot be ignored. This research aims to combine the best of deep learned features with a traditional One-Shot learning framework. Results obtained on 2 publicly available datasets are very encouraging, achieving over 90% accuracy on 5-way One-Shot tasks and 84% on 50-way One-Shot problems. © 2019 IEEE.

Keywords: Face recognition
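
The abstract above reports accuracy on N-way One-Shot tasks, where each of N classes is represented by a single example. The sketch below shows the shape of such a task under stated assumptions: embeddings are taken as given, and the nearest-neighbor rule with cosine similarity is an assumed stand-in, since the abstract does not say which One-Shot framework the authors use. The labels and vectors are made up.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def one_shot_classify(query, support):
    """Return the label whose single support embedding is most similar to the query."""
    return max(support, key=lambda label: cosine_similarity(query, support[label]))

# Hypothetical 3-way one-shot task: exactly one made-up embedding per identity.
support = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.0, 1.0, 0.2],
    "person_c": [0.1, 0.0, 0.8],
}
predicted = one_shot_classify([0.2, 0.9, 0.1], support)
print(predicted)
```

Adding a new identity here only means adding one entry to `support`, which is the property the abstract contrasts with re-training a deep classifier from scratch.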

MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation

arXiv, 2021

Authors: Srivastava, Abhishek; Jha, Debesh; Chanda, Sukalpa; Pal, Umapada; Johansen, Håvard D.; Johansen, Dag; Riegler, Michael A.; Ali, Sharib; Halvorsen, Pål. Affiliations: Computer Vision and Pattern Recognition Unit Indian Statistical Institute Kolkata India SimulaMet Oslo Norway UiT The Arctic University of Norway Tromsø Norway Østfold University College Halden Norway Indian Statistical Institute Kolkata India The Department of Engineering Science University of Oxford Oxford NIHR Biomedical Research Centre Oxford United Kingdom Oslo Metropolitan University Oslo Norway

Methods based on convolutional neural networks have improved the performance of biomedical image segmentation. However, most of these methods cannot efficiently segment objects of variable sizes and train on small and biased datasets, which are common for biomedical use cases. While methods exist that incorporate multi-scale fusion approaches to address the challenges arising with variable sizes, they usually use complex models that are more suitable for general semantic segmentation problems. In this paper, we propose a novel architecture called Multi-Scale Residual Fusion Network (MSRF-Net), which is specially designed for medical image segmentation. The proposed MSRF-Net is able to exchange multi-scale features of varying receptive fields using a Dual-Scale Dense Fusion (DSDF) block. Our DSDF block can exchange information rigorously across two different resolution scales, and our MSRF sub-network uses multiple DSDF blocks in sequence to perform multi-scale fusion. This allows the preservation of resolution, improved information flow and propagation of both high- and low-level features to obtain accurate segmentation maps. The proposed MSRF-Net allows capturing object variabilities and provides improved results on different biomedical datasets. Extensive experiments on MSRF-Net demonstrate that the proposed method outperforms the cutting-edge medical image segmentation methods on four publicly available datasets. We achieve Dice Coefficients (DSC) of 0.9217, 0.9420, 0.9224, and 0.8824 on Kvasir-SEG, CVC-ClinicDB, the 2018 Data Science Bowl dataset, and the ISIC-2018 skin lesion segmentation challenge dataset, respectively. We further conducted generalizability tests and achieved DSC of 0.7921 and 0.7575 on CVC-ClinicDB and Kvasir-SEG, respectively. © 2021, CC BY.

Keywords: Image enhancement
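
The abstract above reports results as Dice Coefficients (DSC). As a reminder of what that metric measures, here is a minimal sketch of the standard definition, DSC = 2|A and B| / (|A| + |B|), for two binary masks given as flat lists. Returning 1.0 when both masks are empty is a convention assumed for this example, and the masks are made up.

```python
def dice_coefficient(pred, target):
    """Dice score between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Foreground pixel counts of each mask, added together.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Made-up 4-pixel masks: one of the two predicted pixels is correct.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels.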
