检索结果-内蒙古大学图书馆

arXiv 2021年

作者： Roy, Swalpa Kumar Paoletti, Mercedes E. Haut, Juan M. Dubey, Shiv Ram Kar, Purbayan Plaza, Antonio Chaudhuri, Bidyut B. The Computer Science and Engineering Alipurduar Government Engineering and Management College 736206 India The Hyperspectral Computing Laboratory Department of Technology of Computers and Communications University of Extremadura Cáceres10003 Spain The Computer Vision and Biometrics Lab Indian Institute of Information Technology Prayagraj Uttar Pradesh Allahabad211015 India The Media Analysis Group Sony Research India Private Limited Karnataka Bangalore560103 India The Computer Vision and Pattern Recognition Unit Indian Statistical Institute Kolkata700108 India

Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the direction/angle of consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information apart from its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother as a more accurate step size of immediate past gradients is captured through the angular information. Two variants of AngularGrad are developed based on the use of Tangent or Cosine functions for computing the gradient angular information. Theoretically, AngularGrad exhibits the same regret bound as Adam for convergence purposes. Nevertheless, extensive experiments conducted on benchmark data sets against state-of-the-art methods reveal a superior performance of AngularGrad. Source code: https://***/mhaut/AngularGrad. Copyright © 2021, The Authors. All rights reserved.

关键词： Optimization

来源：评论

学校读者我要写书评

暂无评论

Machine Learning and computer vision Techniques in Continuous Beehive Monitoring Applications: A Survey

arXiv

引用

arXiv 2022年

作者： Bilik, Simon Zemcik, Tomas Kratochvila, Lukas Ricanek, Dominik Richter, Miloslav Zambanini, Sebastian Horak, Karel Department of Control and Instrumentation Faculty of Electrical Engineering and Communication Brno University of Technology Technická 3058/10 Brno61600 Czech Republic Computer Vision and Pattern Recognition Laboratory Department of Computational Engineering Lappeenranta-Lahti University of Technology LUT Yliopistonkatu 34 Lappeenranta53850 Finland Computer Vision Lab Institute of Visual Computing & Human-Centered Technology Faculty of Informatics TU Wien Favoritenstr. 9/193-1 ViennaA-1040 Austria

Wide use and availability of machine learning and computer vision techniques allows development of relatively complex monitoring systems in many domains. Besides the traditional industrial domain, new applications appears also in biology and agriculture, where they may be used to detect infections, parasites and weeds, but also for automated monitoring and early warning systems. This goes in concordance with the introduction of the easily accessible hardware and development kits such as the Arduino, or RaspberryPi families. In this paper, we survey 50 papers focusing on the methods of automated beehive monitoring using computer vision techniques. Particularly on the pollen and Varroa mite detection together with the bee traffic monitoring. Such systems could also be used for monitoring of honeybee colonies and for the inspection of their health state, which could potentially identify dangerous states before the situation is critical, or to better plan periodic bee colony inspections and therefore save significant costs. Further on, we also include analysis of the research trends in this application field and we outline the possible directions of new development. Our paper is also aimed at veterinary and apidology professionals and experts, who may not be familiar with machine learning to introduce them to its capabilities, hence each family of techniques is prefaced by a brief theoretical introduction and motivation related to its base method. We hope that this paper will inspire other scientists to use machine learning techniques for other applications in beehive monitoring. © 2022, CC BY.

关键词： computer vision

来源：评论

学校读者我要写书评

暂无评论

Self-slimmed vision Transformer

arXiv

引用

arXiv 2021年

作者： Zong, Zhuofan Li, Kunchang Song, Guanglu Wang, Yali Qiao, Yu Leng, Biao Liu, Yu School of Computer Science and Engineering Beihang University China SenseTime Research China ShenZhen Key Lab of Computer Vision and Pattern Recognition SIAT-SenseTime Joint Lab Shenzhen Institutes of Advanced Technology Chinese Academy of Sciences China University of Chinese Academy of Sciences China SIAT Branch Shenzhen Institute of Artificial Intelligence and Robotics for Society China Shanghai AI Laboratory China

vision transformers (ViTs) have become the popular structures and outperformed convolutional neural networks (CNNs) on various vision tasks. However, such powerful transformers bring a huge computation burden, because of the exhausting token-to-token comparison. The previous works focus on dropping insignificant tokens to reduce the computational cost of ViTs. But when the dropping ratio increases, this hard manner will inevitably discard the vital tokens, which limits its efficiency. To solve the issue, we propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT. Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation. As a general method of token hard dropping, our TSM softly integrates redundant tokens into fewer informative ones. It can dynamically zoom visual attention without cutting off discriminative token relations in the images, even with a high slimming ratio. Furthermore, we introduce a concise Feature Recalibration Distillation (FRD) framework, wherein we design a reverse version of TSM (RTSM) to recalibrate the unstructured token in a flexible auto-encoder manner. Due to the similar structure between teacher and student, our FRD can effectively leverage structure knowledge for better convergence. Finally, we conduct extensive experiments to evaluate our SiT. It demonstrates that our method can speed up ViTs by 1.7× with negligible accuracy drop, and even speed up ViTs by 3.6× while maintaining 97% of their performance. Surprisingly, by simply arming LV-ViT with our SiT, we achieve new state-of-the-art performance on ImageNet. Code is available at https://***/Sense-X/SiT. © 2021, CC BY.

关键词： Distillation

来源：评论

学校读者我要写书评

暂无评论

Hierarchical Local Global Transformer for Point Clouds Analysis

SSRN

引用

SSRN 2023年

作者： Li, Dilong Zheng, Shenghong Chen, Ziyi Li, Xiang Wang, Lanying Du, Jixiang College of Computer Science and Technology Fujian Key Laboratory of Big Data Intelligence and Security Xiamen Key Laboratory of Computer Vision and Pattern Recognition Xiamen Key Laboratory of Data Security and Blockchain Technology Huaqiao University FJ Xiamen361021 China School of Economics and Finance Huaqiao University FJ Quanzhou362021 China Department of Geography and Environmental Management University of Waterloo WaterlooONN2L 3G1 Canada

Transformer networks have demonstrated remarkable performance in point cloud analysis. However, achieving a balance between local regional context and global long-range context learning remains a significant challenge. In this paper, we propose a Hierarchical Local Global Transformer Network (LGTNet), designed to capture local and global contexts in a hierarchical manner. Specifically, we employ serial local and global Transformers to learn the inner-group and cross-group self-attention, respectively. Besides, we propose a geometric moment-based position encoding for local Transformer, enabling the embedding of comprehensive local geometric relationship. Additionally, we also introduce a global feature pooling module that extracts the global features from each encoder layers. Extensive experimental results demonstrate that LGTNet achieves state-of-the-art performance on ShapeNetPart and ScanObjectNN datasets. This approach effectively enhances the understanding of point cloud scenes, thereby facilitating the use of point cloud data in remote sensing applications. © 2023, The Authors. All rights reserved.

关键词： Geometry

来源：评论

学校读者我要写书评

暂无评论

Multi-scale Promoted Self-adjusting Correlation Learning for Facial Action Unit Detection

引用

IEEE Transactions on Affective Computing 2024年第2期16卷 697-711页

作者： Liu, Xin Yuan, Kaishen Niu, Xuesong Shi, Jingang Yu, Zitong Yue, Huanjing Yang, Jingyu Tianjin University School of Electrical and Information Engineering Tianjin300072 China Lappeenranta-Lahti University of Technology LUT Computer Vision and Pattern Recognition Laboratory School of Engineering Science Lappeenranta53850 Finland Beijing Institute for General Artificial Intelligence Beijing100080 China Xi'an Jiaotong University School of Software Engineering Xi'an710049 China Great Bay University Dongguan523000 China

Facial Action Unit (AU) detection is a crucial task in affective computing and social robotics as it helps to identify emotions expressed through facial expressions. Anatomically, there are innumerable correlations between AUs, which contain rich information and are vital for AU detection. Previous methods used fixed AU correlations based on expert experience or statistical rules on specific benchmarks, but it is challenging to comprehensively reflect complex correlations between AUs via hand-crafted settings. There are alternative methods that employ a fully connected graph to learn these dependencies exhaustively. However, these approaches can result in a computational explosion and high dependency with a large dataset. To address these challenges, this paper proposes a novel self-adjusting AU-correlation learning (SACL) method with less computation for AU detection. This method adaptively learns and updates AU correlation graphs by efficiently leveraging the characteristics of different levels of AU motion and emotion representation information extracted in different stages of the network. Moreover, this paper explores the role of multi-scale learning in correlation information extraction, and design a simple yet effective multi-scale feature learning (MSFL) method to promote better performance in AU detection. By integrating AU correlation information with multi-scale features, the proposed method obtains a more robust feature representation for the final AU detection. Extensive experiments show that the proposed method outperforms the state-of-the-art methods on widely used AU detection benchmark datasets, with only 28.7% and 12.0% of the parameters and FLOPs of the best method, respectively. The code for this method is available at https://***/linuxsino/Self-adjusting-AU. © 2010-2012 IEEE.

关键词： Contrastive Learning

来源：评论

学校读者我要写书评

暂无评论

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection

arXiv

引用

arXiv 2022年

作者： Liu, Yi Zhang, Xuan Li, Ying Liang, Guixin Jiang, Yabing Qiu, Lixia Tang, Haiping Xie, Fei Yao, Wei Dai, Yi Qiao, Yu Wang, Yali ShenZhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences China Shenzhen Bwell Technology Co. Ltd China Shenzhen Longhua Drainage Co. Ltd China Shanghai AI Laboratory Shanghai China SIAT Branch Shenzhen Institute of Artificial Intelligence and Robotics for Society China

Video understanding is an important problem in computer vision. Currently, the well-studied task in this research is human action recognition, where the clips are manually trimmed from the long videos, and a single class of human action is assumed for each clip. However, we may face more complicated scenarios in the industrial applications. For example, in the real-world urban pipe system, anomaly defects are fine-grained, multi-labeled, domain-relevant. To recognize them correctly, we need to understand the detailed video content. For this reason, we propose to advance research areas of video understanding, with a shift from traditional action recognition to industrial anomaly analysis. In particular, we introduce two high-quality video benchmarks, namely QV-Pipe and CCTV-Pipe, for anomaly inspection in the real-world urban pipe systems. Based on these new datasets, we will host two competitions including (1) Video Defect Classification on QV-Pipe and (2) Temporal Defect Localization on CCTV-Pipe. In this report, we describe the details of these benchmarks, the problem definitions of competition tracks, the evaluation metric, and the result summary. We expect that, this competition would bring new opportunities and challenges for video understanding in smart city and beyond. The details of our VideoPipe challenge can be found in https://***. Copyright © 2022, The Authors. All rights reserved.

关键词： Defects

来源：评论

学校读者我要写书评

暂无评论

UniFormer: Unifying Convolution and Self-attention for Visual recognition

arXiv

引用

arXiv 2022年

作者： Li, Kunchang Wang, Yali Zhang, Junhao Gao, Peng Song, Guanglu Liu, Yu Li, Hongsheng Qiao, Yu ShenZhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences Shenzhen518055 China University of Chinese Academy of Sciences Beijing100049 China Shanghai Artificial Intelligence Laboratory Shanghai200232 China National University of Singapore Singapore Shanghai Artificial Intelligence Laboratory China SenseTime Research China The Chinese University of Hong Kong Hong Kong

It is a challenging task to learn discriminative representation from images and videos, due to large local redundancy and complex global dependency in these visual data. Convolution neural networks (CNNs) and vision transformers (ViTs) have been two dominant frameworks in the past few years. Though CNNs can efficiently decrease local redundancy by convolution within a small neighborhood, the limited receptive field makes it hard to capture global dependency. Alternatively, ViTs can effectively capture long-range dependency via self-attention, while blind similarity comparisons among all the tokens lead to high redundancy. To resolve these problems, we propose a novel Unified transFormer (UniFormer), which can seamlessly integrate the merits of convolution and self-attention in a concise transformer format. Different from the typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local and global token affinity respectively in shallow and deep layers, allowing tackling both redundancy and dependency for efficient and effective representation learning. Finally, we flexibly stack our blocks into a new powerful backbone, and adopt it for various vision tasks from image to video domain, from classification to dense prediction. Without any extra training data, our UniFormer achieves 86.3 top-1 accuracy on ImageNet-1K classification task. With only ImageNet-1K pre-training, it can simply achieve state-of-the-art performance in a broad range of downstream tasks. It obtains 82.9/84.8 top-1 accuracy on Kinetics-400/600, 60.9/71.2 top-1 accuracy on Something-Something V1/V2 video classification tasks, 53.8 box AP and 46.4 mask AP on COCO object detection task, 50.8 mIoU on ADE20K semantic segmentation task, and 77.4 AP on COCO pose estimation task. Moreover, we build an efficient UniFormer with a concise hourglass design of token shrinking and recovering, which achieves 2-4× higher throughput than the recent lightweight models. Code is av

关键词： Convolution

来源：评论

学校读者我要写书评

暂无评论

CASIA-SURF: A Large-scale Multi-modal Benchmark for Face Anti-spoofing

arXiv

引用

arXiv 2019年

作者： Zhang, Shifeng Liu, Ajian Wan, Jun Liang, Yanyan Guo, Guogong Escalera, Sergio Escalante, Hugo Jair Li, Stan Z. National Laboratory of Pattern Recognition Institute of Automation Chinese Academy of Sciences University of Chinese Academy of Sciences Beijing China Macau University of Science and Technology Macau China Institute of Deep Learning Baidu Research and National Engineering Laboratory for Deep Learning Technology and Application Universitat de Barcelona Computer Vision Center Barcelona Catalonia Instituto Nacional de Astrofsica Ptica y Electrnica Puebla72840 Mexico

Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progresses have been made by the availability of face anti-spoofing benchmark datasets in recent years. However, existing face antispoofing benchmarks have limited number of subjects (≤170) and modalities (≤2), which hinder the further development of the academic community. To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIASURF, which is the largest publicly available dataset for face antispoofing in terms of both subjects and modalities. Specifically, it consists of 1;000 subjects with 21;000 videos and each sample has 3 modalities (i.e., RGB, Depth and IR). We also provide comprehensive evaluation metrics, diverse evaluation protocols, training/validation/testing subsets and a measurement tool, developing a new benchmark for face anti-spoofing. Moreover, we present a novel multi-modal multi-scale fusion method as a strong baseline, which performs feature re-weighting to select the more informative channel features while suppressing the less useful ones for each modality across different scales. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability. The dataset is available at http://***/***/chalearnfacespoofingattackdete/. Copyright © 2019, The Authors. All rights reserved.

关键词： Face recognition

来源：评论

学校读者我要写书评

暂无评论

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and recognition — RRC-MLT-2019

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Te...

引用

International Conference on Document Analysis and recognition

作者： Nibal Nayef Yash Patel Michal Busta Pinaki Nath Chowdhury Dimosthenis Karatzas Wafa Khlif Jiri Matas Umapada Pal Jean-Christophe Burie Cheng-lin Liu Jean-Marc Ogier no affiliation The Robotics Institute Carnegie Mellon Universiry Pittsburgh USA Department of Cybernetics Czech Technical University Prague Czech Republic CVPR unit Indian Statistical Institute India Computer Vision Center Universitat Autònoma de Barcelona Spain L3i Laboratory University of La Rochelle France National Laboratory of Pattern Recognition Institute of Automation of Chinese Academy of Sciences China

With the growing cosmopolitan culture of modern cities, the need of robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense. With the goal to systematically benchmark and push the state-of-the-art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large scale multi-lingual synthetic dataset to assist the training, and a baseline End-to-End recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the presented RRC-MLT-2019 challenge.

关键词： Task analysis Text recognition Training Benchmark testing Protocols Rendering (computer graphics)

来源：评论

学校读者我要写书评

暂无评论

Affine Non-negative Collaborative Representation Based pattern Classification

arXiv

引用

arXiv 2020年

作者： Yin, He-Feng Wu, Xiao-Jun Feng, Zhen-Hua Kittler, Josef School of Artificial Intelligence and Computer Science Jiangnan University Wuxi214122 China Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence Jiangnan University Wuxi214122 China Department of Computer Science University of Surrey GuildfordGU2 7XH United Kingdom Centre for Vision Speech and Signal Processing University of Surrey GuildfordGU2 7XH United Kingdom

—During the past decade, representation-based classification methods have received considerable attention in pattern recognition. In particular, the recently proposed non-negative representation based classification (NRC) method has been reported to achieve promising results in a wide range of classification tasks. However, NRC has two major drawbacks. First, there is no regularization term in the formulation of NRC, which may result in unstable solution and misclassification. Second, NRC ignores the fact that data usually lies in a union of multiple affine subspaces, rather than linear subspaces in practical applications. To address the above issues, this paper presents an affine non-negative collaborative representation (ANCR) model for pattern classification. To be more specific, ANCR imposes a regularization term on the coding vector. Moreover, ANCR introduces an affine constraint to better represent the data from affine subspaces. The experimental results on several benchmarking datasets demonstrate the merits of the proposed ANCR method. The source code of our ANCR is publicly available at https://***/yinhefeng/ANCR. Copyright © 2020, The Authors. All rights reserved.

关键词： Vectors

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：