Search Results - Inner Mongolia University Library

MA-FSAR: Multimodal Adaptation of CLIP for Few-Shot Action Recognition

arXiv, 2023

Authors: Xing, Jiazheng; Xu, Chao; Wang, Mengmeng; Dai, Guang; Sun, Baigui; Liu, Yong; Wang, Jingdong; Zhao, Jian

Affiliations: Laboratory of Advanced Perception on Robotics and Intelligent Learning, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China; Alibaba Group, China; College of Computer Science and Technology, Zhejiang University of Technology, China; SGIT AI Lab, State Grid Corporation of China, China; Baidu, China; China; Northwestern Polytechnical University, Xi’an, Shaanxi, China

Applying large-scale vision-language pre-trained models like CLIP to few-shot action recognition (FSAR) can significantly enhance both performance and efficiency. While several studies have recognized this advantage, most of them resort to full-parameter fine-tuning to adapt CLIP’s visual encoder to the FSAR data, which not only incurs high computational cost but also overlooks the potential of the visual encoder to engage in temporal modeling and focus on targeted semantics directly. To tackle these issues, we introduce MA-FSAR, a framework that employs the Parameter-Efficient Fine-Tuning (PEFT) technique to enhance the CLIP visual encoder in terms of action-related temporal and semantic representations. Our solution involves a Fine-grained Multimodal Adaptation, which differs from previous attempts at PEFT in regular action recognition. Specifically, we first insert a Global Temporal Adaptation that only receives the class token to capture global motion cues efficiently. These outputs are then integrated with visual tokens to enhance local temporal dynamics through a Local Multimodal Adaptation, which incorporates text features unique to the FSAR support set branch to highlight fine-grained semantics related to actions. In addition to these token-level designs, we propose a prototype-level text-guided construction module to further enrich the temporal and semantic characteristics of video prototypes. Extensive experiments demonstrate superior performance on various tasks with only a small number of trainable parameters. © 2023, CC BY-SA.

Keywords: Semantics

Omni-frequency Channel-selection Representations for Unsupervised Anomaly Detection

arXiv, 2022

Authors: Liang, Yufei; Zhang, Jiangning; Zhao, Shiwei; Wu, Runze; Liu, Yong; Pan, Shuwen

Affiliations: The Laboratory of Advanced Perception on Robotics and Intelligent Learning, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China; The Fuxi AI Lab, NetEase Games, Hangzhou 310012, China; The Discipline of Control Science and Engineering, School of Information and Electrical Engineering, Zhejiang University City College, Hangzhou 310015, China

Density-based and classification-based methods have dominated unsupervised anomaly detection in recent years, while reconstruction-based methods are rarely mentioned because of their poor reconstruction ability and low performance. However, the latter require no costly extra training samples for unsupervised training, which makes them more practical, so this paper focuses on improving reconstruction-based methods and proposes a novel Omni-frequency Channel-selection Reconstruction (OCR-GAN) network to handle the sensory anomaly detection task from a frequency perspective. Concretely, we propose a Frequency Decoupling (FD) module to decouple the input image into different frequency components and model the reconstruction process as a combination of parallel omni-frequency image restorations, as we observe a significant difference in the frequency distribution of normal and abnormal images. Given the correlation among multiple frequencies, we further propose a Channel Selection (CS) module that performs frequency interaction among different encoders by adaptively selecting different channels. Extensive experiments demonstrate the effectiveness and superiority of our approach over different kinds of methods, e.g., achieving a new state-of-the-art 98.3 detection AUC on the MVTec AD dataset without extra training data, which markedly surpasses the reconstruction-based baseline by +38.1↑ and the current SOTA by +0.3↑. Copyright © 2022, The Authors. All rights reserved.

Keywords: Anomaly detection
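The frequency decoupling idea described in the OCR-GAN abstract above (splitting an image into frequency components that are restored in parallel and then combined) can be illustrated with a minimal sketch. This is not the paper's FD module, whose exact filters are not given in this record; it swaps in radial FFT band masks as an assumption. The function name `decouple_frequencies` and the `cutoffs` values are hypothetical.

```python
import numpy as np


def decouple_frequencies(image, cutoffs=(0.1, 0.25)):
    """Split a 2D grayscale image into len(cutoffs) + 1 frequency bands.

    Assumption: bands are defined by radial masks in the FFT domain
    (cutoffs in cycles per pixel). The masks partition the spectrum,
    so the returned bands sum back to the input image.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    # Radial frequency of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    spectrum = np.fft.fft2(image)
    edges = (0.0, *cutoffs, np.inf)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)
        # The mask is symmetric in frequency, so the inverse is real
        # up to rounding error; keep only the real part.
        bands.append(np.fft.ifft2(spectrum * mask).real)
    return bands


rng = np.random.default_rng(0)
img = rng.random((16, 16))
bands = decouple_frequencies(img)
print(len(bands), np.allclose(sum(bands), img))
```

Because the masks partition the spectrum, the components add back to the original image, which mirrors the abstract's framing of reconstruction as a combination of per-frequency restorations.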

Distortion-Disentangled Contrastive Learning

IEEE Workshop on Applications of Computer Vision (WACV)

Authors: Jinfeng Wang; Sifan Song; Jionglong Su; S. Kevin Zhou

Affiliations: School of AIAC, Xi’an Jiaotong-Liverpool University, Suzhou, China; School of BME & Suzhou Institute for Advanced Research, Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), University of Science and Technology of China, Suzhou, China; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China

Self-supervised learning is well known for its remarkable performance in representation learning and various downstream computer vision tasks. Recently, Positive-pair-Only Contrastive Learning (POCL) has achieved reliable performance without the need to construct positive-negative training sets. It reduces memory requirements by lessening the dependency on the batch size. The POCL method typically uses a single objective function to extract the distortion-invariant representation (DIR), which describes the proximity of positive-pair representations affected by different distortions. This objective function implicitly enables the model to filter out or ignore the distortion-variant representation (DVR) affected by different distortions. However, some recent studies have shown that proper use of DVR in contrastive learning can improve the performance of models in some downstream domain-specific tasks. In addition, these POCL methods have been observed to be sensitive to augmentation strategies. To address these limitations, we propose a novel POCL framework named Distortion-Disentangled Contrastive Learning (DDCL) and a Distortion-Disentangled Loss (DDL). Our approach is the first to explicitly and adaptively disentangle and exploit the DVR inside the model and feature stream to improve the representation utilization efficiency, robustness and representation ability. Experiments demonstrate our framework’s superiority to Barlow Twins and SimSiam in terms of convergence, representation quality (including transferability and generalization), and robustness on several datasets.

Learning Intra-group Cooperation in Multi-agent Systems

International Conference on Mechatronics and Machine Vision in Practice (M2VIP)

Authors: Weiwei Liu; Shanqi Liu; Jian Yang; Yong Liu

Affiliations: The Advanced Perception on Robotics and Intelligent Learning Lab, College of Control Science and Engineering, Zhejiang University, Hangzhou, China; China Research and Development Academy of Machinery Equipment, Beijing, China; Huzhou Institute of Zhejiang University, Huzhou, China

ISBN: (print) 9781665431545

Reinforcement learning is one of the approaches used in multi-agent systems to promote agent cooperation. However, most current multi-agent reinforcement learning algorithms improve the communication capabilities of agents for cooperation, but the overall communication is costly and even harmful due to bandwidth limitations. In addition, decentralized execution cannot generate joint actions, which is not conducive to cooperation. Therefore, we propose the Hierarchical Group Cooperation Network (HGCN). The high-level strategy, Group Network (GroNet), learns to group all agents based on their state rather than their location. The low-level strategy, Group Cooperation Network (GCoNet), is a method of centralized training and centralized execution within a group, which effectively promotes agent collaboration. Finally, we validate our method in various experiments.

Keywords: Training; Mechatronics; Machine vision; Collaboration; Reinforcement learning; Bandwidth; Task analysis

Grapevine Winter Pruning Automation: On Potential Pruning Points Detection through 2D Plant Modeling using Grapevine Segmentation

11th IEEE Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, CYBER 2021

Authors: Fernandes, Miguel; Scaldaferri, Antonello; Fiameni, Giuseppe; Teng, Tao; Gatti, Matteo; Poni, Stefano; Semini, Claudio; Caldwell, Darwin; Chen, Fei

Affiliations: Active Perception and Robot Interactive Learning Laboratory, Istituto Italiano di Tecnologia, Department of Advanced Robotics, Genova 16163, Italy; Italy; Università Cattolica del Sacro Cuore, Department of Sustainable Crop Production, Piacenza 29122, Italy; Lab, Istituto Italiano di Tecnologia, Genova 16163, Italy; T-Stone Robotics Institute, The Chinese University of Hong Kong, Department of Mechanical and Automation Engineering, Hong Kong

ISBN: (print) 9781665425278

Grapevine winter pruning is a complex task that requires skilled workers to execute it correctly. The complexity of this task is also the reason why it is time-consuming. Considering that this operation takes about 80-120 hours/ha to complete, and is therefore even more critical in large vineyards, an automated system can help to speed up the process. To this end, this paper presents a novel multidisciplinary approach that tackles this challenging task. First, object segmentation is performed on grapevine images and used to create a representative model of the grapevine plants. Second, a set of potential pruning points is generated from this plant representation. We describe (a) a methodology for data acquisition and annotation, (b) neural network fine-tuning for grapevine segmentation, (c) an image-processing-based method for creating the representative model of grapevines, starting from the inferred segmentation, and (d) potential pruning point detection and localization, based on the plant model, which is a simplification of the grapevine structure. With this approach, we are able to identify a significant set of potential pruning points on the canes that can be used, with further selection, to derive the final set of real pruning points. © 2021 IEEE.

Keywords: Data acquisition

LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images

arXiv, 2024

Authors: Cheng, Yuzhou; Jiao, Jianhao; Wang, Yue; Kanoulas, Dimitrios

Affiliations: Robot Perception and Learning Lab, Intelligent Robotics, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom; Zhejiang University, Hangzhou, Zhejiang, China; AI Centre, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom; Archimedes/Athena RC, Greece

Visual localization involves estimating a query image’s 6-DoF (degrees of freedom) camera pose, which is a fundamental component in various computer vision and robotic tasks. This paper presents LoGS, a vision-based localization pipeline utilizing the 3D Gaussian Splatting (GS) technique as scene representation. This novel representation allows high-quality novel view synthesis. During the mapping phase, structure-from-motion (SfM) is applied first, followed by the generation of a GS map. During localization, the initial position is obtained through image retrieval and local feature matching coupled with a PnP solver, and then a high-precision pose is achieved in an analysis-by-synthesis manner on the GS map. Experimental results on four large-scale datasets demonstrate the proposed approach’s SoTA accuracy in estimating camera poses and robustness under challenging few-shot conditions. © 2024, CC BY.

Keywords: Gaussian distribution

LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation

arXiv, 2024

Authors: Jiao, Jianhao; He, Jinhao; Liu, Changkun; Aegidius, Sebastian; Hu, Xiangcheng; Braud, Tristan; Kanoulas, Dimitrios

Affiliations: The Robot Perception and Learning Lab, Intelligent Robotics, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom; Nansha District, Guangzhou, China; The Department of Computer Science and Engineering, HKUST, Hong Kong; The Department of Electronic and Computer Engineering, HKUST, Hong Kong; The AI Centre, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom; Archimedes/Athena RC, Greece

This paper presents LiteVLoc, a hierarchical visual localization framework that uses a lightweight topo-metric map to represent the environment. The method consists of three sequential modules that estimate camera poses in a coarse-to-fine manner. Unlike mainstream approaches relying on detailed 3D representations, LiteVLoc reduces storage overhead by leveraging learning-based feature matching and geometric solvers for metric pose estimation. A novel dataset for the map-free relocalization task is also introduced. Extensive experiments, including localization and navigation in both simulated and real-world scenarios, have validated the system's performance and demonstrated its precision and efficiency for large-scale deployment. Code and data will be made publicly available at https://***/LiteVLoc. Copyright © 2024, The Authors. All rights reserved.

Keywords: Visualization

Taming Stable Diffusion for MRI Cross-Modality Translation

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Authors: Yingtai Li; Shuo Yang; Xiaoyan Wu; Shan He; S. Kevin Zhou

Affiliations: School of Biomedical Engineering, Division of Life Sciences and Medicine, USTC, Hefei, P.R. China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, USTC, P.R. China; iFLYTEK Research, iFLYTEK Co. Ltd, Hefei, P.R. China; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, P.R. China

ISBN: (digital) 9798350386226

ISBN: (print) 9798350386233

In this study, we explore using Stable Diffusion (SD) for unsupervised medical image-to-image translation. SD has shown remarkable performance in generating high-quality images and can be easily applied to generate custom content by injecting standard plug-ins like LoRA, offering a promising solution to tackle the complexity caused by variations in imaging modalities, acquisition parameters, and body parts in medical imaging. However, we empirically find that existing pipelines designed for natural images fail to translate directly to medical images due to weak structural control and inappropriate color preservation. To address these issues, we propose a novel two-branch image translation pipeline. This pipeline decouples the generation of the target image along the time axis and employs ControlNet to ensure precise structural preservation. Additionally, we customize SD to generate images of extreme brightness, a common feature in medical imaging. Our results on the BraTS dataset demonstrate that SD with task-specific plug-ins can generate high-quality medical images comparable to those generated by task-specific models. Since the development of these standard plug-ins can be easily done by clinicians without much knowledge of the underlying algorithm, such a mode holds the potential to significantly extend the use of medical image computing algorithms in the clinical environment.

Keywords: Translation; Image color analysis; Computational modeling; Biological system modeling; Magnetic resonance imaging; Pipelines; Brightness; Complexity theory; Standards; Biomedical imaging

Learning Dynamic-Objective Policies from a Class of Optimal Trajectories

IEEE Conference on Decision and Control

Authors: Christopher Iliffe Sprague; Dario Izzo; Petter Ögren

Affiliations: Robotics Perception and Learning Lab., Royal Institute of Technology (KTH), Stockholm, Sweden; Advanced Concepts Team, European Space Technology Center (ESTEC), Noordwijk, The Netherlands

ISBN: (digital) 9781728174471

ISBN: (print) 9781728174488

Optimal state-feedback controllers, capable of changing between different objective functions, are advantageous to systems in which unexpected situations may arise. However, synthesising such controllers, even for a single objective, is a demanding process. In this paper, we present a novel and straightforward approach to synthesising these policies through a combination of trajectory optimisation, homotopy continuation, and imitation learning. We use numerical continuation to efficiently generate optimal demonstrations across several objectives and boundary conditions, and use these to train our policies. Additionally, we demonstrate the ability of our policies to effectively learn families of optimal state-feedback controllers, which can be used to change objective functions online. We illustrate this approach across two trajectory optimisation problems, an inverted pendulum swing-up and a spacecraft orbit transfer, and show that the synthesised policies, when evaluated in simulation, produce trajectories that are near-optimal. These results indicate the benefit of trajectory optimisation and homotopy continuation to the synthesis of controllers in dynamic-objective contexts.

Keywords: Trajectory; Optimization; Optimal control; Linear programming; Boundary conditions; Robots; Process control
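The numerical continuation mentioned in the abstract above can be illustrated on a toy problem. The sketch below is not the authors' optimal-control formulation; it assumes a scalar root-finding problem in which an easy equation is blended into a hard one through a homotopy parameter, with each solve warm-started from the previous solution. The names `newton` and `continuation` and both example equations are hypothetical.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Plain Newton iteration for a scalar equation f(x) = 0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def continuation(easy, d_easy, hard, d_hard, x_start, steps=20):
    """Track the root of (1 - lam) * easy(x) + lam * hard(x) as lam goes 0 -> 1.

    Each solve is warm-started from the previous root, which is the
    core idea of homotopy continuation.
    """
    x = x_start
    path = []
    for k in range(steps + 1):
        lam = k / steps

        def h(x, lam=lam):
            return (1 - lam) * easy(x) + lam * hard(x)

        def dh(x, lam=lam):
            return (1 - lam) * d_easy(x) + lam * d_hard(x)

        x = newton(h, dh, x)
        path.append((lam, x))
    return path


# Toy problem (assumed for illustration): deform x - 1 = 0 into x^3 + 2x - 5 = 0.
def easy(x):
    return x - 1.0


def d_easy(x):
    return 1.0


def hard(x):
    return x ** 3 + 2.0 * x - 5.0


def d_hard(x):
    return 3.0 * x ** 2 + 2.0


path = continuation(easy, d_easy, hard, d_hard, x_start=1.0)
lam_final, root = path[-1]
print(lam_final, root, hard(root))
```

In the paper's setting the intermediate solutions along such a path would themselves be optimal demonstrations for intermediate objectives; here they are simply the roots at each value of the homotopy parameter.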