Search Results - Inner Mongolia University Library

IEEE Systems Journal, 2025, Vol. 19, No. 1, pp. 317-326

Authors: Liu, Weiwei; Hu, Wenxuan; Jing, Wei; Lei, Lanxin; Gao, Lingping; Liu, Yong. Affiliations: Zhejiang University, The Advanced Perception on Robotics and Intelligent Learning Lab, College of Control Science and Engineering, Hangzhou 310027, China; Huzhou Institute of Zhejiang University, Zhejiang 310027, China; Alibaba DAMO Academy, Autonomous Driving Lab, Zhejiang 311121, China; Huzhou University, College of Information Engineering, Zhejiang 313000, China

Autonomous vehicles trained through multiagent reinforcement learning (MARL) have shown impressive results in many driving scenarios. However, the performance of these trained policies can degrade when faced with diverse driving styles and personalities, particularly in highly interactive situations. This is because conventional MARL algorithms usually operate under the assumption of fully cooperative behavior among all agents and focus on maximizing team rewards during training. To address this issue, we introduce the personality modeling network (PeMN), which includes a cooperation value function and personality parameters to model the varied interactions in highly interactive scenarios. The PeMN also enables the training of a background traffic flow with diverse behaviors, thereby improving the performance and generalization of the ego vehicle. Our extensive experimental studies, which incorporate different personality parameters in highly interactive driving scenarios, demonstrate that the personality parameters effectively model diverse driving styles and that policies trained with PeMN generalize better than traditional MARL methods. © 2007-2012 IEEE.

Keywords: Reinforcement learning

Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning

arXiv, 2024

Authors: Weiwei, Liu; Wenxuan, Hu; Wei, Jing; Lanxin, Lei; Lingping, Gao; Yong, Liu. Affiliations: The Advanced Perception on Robotics and Intelligent Learning Lab, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China; The Advanced Perception on Robotics and Intelligent Learning Lab, Huzhou Institute, Zhejiang University, Huzhou, China; College of Information Engineering, Huzhou University, Huzhou, China; Department of Autonomous Driving Lab, Alibaba DAMO Academy, Hangzhou, China

Autonomous vehicles trained through Multi-Agent Reinforcement Learning (MARL) have shown impressive results in many driving scenarios. However, the performance of these trained policies can degrade when faced with diverse driving styles and personalities, particularly in highly interactive situations. This is because conventional MARL algorithms usually operate under the assumption of fully cooperative behavior among all agents and focus on maximizing team rewards during training. To address this issue, we introduce the Personality Modeling Network (PeMN), which includes a cooperation value function and personality parameters to model the varied interactions in highly interactive scenarios. The PeMN also enables the training of a background traffic flow with diverse behaviors, thereby improving the performance and generalization of the ego vehicle. Our extensive experimental studies, which incorporate different personality parameters in highly interactive driving scenarios, demonstrate that the personality parameters effectively model diverse driving styles and that policies trained with PeMN generalize better than traditional MARL methods. Copyright © 2024, The Authors. All rights reserved.

Keywords: Reinforcement learning

FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction

arXiv, 2025

Authors: Rotondi, Dennis; Scaparro, Fabio; Blum, Hermann; Arras, Kai O. Affiliations: Socially Intelligent Robotics Lab, Institute for Artificial Intelligence, University of Stuttgart, Germany; Robot Perception and Learning Lab, LAMARR Institute for Machine Learning and Artificial Intelligence, University of Bonn, Germany

The concept of 3D scene graphs is increasingly recognized as a powerful semantic and hierarchical representation of the environment. Current approaches often address this at a coarse, object-level resolution. In contrast, our goal is to develop a representation that enables robots to directly interact with their environment by identifying both the location of functional interactive elements and how these can be used. To achieve this, we focus on detecting and storing objects at a finer resolution, focusing on affordance-relevant parts. The primary challenge lies in the scarcity of data that extends beyond instance-level detection and the inherent difficulty of capturing detailed object features using robotic sensors. We leverage currently available 3D resources to generate 2D data and train a detector, which is then used to augment the standard 3D scene graph generation pipeline. Through our experiments, we demonstrate that our approach achieves functional element segmentation comparable to state-of-the-art 3D models and that our augmentation enables task-driven affordance grounding with higher accuracy than the current solutions. © 2025, CC BY-NC-ND.

Keywords: Semantics

Raising Body Ownership in End-to-End Visuomotor Policy Learning via Robot-Centric Pooling

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Authors: Zheyu Zhuang; Ville Kyrki; Danica Kragic. Affiliations: Robotics Perception and Learning Lab, EECS, KTH Royal Institute of Technology, Stockholm, Sweden; Department of Electrical Engineering and Automation (EEA), Intelligent Robotics Group, Aalto University, Espoo, Finland

ISBN (digital): 9798350377705

ISBN (print): 9798350377712

We present Robot-centric Pooling (RcP), a novel pooling method designed to enhance end-to-end visuomotor policies by enabling differentiation between the robots and similar entities or their surroundings. Given an image-proprioception pair, RcP guides the aggregation of image features by highlighting image regions correlating with the robot’s proprioceptive states, thereby extracting robot-centric image representations for policy learning. Leveraging contrastive learning techniques, RcP integrates seamlessly with existing visuomotor policy learning frameworks and is trained jointly with the policy using the same dataset, requiring no extra data collection involving self-distractors. We evaluate the proposed method with reaching tasks in both simulated and real-world settings. The results demonstrate that RcP significantly enhances the policies’ robustness against various unseen distractors, including self-distractors, positioned at different locations. Additionally, the inherent robot-centric characteristic of RcP enables the learnt policy to be far more resilient to aggressive pixel shifts compared to the baselines. Code available at: https://***/Zheyu-Zhuang/RcP

Keywords: Proprioception; Performance gain; Image representation; Data collection; Feature extraction; Robustness; History; Robots; Intelligent robots; Resilience

A Data-Driven Method for Estimating Formation Flexibility in Beyond-Visual-Range Air Combat

International Conference on Unmanned Aircraft Systems (ICUAS)

Authors: Edvards Scukins; Andre N. Costa; Petter Ögren. Affiliations: Aeronautical Solutions Division, SAAB Aeronautics; Robotics Perception and Learning Lab., Royal Institute of Technology (KTH); Decision Support Systems Subdivision, Institute for Advanced Studies (IEAv)

ISBN (digital): 9798350357882

ISBN (print): 9798350357899

Tactical decisions in air combat are typically evaluated using experience as a basis. Pilots undergo frequent training in various air combat processes to enhance their combat proficiency and evaluation skills. Having the Situational Awareness (SA) necessary to evaluate the effects of multiple missile threats can often be challenging. This study provides a new method for calculating an aircraft fleet's maneuver flexibility in a Beyond-Visual-Range (BVR) setting. Sustaining a high degree of flexibility is necessary to adapt to unforeseen circumstances in BVR air combat. To do that, we employ Deep Neural Networks (DNN) to capture the result of a high-performance aircraft model in the presence of adversarial BVR missiles. We then modify our approach to calculate the aircraft's maneuverability concerning an opposing fleet, looking at the advantages and disadvantages of several flight formations. Finally, we consider the anticipated threat from an incoming opponent formation and optimize the counter-formation. This methodology offers a more sophisticated comprehension of aircraft maneuver flexibility within a BVR framework and aids in developing flexible and efficient decision-making techniques for air combat.

Keywords: Training; Measurement; Missiles; Visualization; Atmospheric modeling; Decision making; Artificial neural networks

Visual Object Tracking across Diverse Data Modalities: A Review

arXiv, 2024

Authors: Wang, Mengmeng; Ma, Teli; Xin, Shuo; Hou, Xiaojun; Xing, Jiazheng; Dai, Guang; Wang, Jingdong; Liu, Yong. Affiliations: The Laboratory of Advanced Perception on Robotics and Intelligent Learning, College of Control Science and Engineering, Zhejiang University, Zhejiang, Hangzhou 310027, China; State Grid Shanxi Electric Power Company Limited, China; Baidu, China

Visual Object Tracking (VOT) is an attractive and significant research area in computer vision, which aims to recognize and track specific targets in video sequences where the target objects are arbitrary and class-agnostic. The VOT technology could be applied in various scenarios, processing data of diverse modalities such as RGB, thermal infrared and point cloud. Besides, since no one sensor could handle all the dynamic and varying environments, multi-modal VOT is also investigated. This paper presents a comprehensive survey of the recent progress of both single-modal and multi-modal VOT, especially the deep learning methods. Specifically, we first review three types of mainstream single-modal VOT, including RGB, thermal infrared and point cloud tracking. In particular, we conclude four widely-used single-modal frameworks, abstracting their schemas and categorizing the existing inheritors. Then we summarize four kinds of multi-modal VOT, including RGB-Depth, RGB-Thermal, RGB-LiDAR and RGB-Language. Moreover, the comparison results in plenty of VOT benchmarks of the discussed modalities are presented. Finally, we provide recommendations and insightful observations, inspiring the future development of this fast-growing literature. © 2024, CC BY.

Keywords: Data handling

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

arXiv, 2024

Authors: Xing, Jiazheng; Xu, Chao; Qian, Yijie; Liu, Yang; Dai, Guang; Sun, Baigui; Liu, Yong; Wang, Jingdong. Affiliations: Laboratory of Advanced Perception on Robotics and Intelligent Learning, College of Control Science and Engineering, Zhejiang University, Zhejiang, Hangzhou 310027, China; Alibaba Group, China; SGIT AI Lab, State Grid Shaanxi Electric Power Company, China; Baidu Inc, China

Virtual try-on focuses on adjusting the given clothes to fit a specific person seamlessly while avoiding any distortion of the patterns and textures of the garment. However, the clothing identity uncontrollability and training inefficiency of existing diffusion-based methods, which struggle to maintain the identity even with full parameter training, are significant limitations that hinder the widespread applications. In this work, we propose an effective and efficient framework, termed TryOn-Adapter. Specifically, we first decouple clothing identity into fine-grained factors: style for color and category information, texture for high-frequency details, and structure for smooth spatial adaptive transformation. Our approach utilizes a pre-trained exemplar-based diffusion model as the fundamental network, whose parameters are frozen except for the attention layers. We then customize three lightweight modules (Style Preserving, Texture Highlighting, and Structure Adapting) incorporated with fine-tuning techniques to enable precise and efficient identity control. Meanwhile, we introduce the training-free T-RePaint strategy to further enhance clothing identity preservation while maintaining the realistic try-on effect during the inference. Our experiments demonstrate that our approach achieves state-of-the-art performance on two widely-used benchmarks. Additionally, compared with recent full-tuning diffusion-based methods, we only use about half of their tunable parameters during training. The code will be made publicly available at https://***/jiazhengxing/TryOn-Adapter. © 2024, CC BY-NC-ND.

Keywords: Textures

Learning to Localize Cross-Anatomy Landmarks in X-Ray Images with a Universal Model

Biomedical Engineering Frontiers, 2022, Vol. 3, No. 1, pp. 298-308

Authors: Heqin Zhu; Qingsong Yao; Li Xiao; S. Kevin Zhou. Affiliations: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), School of Biomedical Engineering & Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China

Objective and Impact *** this work, we develop a universal anatomical landmark detection model which learns once from multiple datasets corresponding to different anatomical *** with the conventional model trained on a single dataset, this universal model is not only more lightweight and easier to train but also improves the accuracy of the anatomical landmark *** accurate and automatic localization of anatomical landmarks plays an essential role in medical image ***, recent deep learning-based methods only utilize limited data from a single *** is promising and desirable to build a model learned from different regions which harnesses the power of big *** model consists of a local network and a global network, which capture local features and global features, *** local network is a fully convolutional network built up with depth-wise separable convolutions, and the global network uses dilated convolution to enlarge the receptive field to model global *** evaluate our model on four 2D X-ray image datasets totaling 1710 images and 72 landmarks in four anatomical *** experimental results show that our model improves the detection accuracy compared to the state-of-the-art *** model makes the first attempt to train a single network on multiple datasets for landmark *** results qualitatively and quantitatively show that our proposed model performs better than other models trained on multiple datasets and even better than models trained on a single dataset separately.

Keywords: convolution; utilize; separable

Probabilistic Spiking Neural Network for Robotic Tactile Continual Learning

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Senlin Fang; Yiwen Liu; Chengliang Liu; Jingnan Wang; Yuanzhe Su; Yupo Zhang; Hoiio Kong; Zhengkun Yi; Xinyu Wu. Affiliations: City University of Macau, Macau, China; The Department of Intelligent Systems and Robot Learning Lab, ISRL Group, SIAT Branch, Institute of Artificial Intelligence and Robotics for Society, Shenzhen Institute of Advanced Technology, Shenzhen

ISBN (digital): 9798350384574

ISBN (print): 9798350384581

The sense of touch is essential for robots to perform various daily tasks. Artificial Neural Networks (ANNs) have shown significant promise in advancing robotic tactile learning. However, because the tactile data distribution changes as robots encounter new tasks, ANN-based robotic tactile learning suffers from catastrophic forgetting. To solve this problem, we introduce a novel continual learning (CL) framework called the Probabilistic Spiking Neural Network with Variational Continual Learning (PSNN-VCL). In this framework, PSNN introduces uncertainty during spike emission and can apply fast Variational Inference by optimizing the uncertainty through backpropagation, which significantly reduces the required model parameters for VCL. We establish a robotic tactile CL benchmark using publicly available datasets to evaluate our method. Experimental results demonstrate that, compared to other CL methods, PSNN-VCL not only achieves superior performance in terms of widely used CL metrics but also achieves at least a 50% reduction in model parameters on the robotic tactile CL benchmark.

Keywords: Continuing education; Measurement; Uncertainty; Spiking neural networks; Learning (artificial intelligence); Benchmark testing; Robot sensing systems

TransVOS: Video object segmentation with transformers

arXiv, 2021

Authors: Mei, Jianbiao; Wang, Mengmeng; Lin, Yeneng; Yuan, Yi; Liu, Yong. Affiliations: Laboratory of Advanced Perception on Robotics and Intelligent Learning, College of Control Science and Engineering, Zhejiang University; NetEase Fuxi AI Lab

Recently, Space-Time Memory Network (STM) based methods have achieved state-of-the-art performance in semi-supervised video object segmentation (VOS). A crucial problem in this task is how to model the dependency both among different frames and inside every frame. However, most of these methods neglect the spatial relationships (inside each frame) and do not make full use of the temporal relationships (among different frames). In this paper, we propose a new transformer-based framework, termed TransVOS, introducing a vision transformer to fully exploit and model both the temporal and spatial relationships. Moreover, most STM-based approaches employ two separate encoders to extract features of two significant inputs, i.e., reference sets (history frames with predicted masks) and query frame (current frame), respectively, increasing the models' parameters and complexity. To slim the popular two-encoder pipeline while keeping the effectiveness, we design a single two-path feature extractor to encode the above two inputs in a unified way. Extensive experiments demonstrate the superiority of our TransVOS over state-of-the-art methods on both DAVIS and YouTube-VOS datasets. Codes are available at https://***/sallymmx/***. Copyright © 2021, The Authors. All rights reserved.

Keywords: Signal encoding