Search Results - Inner Mongolia University Library

Focus on what matters: separated models for visual-based RL generalization

Proceedings of the 38th International Conference on Neural Information Processing Systems

Authors: Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang. Affiliations: Department of Computer Science, Tongji University, Shanghai, China; MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China.

ISBN: (print) 9798331314385

Focus On What Matters: Separated Models For Visual-Based RL Generalization

arXiv, 2024

A primary challenge for visual-based Reinforcement Learning (RL) is to generalize effectively across unseen environments. Although previous studies have explored different auxiliary tasks to enhance generalization, few adopt image reconstruction due to concerns about exacerbating overfitting to task-irrelevant features during training. Recognizing the pre-eminence of image reconstruction in representation learning, we propose SMG (Separated Models for Generalization), a novel approach that exploits image reconstruction for generalization. SMG introduces two model branches to extract task-relevant and task-irrelevant representations separately from visual observations via cooperative reconstruction. Built upon this architecture, we further emphasize the importance of task-relevant features for generalization. Specifically, SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios, thereby avoiding overfitting. Extensive experiments in DMC demonstrate the SOTA generalization performance of SMG, particularly in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications. Source code is available at https://***/r/SMG/. © 2024, CC BY.

Keywords: 3D reconstruction

Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery

arXiv, 2023

Authors: Zhang, Xiao; Zhang, Hai; Zhou, Hongtu; Huang, Chang; Zhang, Di; Ye, Chen; Zhao, Junqiao. Affiliations: The Department of Computer Science and Technology, Tongji University, China; The MOE Key Lab of Embedded System and Service Computing, Tongji University, China.

Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks. To ensure safety during and after the training process, existing methods tend to adopt overly conservative policies to avoid unsafe situations. However, an overly conservative policy severely hinders exploration and makes the algorithms substantially less rewarding. In this paper, we propose a method to construct a boundary that discriminates between safe and unsafe states. The boundary we construct is equivalent to distinguishing dead-end states, indicating the maximum extent to which safe exploration is guaranteed, and thus places minimal limitation on exploration. Similar to Recovery Reinforcement Learning, we utilize a decoupled RL framework to learn two policies: (1) a task policy that only considers improving task performance, and (2) a recovery policy that maximizes safety. The recovery policy and a corresponding safety critic are pretrained on an offline dataset, in which the safety critic evaluates an upper bound on safety in each state, giving the agent an awareness of environmental safety. During online training, a behavior correction mechanism is adopted, ensuring that the agent interacts with the environment using safe actions only. Finally, experiments on continuous control tasks demonstrate that our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms. Copyright © 2023, The Authors. All rights reserved.

Keywords: Reinforcement learning
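
The behavior correction step described in the abstract above can be illustrated with a minimal Python sketch. Nothing here is taken from the authors' code: the function names, the 0.5 threshold, the toy one-dimensional environment, and the choice to let the critic score a state-action pair are all assumptions made for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]


def corrected_action(
    state: State,
    task_policy: Callable[[State], float],
    recovery_policy: Callable[[State], float],
    safety_critic: Callable[[State, float], float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> float:
    """Execute the task action only if the critic scores it as safe enough."""
    proposed = task_policy(state)
    if safety_critic(state, proposed) >= threshold:
        return proposed
    # Otherwise fall back to the recovery policy's action.
    return recovery_policy(state)


# Toy 1-D world (assumed): positions x >= 1.0 are dead-ends.
def toy_task_policy(state: State) -> float:
    return 0.3  # always push toward a goal on the right


def toy_recovery_policy(state: State) -> float:
    return -0.3  # always retreat from the edge


def toy_safety_critic(state: State, action: float) -> float:
    # 1.0 if the next position stays clear of the dead-end, else 0.0.
    return 1.0 if state[0] + action < 1.0 else 0.0


if __name__ == "__main__":
    for x in (0.2, 0.8):
        action = corrected_action(
            [x], toy_task_policy, toy_recovery_policy, toy_safety_critic
        )
        print(f"x={x}: action={action}")
```

In this toy setting the task action is kept far from the edge and replaced by the recovery action near it. A learned critic would return a continuous estimate rather than a hard 0/1 score.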

How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

arXiv, 2023

Authors: Zhang, Hai; Yu, Hang; Zhao, Junqiao; Zhang, Di; Huang, Chang; Zhou, Hongtu; Zhang, Xiao; Ye, Chen. Affiliations: Department of Computer Science, Tongji University, Shanghai, China; MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China.

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model overfitting. Based on these, we develop a straightforward algorithm USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks. © 2023, CC BY.

Keywords: Benchmarking

How to fine-tune the model: unified model shift and model bias policy optimization

Proceedings of the 37th International Conference on Neural Information Processing Systems

Authors: Hai Zhang, Hang Yu, Junqiao Zhao, Di Zhang, Chang Huang, Hongtu Zhou, Xiao Zhang, Chen Ye. Affiliations: Department of Computer Science, Tongji University, Shanghai, China; MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China.

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model over-fitting. Based on these, we develop a straightforward algorithm USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks. Code: https://***/betray12138/***

Convex Hull-based Algebraic Constraint for Visual Quadric SLAM

arXiv, 2025

Authors: Yu, Xiaolong; Zhao, Junqiao; Song, Shuangfu; Zhu, Zhongyang; Yuan, Zihan; Ye, Chen; Feng, Tiantian. Affiliations: School of Computer Science and Technology, Tongji University, Shanghai, China; The MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China; Institute of Intelligent Vehicles, Tongji University, Shanghai, China; School of Surveying and Geo-Informatics, Tongji University, Shanghai, China.

Using quadrics as the object representation has the benefits of both generality and closed-form projection derivation between image and world spaces. Although numerous constraints have been proposed for dual quadric reconstruction, we found that many of them are imprecise and provide minimal improvements to localization. After scrutinizing the existing constraints, we introduce a concise yet more precise convex hull-based algebraic constraint for object landmarks, which is applied to object reconstruction, frontend pose estimation, and backend bundle adjustment. This constraint is designed to fully leverage precise semantic segmentation, effectively mitigating mismatches between complex-shaped object contours and dual quadrics. Experiments on public datasets demonstrate that our approach is applicable to both monocular and RGB-D SLAM and achieves better object mapping and localization than existing quadric SLAM methods. The implementation of our method is available at https://***/tievtongji/convexhull-based-algebraic-constraint. © 2025, CC BY.

Keywords: Semantic Segmentation

MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition

arXiv, 2024

Authors: Gu, Qiwen; Wang, Xufei; Zhang, Fenglin; Zhao, Junqiao; Tao, Siyue; Ye, Chen; Feng, Tiantian; Jiang, Changjun. Affiliations: Department of Computer Science and Technology, School of Electronics and Information Engineering, Tongji University, Shanghai, China; The Shanghai Research Institute for Intelligent Autonomous System, Tongji University, Shanghai, China; The MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China; School of Surveying and Geo-Informatics, Tongji University, Shanghai, China.

Visual Place Recognition (VPR) aims to robustly identify locations by leveraging image retrieval based on descriptors encoded from environmental images. However, drastic appearance changes between images captured from different viewpoints at the same location produce incoherent supervision signals for descriptor learning, which severely hinders the performance of VPR. Previous work proposes classifying images based on manually defined rules or ground truth labels for viewpoints, followed by descriptor training based on the classification results. However, not all datasets have ground truth viewpoint labels, and manually defined rules may be suboptimal, leading to degraded descriptor performance. To address these challenges, we introduce the mutual learning of viewpoint self-classification and VPR. Starting from coarse classification based on geographical coordinates, we progress to finer classification of viewpoints using simple clustering techniques. The dataset is partitioned in an unsupervised manner while simultaneously training a descriptor extractor for place recognition. Experimental results show that this approach almost perfectly partitions the dataset based on viewpoints, thus achieving mutually reinforcing effects. Our method even outperforms state-of-the-art (SOTA) methods that partition datasets using ground truth labels. Copyright © 2024, The Authors. All rights reserved.

Keywords: Self-supervised learning

LOG-LIO2: A LiDAR-Inertial Odometry with Efficient Uncertainty Analysis

arXiv, 2024

Authors: Huang, Kai; Zhao, Junqiao; Lin, Jiaye; Zhu, Zhongyang; Song, Shuangfu; Ye, Chen; Feng, Tiantian. Affiliations: The School of Surveying and Geo-Informatics, Tongji University, Shanghai, China; Department of Computer Science and Technology, School of Electronics and Information Engineering, Tongji University, Shanghai, China; The MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China; Institute of Intelligent Vehicles, Tongji University, Shanghai, China.

Uncertainty in LiDAR measurements, stemming from factors such as range sensing, is crucial for LIO (LiDAR-Inertial Odometry) systems as it affects the accurate weighting in the loss function. While recent LIO systems address uncertainty related to range sensing, the impact of incident angle on uncertainty is often overlooked by the community. Moreover, the existing uncertainty propagation methods suffer from computational inefficiency. This paper proposes a comprehensive point uncertainty model that accounts for both the uncertainties from LiDAR measurements and surface characteristics, along with an efficient local uncertainty analytical method for LiDAR-based state estimation problems. We employ a projection operator that separates the uncertainty into the ray direction and its orthogonal plane. Then, we derive incremental Jacobian matrices of eigenvalues and eigenvectors w.r.t. points, which enables a fast approximation of uncertainty propagation. This approach eliminates the requirement for redundant traversal of points, significantly reducing the time complexity of uncertainty propagation from O(n) to O(1) when a new point is added. Simulations and experiments on public datasets are conducted to validate the accuracy and efficiency of our formulations. The proposed methods have been integrated into an LIO system, which is available at https://***/tiev-tongji/LOG-LIO2. Copyright © 2024, The Authors. All rights reserved.

Keywords: Optical radar
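
The O(n)-to-O(1) claim in the abstract above rests on updating local point statistics incrementally instead of revisiting every point. The sketch below shows only that general idea, using the standard running mean and covariance update for 3D points. It is not the paper's Jacobian derivation for eigenvalues and eigenvectors, and the class and method names are hypothetical.

```python
from typing import List, Sequence


class RunningCovariance:
    """Mean and population covariance of points, updated in O(1) per point."""

    def __init__(self, dim: int = 3) -> None:
        self.dim = dim
        self.n = 0
        self.mean: List[float] = [0.0] * dim
        # Sum of outer products of deviations from the running mean.
        self.scatter: List[List[float]] = [[0.0] * dim for _ in range(dim)]

    def add(self, point: Sequence[float]) -> None:
        """Fold one new point into the statistics without touching old points."""
        self.n += 1
        before = [p - m for p, m in zip(point, self.mean)]  # deviation from old mean
        self.mean = [m + d / self.n for m, d in zip(self.mean, before)]
        after = [p - m for p, m in zip(point, self.mean)]  # deviation from new mean
        for i in range(self.dim):
            for j in range(self.dim):
                self.scatter[i][j] += before[i] * after[j]

    def covariance(self) -> List[List[float]]:
        if self.n == 0:
            raise ValueError("no points added yet")
        return [[s / self.n for s in row] for row in self.scatter]


if __name__ == "__main__":
    stats = RunningCovariance()
    for pt in [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]:
        stats.add(pt)
    print(stats.mean)
    print(stats.covariance())
```

Each call to `add` costs a fixed number of operations for a fixed dimension, whereas recomputing the covariance from scratch would cost time proportional to the number of stored points.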

Meta Balanced Network for Fair Face Recognition

arXiv, 2022

Authors: Wang, Mei; Zhang, Yaobin; Deng, Weihong. Affiliations: The Pattern Recognition and Intelligent System Lab., School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Key Lab. of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China.

Although deep face recognition has achieved impressive progress in recent years, controversy has arisen regarding discrimination based on skin tone, calling into question its deployment in real-world scenarios. In this paper, we aim to systematically and scientifically study this bias from both data and algorithm aspects. First, using the dermatologist-approved Fitzpatrick Skin Type classification system and Individual Typology Angle, we contribute a benchmark called the Identity Shades (IDS) database, which effectively quantifies the degree of bias with respect to skin tone in existing face recognition algorithms and commercial APIs. Further, we provide two skin-tone-aware training datasets, called the BUPT-Globalface dataset and the BUPT-Balancedface dataset, to remove bias in training data. Finally, to mitigate the algorithmic bias, we propose a novel meta-learning algorithm, called Meta Balanced Network (MBN), which learns adaptive margins in large margin loss such that the model optimized by this loss can perform fairly across people with different skin tones. To determine the margins, our method optimizes a meta skewness loss on a clean and unbiased meta set and utilizes backward-on-backward automatic differentiation to perform a second order gradient descent step on the current margins. Extensive experiments show that MBN successfully mitigates bias and learns more balanced performance for people with different skin tones in face recognition. The proposed datasets are available at http://***/RFW/***. © 2022, CC BY.

Keywords: Face recognition
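
To make the phrase "adaptive margins in large margin loss" from the abstract above more concrete, here is a minimal Python sketch of a softmax loss with a per-group additive cosine margin. It rests on several assumptions: the additive-cosine form, the group names, the margin values, and the scale are all placeholders, and the margins are fixed here, whereas the abstract describes them as learned through a meta-learning step that is not reproduced.

```python
import math
from typing import Dict, Sequence


def margin_softmax_loss(
    cosines: Sequence[float], target: int, margin: float, scale: float = 30.0
) -> float:
    """Cross-entropy where the target class's cosine is reduced by `margin`."""
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Hypothetical per-group margins; a meta-learned scheme would adjust these.
GROUP_MARGINS: Dict[str, float] = {"group_a": 0.2, "group_b": 0.4}


def group_loss(cosines: Sequence[float], target: int, group: str) -> float:
    """Look up the margin assigned to a sample's group and apply it."""
    return margin_softmax_loss(cosines, target, GROUP_MARGINS[group])


if __name__ == "__main__":
    sims = [0.8, 0.1, 0.0]  # cosine similarity to each identity's class centre
    for name in GROUP_MARGINS:
        print(name, group_loss(sims, target=0, group=name))
```

A larger margin lowers the target logit and therefore raises the loss for the same similarities, which is the lever an adaptive scheme can use to demand more separation from some groups than from others.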