Search Results - Inner Mongolia University Library

Semi-supervised imitation learning with mixed qualities of demonstrations for autonomous driving

arXiv, 2021

Authors: Lee, Gunmin; Oh, Wooseok; Shin, Seungyoun; Kim, Dohyeong; Oh, Jeongwoo; Jeong, Jaeyeon; Choi, Sungjoon; Oh, Songhwai
Affiliations: The Department of Electrical and Computer Engineering, ASRI, Seoul National University, Seoul 08826, Republic of Korea; The School of Computer Science and Engineering, Dongguk University, Seoul 04620, Republic of Korea; The Department of Artificial Intelligence, Korea University, Seoul 02841, Republic of Korea

In this paper, we consider the problem of autonomous driving using imitation learning in a semi-supervised manner. In particular, both labeled and unlabeled demonstrations are leveraged during training by estimating the quality of each unlabeled demonstration. If the provided demonstrations are corrupted and have a low signal-to-noise ratio, the performance of the imitation learning agent can be degraded significantly. To mitigate this problem, we propose a method called semi-supervised imitation learning (SSIL). SSIL first learns how to discriminate and evaluate each state-action pair's reliability in unlabeled demonstrations by assigning higher reliability values to demonstrations similar to labeled expert demonstrations. This reliability value is called leverage. After this discrimination process, both labeled and unlabeled demonstrations with estimated leverage values are utilized while training the policy in a semi-supervised manner. The experimental results demonstrate the validity of the proposed algorithm using unlabeled trajectories with mixed qualities. Moreover, the hardware experiments using an RC car are conducted to show that the proposed method can be applied to real-world applications. Copyright © 2021, The Authors. All rights reserved.
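The two-stage idea in this abstract (estimate a leverage value for each unlabeled state-action pair, then train on everything with those values as weights) can be sketched as follows. This is a toy illustration, not the authors' SSIL algorithm: the Gaussian-kernel leverage estimate, the linear policy, and the names `estimate_leverage`, `fit_weighted_linear_policy`, and `bandwidth` are all assumptions made for the example.

```python
import numpy as np

def estimate_leverage(expert_sa, unlabeled_sa, bandwidth=0.3):
    """Hypothetical leverage in (0, 1]: Gaussian kernel of the distance from
    each unlabeled state-action pair to its nearest labeled expert pair."""
    diffs = unlabeled_sa[:, None, :] - expert_sa[None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return np.exp(-(nearest / bandwidth) ** 2)

def fit_weighted_linear_policy(states, actions, weights):
    """Weighted least squares for a linear policy a = slope * s + intercept."""
    X = np.column_stack([states, np.ones_like(states)])
    sw = np.sqrt(weights)[:, None]
    theta, *_ = np.linalg.lstsq(X * sw, actions[:, None] * sw, rcond=None)
    return theta.ravel()

rng = np.random.default_rng(0)
# Labeled expert data: the expert acts as a = -s.
s_exp = rng.uniform(-1, 1, 50)
a_exp = -1.0 * s_exp
# Unlabeled data: first half expert-like, second half corrupted (wrong sign).
s_unl = rng.uniform(-1, 1, 100)
a_unl = np.where(np.arange(100) < 50, -1.0 * s_unl, 1.0 * s_unl)

leverage = estimate_leverage(np.column_stack([s_exp, a_exp]),
                             np.column_stack([s_unl, a_unl]))

states = np.concatenate([s_exp, s_unl])
actions = np.concatenate([a_exp, a_unl])
# Labeled pairs get weight 1; unlabeled pairs get their estimated leverage.
weights = np.concatenate([np.ones(50), leverage])
slope, intercept = fit_weighted_linear_policy(states, actions, weights)
# Baseline that trusts every demonstration equally.
naive_slope, _ = fit_weighted_linear_policy(states, actions, np.ones(150))
```

In this toy setup the leverage-weighted fit stays closer to the expert slope of -1 than the unweighted fit, because corrupted pairs far from the expert data receive small weights.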

Keywords: Demonstrations

Sparse Actor-Critic: Sparse Tsallis Entropy Regularized Reinforcement Learning in a Continuous Action Space

International Conference on Ubiquitous Robots and Ambient Intelligence (URAI)

Authors: Jaegoo Choy; Kyungjae Lee; Songhwai Oh
Affiliation: Department of Electrical and Computer Engineering and ASRI, Seoul National University, Seoul, Korea

ISBN (digital): 9781728157153

ISBN (print): 9781728157160

For deep reinforcement learning (RL) algorithms to achieve high performance in complex continuous control tasks, it is necessary to exploit the goal and at the same time explore the environment. In this paper, we introduce a novel off-policy actor-critic reinforcement learning algorithm with a sparse Tsallis entropy regularizer. The sparse Tsallis entropy regularizer has the effect of maximizing the expected returns while maximizing the sparse Tsallis entropy of the policy function. Maximizing the sparse Tsallis entropy makes the actor explore the large action and state space efficiently, which helps to find the optimal action at each state. We derive the iteration update rules and modify a policy iteration rule for an off-policy method. In experiments, we demonstrate the effectiveness of the proposed method in continuous reinforcement learning problems in terms of convergence speed. The proposed method outperforms former on-policy and off-policy RL algorithms in terms of convergence speed and performance.
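As a rough illustration of what a sparse Tsallis entropy regularizer does, the sketch below uses a small discrete action set rather than the continuous action space treated in the paper. It assumes the common form S(p) = 1/2 * sum_a p(a) * (1 - p(a)); under that assumption, maximizing expected value plus entropy over the probability simplex yields the sparsemax distribution. The function names are illustrative only.

```python
def sparse_tsallis_entropy(p):
    """Tsallis entropy with entropic index 2: 0.5 * sum_a p(a) * (1 - p(a))."""
    return 0.5 * sum(pi * (1.0 - pi) for pi in p)

def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex.
    Low-scoring entries receive exactly zero probability."""
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        # The support condition holds for the first k(z) sorted entries only,
        # so the last update leaves tau at the correct threshold.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(v - tau, 0.0) for v in z]

# Action values for three discrete actions (illustrative numbers).
policy = sparsemax([1.0, 0.8, 0.1])        # third action gets zero probability
greedy = sparsemax([2.0, 1.0, 0.1])        # large gap: all mass on one action
entropy_of_policy = sparse_tsallis_entropy(policy)
```

Unlike a softmax policy, the resulting policy can assign exactly zero probability to poor actions while still spreading mass over the competitive ones.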

Keywords: Entropy; Robots; Aerospace electronics; Collision avoidance; Stochastic processes; Learning (artificial intelligence); Machine learning

Decentralized Design and Plug-and-Play Distributed Control for Linear Multi-Channel Systems

arXiv, 2020

Authors: Kim, Taekyoo; Lee, Donggil; Shim, Hyungbo
Affiliation: ASRI, Department of Electrical and Computer Engineering, Seoul National University, Republic of Korea

We propose a distributed control scheme in which many identical control agents are deployed to control a linear time-invariant plant that has multiple input-output channels. Each control agent can join or leave the control loop during stabilization without any particular initialization across the network of agents. Once new control agents join the loop, they self-organize their control dynamics through local communication with neighboring agents, without interfering with the control by other active agents. The key idea enabling these features is the use of Bass' algorithm, which allows the distributed computation of stabilizing gains by solving a Lyapunov equation in a distributed manner. © 2020, CC BY.
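For readers unfamiliar with Bass' algorithm, a centralized sketch is given below. It is not the distributed computation described in the abstract; it only shows how a stabilizing gain follows from one Lyapunov equation. It assumes (A, B) is controllable, and the helper names `solve_lyapunov`, `bass_gain`, and `margin` are illustrative.

```python
import numpy as np

def solve_lyapunov(a, q):
    """Solve a @ z + z @ a.T = q by Kronecker vectorization (fine for small n)."""
    n = a.shape[0]
    eye = np.eye(n)
    lhs = np.kron(a, eye) + np.kron(eye, a)
    return np.linalg.solve(lhs, q.reshape(-1)).reshape(n, n)

def bass_gain(A, B, margin=1.0):
    """Return K such that A - B @ K is Hurwitz, assuming (A, B) is controllable."""
    n = A.shape[0]
    # beta exceeds the magnitude of every eigenvalue of A.
    beta = np.linalg.norm(A, 2) + margin
    shifted = A + beta * np.eye(n)
    # Z solves (A + beta I) Z + Z (A + beta I)^T = 2 B B^T.
    Z = solve_lyapunov(shifted, 2.0 * (B @ B.T))
    return B.T @ np.linalg.inv(Z)

# Double integrator as a small test plant.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = bass_gain(A, B)
closed_loop_eigs = np.linalg.eigvals(A - B @ K)
```

Substituting K back gives (A - BK) Z + Z (A - BK)^T = -2 beta Z with Z positive definite, which is why the closed loop is stable.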

Keywords: Lyapunov functions

Inverse Optimal Control from Demonstrations with Mixed Qualities

International Conference on Ubiquitous Robots and Ambient Intelligence (URAI)

Authors: Kyungjae Lee; Yunho Choi; Songhwai Oh
Affiliation: Department of Electrical and Computer Engineering and ASRI, Seoul National University, Seoul, Korea

ISBN (digital): 9781728157153

ISBN (print): 9781728157160

This paper proposes an inverse optimal control (IOC) framework which incorporates demonstrations with mixed qualities. The proposed method utilizes the benefits of sub-optimal demonstrations which can provide information about what not to do and supplies training data near states unvisited by optimal demonstrations. The main idea of the proposed method is to find the value function which satisfies the optimality condition over optimal demonstrations and violates it over sub-optimal demonstrations. We conduct experiments on three environments and empirically show that the proposed method outperforms the original IOC algorithm, which uses only optimal demonstrations.

Keywords: Optimal control; Kernel; Mathematical model; Algorithms; Aerospace electronics; Robots

Guaranteeing almost fault-free tracking performance from transient to steady-state: a disturbance observer approach

Science China (Information Sciences), 2018, Vol. 61, No. 7, pp. 227-243

Authors: Gyunghoon PARK; Hyungbo SHIM
Affiliation: ASRI, Department of Electrical and Computer Engineering, Seoul National University

In this paper, we propose an output-feedback fault-tolerant controller (FTC) for a class of uncertain multi-input single-output systems under float and lock-in-place actuator faults. Of particular interest is recovering the fault-free tracking performance of a (pre-defined) nominal closed-loop system during the entire time period, including the transients due to abrupt actuator faults. As a key component, a high-gain disturbance observer (DOB) is employed to rapidly compensate the lumped disturbance, a compressed expression of all the effects of actuator faults (as well as model uncertainty and disturbance) on the *** implement this high-gain approach, a fixed control allocation (CA) law is presented so that an extended system with a virtual scalar input remains minimum phase under any pattern of faults. It is shown via singular perturbation theory that the proposed FTC, consisting of the high-gain DOB and the CA law, resolves the problem in an approximate but arbitrarily accurate sense. Simulations with the linearized lateral model of a Boeing 747 are performed to verify the validity of the proposed FTC scheme.

Keywords: disturbance observer; fault-tolerant control; robust control; performance recovery; actuator fault

Operation-Aware Soft Channel Pruning using Differentiable Masks

arXiv, 2020

Authors: Kang, Minsoo; Han, Bohyung
Affiliation: Computer Vision Laboratory, Department of Electrical and Computer Engineering & ASRI, Seoul National University, Republic of Korea

We propose a simple but effective data-driven channel pruning algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations. The proposed approach jointly considers batch normalization (BN) and the rectified linear unit (ReLU) for channel pruning; it estimates how likely the two successive operations are to deactivate each feature map and prunes the channels with high probabilities. To this end, we learn differentiable masks for individual channels and make soft decisions throughout the optimization procedure, which facilitates exploring a larger search space and training more stable networks. The proposed framework enables us to identify compressed models via joint learning of model parameters and channel pruning without an extra fine-tuning procedure. We perform extensive experiments and achieve outstanding performance in terms of the accuracy of output networks given the same amount of resources when compared with state-of-the-art methods. Copyright © 2020, The Authors. All rights reserved.
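One way to read "how likely the two successive operations deactivate each feature map" is sketched below. It assumes the normalized activation is roughly standard normal, so the BN output for a channel with scale gamma and shift beta is N(beta, gamma^2), and ReLU zeroes it with probability Phi(-beta / |gamma|). This Gaussian model, the threshold, and the function names are assumptions for illustration; the paper's actual mask parameterization may differ.

```python
import math

def deactivation_probability(gamma, beta):
    """P(gamma * z + beta <= 0) for z ~ N(0, 1): the chance that ReLU
    zeroes the batch-normalized output of one channel."""
    if gamma == 0:
        return 1.0 if beta <= 0 else 0.0
    x = -beta / abs(gamma)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def channels_to_prune(bn_params, threshold=0.95):
    """Indices of channels whose deactivation probability exceeds the threshold."""
    return [i for i, (gamma, beta) in enumerate(bn_params)
            if deactivation_probability(gamma, beta) > threshold]

# (gamma, beta) pairs for three hypothetical channels.
bn_params = [(1.0, 0.5), (0.2, -1.0), (0.8, -0.1)]
probs = [deactivation_probability(g, b) for g, b in bn_params]
pruned = channels_to_prune(bn_params)
```

Here only the second channel, whose shift is strongly negative relative to its scale, is almost always zeroed by ReLU and is therefore selected for pruning.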

Keywords: Deep neural networks

State Estimation and Tracking Control for Hybrid Systems by Gluing the Domains

Authors: Kim, Jisu; Shim, Hyungbo; Seo, Jin Heon
Affiliation: ASRI, Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, Republic of Korea

We study the design problems of state observers and tracking controllers for a class of hybrid systems whose state jumps. The idea is to utilize the well-known method of gluing the jump set (the part of the domain where the jumps take place) onto its image, which converts the hybrid system into a continuous-time system whose state does not jump. Sufficient conditions for this idea to be implemented are listed and discussed with a few concrete examples. In particular, we present a structural condition for an observer design, and, for tracking control, we introduce a feedback to compensate for the residual discontinuity in the vector field after gluing. The benefits of the proposed approach include that the observer design does not require detection of the state jumps, and that the tracking control does not require the plant state to jump when the reference jumps. © 1963-2012 IEEE.

Keywords: Hybrid systems

Minimax control of ambiguous linear stochastic systems using the Wasserstein metric

arXiv, 2020

Authors: Kim, Kihyun; Yang, Insoon
Affiliation: Department of Electrical and Computer Engineering, ASRI, Seoul National University, Seoul 08826, Republic of Korea

In this paper, we propose a minimax linear-quadratic control method to address the issue of inaccurate distribution information in practical stochastic systems. To construct a control policy that is robust against errors in an empirical distribution of uncertainty, our method is to adopt an adversary, which selects the worst-case distribution. To systematically adjust the conservativeness of our method, the opponent receives a penalty proportional to the amount, measured with the Wasserstein metric, of deviation from the empirical distribution. In the finite-horizon case, using a Riccati equation, we derive a closed-form expression of the unique optimal policy and the opponent’s policy that generates the worst-case distribution. This result is then extended to the infinite-horizon setting by identifying conditions under which the Riccati recursion converges to the unique positive semi-definite solution to an associated algebraic Riccati equation (ARE). The resulting optimal policy is shown to stabilize the expected value of the system state under the worst-case distribution. We also discuss that our method can be interpreted as a distributional generalization of the H∞-method. Copyright © 2020, The Authors. All rights reserved.
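The convergence of a Riccati recursion to the solution of an algebraic Riccati equation can be seen in the simplest setting. The sketch below covers only the standard scalar linear-quadratic problem, with no adversary and no Wasserstein penalty, so it is not the minimax recursion derived in the paper. The system x+ = a*x + b*u with stage cost q*x^2 + r*u^2 and the numeric values are chosen for illustration.

```python
def riccati_step(p, a, b, q, r):
    """One backward step of the scalar discrete-time Riccati recursion."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)

def solve_scalar_are(a, b, q, r, iterations=200):
    """Iterate the recursion from p = 0; for b != 0 and q > 0 it converges
    to the positive solution of the algebraic Riccati equation."""
    p = 0.0
    for _ in range(iterations):
        p = riccati_step(p, a, b, q, r)
    return p

a, b, q, r = 1.2, 1.0, 1.0, 1.0           # open loop is unstable since |a| > 1
p_inf = solve_scalar_are(a, b, q, r)
gain = a * b * p_inf / (r + b * b * p_inf)  # optimal feedback u = -gain * x
closed_loop = a - b * gain
are_residual = riccati_step(p_inf, a, b, q, r) - p_inf
```

For these values the fixed point is the positive root of p^2 - 1.44 p - 1 = 0, and the resulting closed-loop coefficient has magnitude below one, so the optimal feedback stabilizes the system.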

Keywords: Linear control systems

Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Saehyung Lee; Hyungyu Lee; Sungroh Yoon
Affiliation: Electrical and Computer Engineering, ASRI, INMC, and Institute of Engineering Research, Seoul National University, Seoul, South Korea

ISBN (digital): 9781728171685

ISBN (print): 9781728171692

Adversarial examples cause neural networks to produce incorrect outputs with high confidence. Although adversarial training is one of the most effective forms of defense against adversarial examples, unfortunately, a large gap exists between test accuracy and training accuracy in adversarial training. In this paper, we identify Adversarial Feature Overfitting (AFO), which may cause poor adversarially robust generalization, and we show that adversarial training can overshoot the optimal point in terms of robust generalization, leading to AFO in our simple Gaussian model. Considering these theoretical results, we present soft labeling as a solution to the AFO problem. Furthermore, we propose Adversarial Vertex mixup (AVmixup), a soft-labeled data augmentation approach for improving adversarially robust generalization. We complement our theoretical analysis with experiments on CIFAR10, CIFAR100, SVHN, and Tiny ImageNet, and show that AVmixup significantly improves the robust generalization performance and that it reduces the trade-off between standard accuracy and adversarial robustness.
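A soft-labeled mixup between a clean input and a point in the adversarial direction might look like the sketch below. The abstract does not give the construction, so everything here is an assumed reading: the "vertex" is taken as the clean input pushed `gamma` times along the adversarial perturbation, and `smooth_label`, `vertex_mixup`, `gamma`, `lam_clean`, and `lam_vertex` are hypothetical names and parameters rather than the authors' definitions.

```python
import numpy as np

def smooth_label(label, num_classes, lam):
    """Soft label: mass lam on the true class, the rest spread evenly."""
    soft = np.full(num_classes, (1.0 - lam) / (num_classes - 1))
    soft[label] = lam
    return soft

def vertex_mixup(x, x_adv, label, num_classes, alpha,
                 gamma=2.0, lam_clean=1.0, lam_vertex=0.5):
    """Interpolate between the clean input and an assumed adversarial vertex,
    mixing their soft labels with the same coefficient alpha in [0, 1]."""
    vertex = x + gamma * (x_adv - x)      # extend the adversarial perturbation
    x_mix = alpha * x + (1.0 - alpha) * vertex
    y_mix = (alpha * smooth_label(label, num_classes, lam_clean)
             + (1.0 - alpha) * smooth_label(label, num_classes, lam_vertex))
    return x_mix, y_mix

x = np.array([0.0, 0.0])
x_adv = np.array([0.1, -0.1])             # stand-in for an attack's output
x_mix, y_mix = vertex_mixup(x, x_adv, label=1, num_classes=3, alpha=0.25)
```

In training, alpha would typically be sampled at random per example; it is fixed here so the result is reproducible.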

Keywords: Robustness; Training; Standards; Perturbation methods; Complexity theory; Upper bound; Data models