Despite the increasing interest in neural architecture search (NAS), the significant computational cost of NAS is a hindrance to researchers. Hence, we propose to reduce the cost of NAS using proxy data, i.e., a repre...
In this paper, we consider the problem of autonomous driving using imitation learning in a semi-supervised manner. In particular, both labeled and unlabeled demonstrations are leveraged during training by estimating t...
ISBN (digital): 9781728157153
ISBN (print): 9781728157160
For deep reinforcement learning (RL) algorithms to achieve high performance in complex continuous control tasks, they must exploit the goal while also exploring the environment. In this paper, we introduce a novel off-policy actor-critic reinforcement learning algorithm with a sparse Tsallis entropy regularizer. With this regularizer, the agent maximizes the expected return together with the sparse Tsallis entropy of its policy. Maximizing the sparse Tsallis entropy makes the actor explore the large action and state space efficiently, which helps it find the optimal action at each state. We derive the iteration update rules and modify a policy iteration rule for the off-policy setting. In experiments on continuous reinforcement learning problems, the proposed method outperforms earlier on-policy and off-policy RL algorithms in terms of both convergence speed and performance.
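The abstract above does not state the regularizer's formula. As a rough illustration only, the sketch below assumes the commonly used q = 2 form of the Tsallis entropy, S(p) = 1/2 Σ p_i (1 - p_i), on a discrete action set (the paper itself targets continuous actions). Under that assumption, the distribution that maximizes expected score plus entropy is the Euclidean projection of the scores onto the probability simplex, often called sparsemax. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def sparse_tsallis_entropy(p):
    """Tsallis entropy with q = 2 (assumed form): 0.5 * sum_i p_i * (1 - p_i)."""
    p = np.asarray(p, dtype=float)
    return 0.5 * float(np.sum(p * (1.0 - p)))


def sparsemax(z):
    """Project scores z onto the probability simplex.

    This is the maximizer of  p @ z + sparse_tsallis_entropy(p)  over
    distributions p, so low-scoring actions receive exactly zero probability.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1.0 + k * z_sorted > cumsum
    k_max = k[in_support][-1]              # size of the support
    tau = (cumsum[k_max - 1] - 1.0) / k_max  # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)


if __name__ == "__main__":
    scores = np.array([0.2, 0.1, -1.0])
    policy = sparsemax(scores)
    print("policy:", policy)
    print("entropy:", sparse_tsallis_entropy(policy))
```

Unlike a softmax policy, this projection can assign zero probability to clearly inferior actions while still spreading mass over the competitive ones, which is one way to read the "sparse" exploration behavior the abstract describes.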
We propose a distributed control, in which many identical control agents are deployed for controlling a linear time-invariant plant that has multiple input-output channels. Each control agent can join or leave the con...
ISBN (digital): 9781728157153
ISBN (print): 9781728157160
This paper proposes an inverse optimal control (IOC) framework that incorporates demonstrations of mixed quality. The proposed method exploits sub-optimal demonstrations, which provide information about what not to do and supply training data near states unvisited by optimal demonstrations. The main idea is to find a value function that satisfies the optimality condition over optimal demonstrations and violates it over sub-optimal demonstrations. We conduct experiments on three environments and empirically show that the proposed method outperforms the original IOC algorithm, which uses only optimal demonstrations.
In this paper, we propose an output-feedback fault-tolerant controller (FTC) for a class of uncertain multi-input single-output systems under float and lock-in-place actuator faults. Of particular interest is recovering the fault-free tracking performance of a (pre-defined) nominal closed-loop system over the entire time period, including the transients caused by abrupt actuator faults. As a key component, a high-gain disturbance observer (DOB) is employed to rapidly compensate for the lumped disturbance, a compressed expression of all the effects of actuator faults (as well as model uncertainty and disturbance) on the *** implement this high-gain approach, a fixed control allocation (CA) law is presented so that an extended system with a virtual scalar input remains minimum phase under any pattern of faults. It is shown via singular perturbation theory that the proposed FTC, consisting of the high-gain DOB and the CA law, resolves the problem in an approximate but arbitrarily accurate sense. Simulations with the linearized lateral model of a Boeing 747 are performed to verify the validity of the proposed FTC scheme.
We propose a simple but effective data-driven channel pruning algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations. The proposed approach makes a joi...
We study the design problems of state observers and tracking controllers for a class of hybrid systems whose state jumps. The idea is to utilize the well-known method of gluing the jump set (a part of domain where the...
In this paper, we propose a minimax linear-quadratic control method to address the issue of inaccurate distribution information in practical stochastic systems. To construct a control policy that is robust against err...
ISBN (digital): 9781728171685
ISBN (print): 9781728171692
Adversarial examples cause neural networks to produce incorrect outputs with high confidence. Although adversarial training is one of the most effective forms of defense against adversarial examples, unfortunately, a large gap exists between test accuracy and training accuracy in adversarial training. In this paper, we identify Adversarial Feature Overfitting (AFO), which may cause poor adversarially robust generalization, and we show that adversarial training can overshoot the optimal point in terms of robust generalization, leading to AFO in our simple Gaussian model. Considering these theoretical results, we present soft labeling as a solution to the AFO problem. Furthermore, we propose Adversarial Vertex mixup (AVmixup), a soft-labeled data augmentation approach for improving adversarially robust generalization. We complement our theoretical analysis with experiments on CIFAR10, CIFAR100, SVHN, and Tiny ImageNet, and show that AVmixup significantly improves the robust generalization performance and that it reduces the trade-off between standard accuracy and adversarial robustness.
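The abstract describes AVmixup only as a soft-labeled data augmentation, so the following is a hedged sketch of one plausible reading rather than the authors' implementation. It assumes that an "adversarial vertex" is obtained by scaling a given adversarial perturbation `delta` by a factor `gamma`, that the training input is a linear interpolation between the clean input and that vertex, and that the two endpoints carry differently smoothed labels. All function names, parameters, and default values here are hypothetical.

```python
import numpy as np


def smooth_label(y, num_classes, lam):
    """Soft label: mass `lam` on class y, the rest spread evenly over other classes."""
    out = np.full(num_classes, (1.0 - lam) / (num_classes - 1))
    out[y] = lam
    return out


def avmixup_sketch(x, delta, y, num_classes, alpha,
                   gamma=2.0, lam_clean=1.0, lam_adv=0.5):
    """Mix a clean input with an assumed 'adversarial vertex' and soften the label.

    x      : clean input array
    delta  : adversarial perturbation for x (computed elsewhere, e.g. by an attack)
    alpha  : interpolation weight in [0, 1]; 1.0 returns the clean example
    gamma, lam_clean, lam_adv : illustrative hyperparameters (assumptions)
    """
    x_vertex = x + gamma * delta                      # assumed adversarial vertex
    x_mix = alpha * x + (1.0 - alpha) * x_vertex      # interpolate the inputs
    y_mix = (alpha * smooth_label(y, num_classes, lam_clean)
             + (1.0 - alpha) * smooth_label(y, num_classes, lam_adv))
    return x_mix, y_mix


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.zeros(4)
    delta = np.array([0.1, -0.1, 0.0, 0.05])
    alpha = rng.uniform(0.0, 1.0)                     # sampled per example
    x_mix, y_mix = avmixup_sketch(x, delta, y=2, num_classes=3, alpha=alpha)
    print(x_mix, y_mix, y_mix.sum())
```

The design point this illustrates is that points farther from the clean input, in the adversarial direction, receive a less confident label, which is the kind of soft labeling the abstract proposes as a remedy for AFO.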