Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network is used to resolve the "curse of dimensionality" issue of the Q-table in reinforcement learning. When a robot walks in an unknown environment, it collects experience data that are used to train the neural network; this process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, compared with existing methods, our method converges to an optimal action strategy in less time and explores a path in an unknown environment with fewer steps and a larger average reward.
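The experience-replay step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ReplayBuffer` and `td_targets`, the buffer capacity, the discount factor `gamma`, and the callable `q_next_max` (standing in for the neural network's estimate of the best next-state value) are all assumptions made for illustration, and the heuristic-knowledge component is not modeled because the abstract does not specify it.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen evicts the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive steps of the robot.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, q_next_max, gamma=0.9):
    """Q-learning targets: r + gamma * max_a' Q(s', a'), or just r at terminal states.

    q_next_max is a hypothetical callable mapping a next state to the
    network's largest predicted Q-value for that state.
    """
    return [r if done else r + gamma * q_next_max(s2)
            for (_state, _action, r, s2, done) in batch]
```

In a training loop, each step of the robot would be pushed into the buffer, and once enough transitions have accumulated, a sampled batch and its targets would be used to fit the network. Note that `sample` raises `ValueError` if `batch_size` exceeds the number of stored transitions.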
Decentralized deep learning has achieved significant success because it avoids the single point of failure of centralized solutions. However, the system might deviate from the correct model due to Byzantine attacks. Existing Byzantine-resilient defense models mainly rely on one-step evaluation, which makes them vulnerable to restrictive topologies and sophisticated cyber-attacks because they lack historical evaluation. This paper proposes a credibility assessment based parameter aggregation rule (CA-PAR) that evaluates each neighboring node by its long-term performance. For each node and its neighbors, two concepts are first proposed: immediate reward, which describes the reliability at the current iteration, and history information based credibility, which gives a comprehensive assessment of reliability. Thereafter, all received parameters are aggregated as a linear combination in which each neighbor's weight is determined by its credibility value. As a result, the influence of suspicious nodes is gradually reduced and eventually eliminated. Experimental results on the MNIST and CIFAR-10 datasets indicate that the algorithm tolerates five state-of-the-art attack methods with an arbitrary number of faulty nodes. Compared with previous defense models, the proposed algorithm performs better in terms of topology constraints, training accuracy, and computation cost. IEEE
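The credibility-weighted aggregation outlined in this abstract can be sketched as follows. The abstract does not give the CA-PAR formulas, so everything concrete here is an assumption for illustration: the exponential-moving-average update in `update_credibility`, the `decay` parameter, the non-negative credibility scores, and the normalization of weights to sum to one. It shows only the general idea of a history-based score driving a linear combination.

```python
def update_credibility(credibility, immediate_reward, decay=0.9):
    """Blend the running credibility with this iteration's reward.

    Assumed form: an exponential moving average, so a node's history
    outweighs any single iteration.
    """
    return decay * credibility + (1.0 - decay) * immediate_reward


def aggregate(params_by_node, credibility_by_node):
    """Credibility-weighted linear combination of parameter vectors.

    params_by_node maps a node id to a list of floats (all the same length);
    credibility_by_node maps the same ids to non-negative scores.
    """
    total = sum(credibility_by_node[node] for node in params_by_node)
    if total <= 0:
        raise ValueError("at least one node must have positive credibility")
    dim = len(next(iter(params_by_node.values())))
    aggregated = [0.0] * dim
    for node, params in params_by_node.items():
        weight = credibility_by_node[node] / total  # weights sum to 1
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated
```

Under these assumptions, a neighbor whose credibility decays to zero contributes nothing to the aggregate, which mirrors the abstract's statement that the influence of suspicious nodes is gradually reduced and eliminated.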