In cooperative retransmissions, nodes with better channel qualities help other nodes by retransmitting a failed packet to its intended destination. In this paper, we propose a cooperative retransmission scheme in which each node makes a local decision on whether to cooperate, and at what transmission power, using a Markov decision process with reinforcement learning. With reinforcement learning, the proposed scheme avoids solving a Markov decision process with a large number of states. Through simulations, we show that the proposed scheme is robust to collisions, is scalable with respect to the network size, and can provide significant cooperative diversity.
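The abstract describes each node learning its cooperation decision locally rather than solving the full MDP. A minimal sketch of such a per-node decision rule is given below, using tabular Q-learning over a quantized local state; the state encoding, action set (idle or one of three transmission power levels), reward shape, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical per-node cooperation policy learned with tabular Q-learning.
# State, actions, rewards and constants are illustrative assumptions.
import random

N_CHANNEL_LEVELS = 4                                 # quantized local channel quality
ACTIONS = ["idle", "tx_low", "tx_med", "tx_high"]    # do not cooperate / cooperate at a power level
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# One small Q-table per node; learning is purely local.
Q = [[0.0 for _ in ACTIONS] for _ in range(N_CHANNEL_LEVELS)]

def choose_action(state):
    """Epsilon-greedy selection over the node's local Q-table."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    row = Q[state]
    return max(range(len(ACTIONS)), key=lambda a: row[a])

def reward_for(action, retransmission_succeeded):
    """Illustrative reward: credit for a successful relay minus an energy cost."""
    tx_cost = {"idle": 0.0, "tx_low": 0.1, "tx_med": 0.2, "tx_high": 0.4}[ACTIONS[action]]
    return (1.0 if retransmission_succeeded else 0.0) - tx_cost

def update(state, action, reward, next_state):
    """Standard Q-learning backup; no explicit solution of the large MDP is needed."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```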
In this paper, we present a novel approach to controlling a robotic system online from scratch based on the reinforcement learning principle. In contrast to other approaches, our method learns the system dynamics and the value function separately, which makes it possible to identify their individual characteristics and therefore to adapt easily to changing conditions. The major problem in learning control policies lies in the high-dimensional state and action spaces that need to be explored in order to identify the optimal policy. In this paper, we propose an approach that learns the system dynamics and the value function in an alternating fashion based on Gaussian process models. Additionally, to reduce computation time and make the system applicable to online learning, we present an efficient sparsification method. In experiments carried out with a real miniature blimp, we demonstrate that our approach can learn height control online. Further results obtained with an inverted pendulum show that our method requires less data to achieve the same performance as an offline learning approach.
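To illustrate the alternating scheme the abstract refers to, the sketch below fits one Gaussian process to the system dynamics and a second one to the value function, and refits the value model after each backup sweep through the learned dynamics. The toy one-dimensional plant, the discretized action set, the scikit-learn models, and all hyperparameters are assumptions for illustration only; the paper's actual models, sparsification method, and experimental systems (blimp, inverted pendulum) are not reproduced here.

```python
# Toy alternation of a GP dynamics model and a GP value model.
# Plant, kernels and constants are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

GAMMA = 0.95
ACTIONS = np.array([-1.0, 0.0, 1.0])              # discretized control inputs

def reward(state):
    return -state ** 2                            # keep the state near zero

# 1) Dynamics model: GP mapping (state, action) -> next state,
#    trained on transitions gathered while interacting with the system.
transitions = np.random.uniform(-2, 2, size=(50, 2))              # (s, a) samples
next_states = 0.9 * transitions[:, 0] + 0.1 * transitions[:, 1]   # toy linear plant
dyn_gp = GaussianProcessRegressor(kernel=RBF()).fit(transitions, next_states)

# 2) Value model: GP mapping state -> value, refitted after each sweep.
support = np.linspace(-2, 2, 30).reshape(-1, 1)
values = np.zeros(len(support))
val_gp = GaussianProcessRegressor(kernel=RBF()).fit(support, values)

# 3) Alternate: back up values through the learned dynamics, then refit the value GP.
for _ in range(20):
    backups = []
    for s in support[:, 0]:
        sa = np.column_stack([np.full_like(ACTIONS, s), ACTIONS])
        s_next = dyn_gp.predict(sa)
        q = reward(s) + GAMMA * val_gp.predict(s_next.reshape(-1, 1))
        backups.append(q.max())                   # greedy backup over actions
    values = np.array(backups)
    val_gp = GaussianProcessRegressor(kernel=RBF()).fit(support, values)
```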