ISBN: 9781509059638 (print)
Control theory has been widely used in various fields, including medicine. Diabetes is one of the newer topics of interest in control. Automatically determining insulin injection rates has always been a concern of physicians. The purpose of diabetes control and treatment is to keep blood glucose within the normal range as far as possible. In this paper, we use the SARSA method, an on-policy temporal difference (TD) technique, to determine the insulin delivery rate. TD methods are among the best-known methods for solving reinforcement learning (RL) problems. Because TD methods do not require a precise model of the environment dynamics, they have attracted interest in medical applications in recent years. Although TD methods do not require a mathematical model of the environment, we used the Palumbo mathematical model in place of real patients to simulate the environment. Since patients' medical parameters vary from person to person, controlling the disease requires a different drug schedule, in other words a different controller, for each patient. RL methods, by interacting with their environment, automatically determine suitable doses for each person. To reduce trial and error on real patients, and thus the side effects of dose changes, we design a controller that estimates the appropriate insulin injection rate from the parameters of one patient. The resulting drug program can then be applied to other real patients; at that stage the controller, which applies the SARSA algorithm, determines the appropriate dose for the real patient with less trial and error. The simulation results demonstrate the efficiency of the proposed method.
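As a rough illustration of the on-policy TD update that SARSA performs, the following minimal tabular sketch may help. It is not the controller from the paper: the state and action labels, the function names, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration (in the diabetes setting, a state might be a discretized glucose level and an action a discretized insulin rate).

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, terminal=False):
    """One SARSA step on the table Q (a dict keyed by (state, action)).

    The target uses the action a_next that the behavior policy actually
    selected in s_next, which is what makes the method on-policy.
    """
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical usage: states and actions are plain labels.
Q = defaultdict(float)
rng = random.Random(0)
actions = ["low_dose", "high_dose"]
a = epsilon_greedy(Q, "glucose_high", actions, epsilon=0.1, rng=rng)
```

In a full training loop, the action chosen for the next state is both used in the update and then executed, so the learned values reflect the exploring policy itself.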
ISBN: 9781479970186 (print)
Close collaboration and a sound strategy are indispensable for humanoid robots in the RoboCup soccer competition. In order to solve the problem that the convergence rate is too low when training local strategies, this paper proposes a method based on reinforcement learning to optimize the decision and positioning parameters for soccer ***, Markov decision process is applied to the framework for reinforcement ***, we propose an improved method, known as a SARSA algorithm, to overcome the low convergence rate of average-reward reinforcement ***, in order to deal with the large state-space problems arising in training and to improve generalization ability, this method is applied to the Keepaway local *** training results show that this algorithm converges faster than other ordinary learning algorithms.
ISBN: 9780769535838 (print)
Urban traffic control is very complicated, so building a precise mathematical model for it is very difficult. In this paper, we use the SARSA reinforcement learning algorithm to control the traffic signal, so that decisions can be made dynamically according to real-time traffic state information and the controller adapts automatically to changes in the environment. Because the state space is too large to be stored and represented directly, we apply a radial basis function (RBF) neural network to approximate the state value function. By training self-adapting non-linear processing units and constructing the state space online and adaptively, the approximation is improved and the traffic signal control problem at single intersections is solved. The simulation results show that the new control algorithm is clearly more effective than traditional sliced time allocation methods.
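The idea of approximating a state value function with radial basis functions can be sketched as follows. This is a simplified illustration under stated assumptions, not the network from the paper: the centers and width are fixed here rather than self-adapting, the state is a single scalar (for example a hypothetical queue length), and the class name `RBFValue` and all parameter values are invented for the example.

```python
import math


class RBFValue:
    """Linear value function over Gaussian RBF features (fixed centers)."""

    def __init__(self, centers, width):
        self.centers = list(centers)
        self.width = width
        self.weights = [0.0] * len(self.centers)

    def features(self, x):
        # Each unit responds most strongly when x is near its center.
        return [math.exp(-((x - c) ** 2) / (2 * self.width ** 2))
                for c in self.centers]

    def value(self, x):
        return sum(w * f for w, f in zip(self.weights, self.features(x)))

    def td_update(self, x, reward, x_next, alpha=0.1, gamma=0.9):
        """Semi-gradient TD(0) step; returns the TD error."""
        delta = reward + gamma * self.value(x_next) - self.value(x)
        phi = self.features(x)
        self.weights = [w + alpha * delta * f
                        for w, f in zip(self.weights, phi)]
        return delta


# Hypothetical usage: three units covering queue lengths from 0 to 10.
v = RBFValue(centers=[0.0, 5.0, 10.0], width=2.0)
v.td_update(x=5.0, reward=1.0, x_next=6.0)
```

Because nearby states share active units, an update at one state also shifts the estimate for its neighbors, which is what lets the approximation cover a state space too large to tabulate.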
ISBN: 0780348591; 0780348605 (print)
Blackjack, or twenty-one, is a card game in which the player attempts to beat the dealer by obtaining a sum of card values that is at most 21 and higher than the dealer's total. The probabilistic nature of the game makes it an interesting testbed problem for learning algorithms, though learning a good playing strategy is not straightforward. Systems that learn with a teacher are not very useful, since the target outputs for a given stage of the game are not known. Instead, the learning system has to explore different actions and develop a strategy by selectively retaining the actions that maximize the player's performance. This paper explores the use of blackjack as a testbed for learning strategies in neural networks, specifically with reinforcement learning techniques. Performance comparisons with previous related approaches are also reported.
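The win condition described above is the only feedback such a learner receives, and it can be sketched as a reward function. The card encoding, the treatment of an ace as 1 or 11, the tie (push) outcome, and the reward values of +1, 0, and -1 are common blackjack conventions assumed here, not details taken from the paper.

```python
def hand_value(cards):
    """Total of a hand; cards are ints, with 1 for an ace and 10 for face cards.

    One ace is counted as 11 when that does not push the total over 21.
    """
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10
    return total


def outcome(player_cards, dealer_cards):
    """Reward from the player's side: +1 win, 0 tie, -1 loss."""
    player = hand_value(player_cards)
    dealer = hand_value(dealer_cards)
    if player > 21:
        return -1  # player busts regardless of the dealer's hand
    if dealer > 21 or player > dealer:
        return 1
    if player == dealer:
        return 0
    return -1
```

Since this reward arrives only at the end of a hand, no per-step target output exists, which is why trial-and-error learning fits the problem better than learning with a teacher.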