Testing inverter-based resources (IBRs) is of utmost importance. This paper proposes a novel power hardware-in-the-loop (PHIL) interface control (PHIL-IC) employing a reinforcement-learning approach based on adaptive dynamic programming (ADP, also known as approximate dynamic programming) to enhance the PHIL-simulation-based testing of IBRs. It deploys output feedback control because the dynamics of the entire system (states and disturbances), involving the IBRs, the power amplifiers, all the other components of the PHIL-simulation-based testing setup, and their delays, are unavailable or uncertain; it optimally designs the PHIL-IC while considering all uncertainties and unavailable information about the systems involved. To this end, the proposed ADP-based PHIL-IC utilizes a new hybrid iteration (HI) method, which differs from traditional ADP strategies: unlike the policy iteration method, the HI algorithm does not require prior knowledge of an admissible control policy, and, with a quadratic rate of convergence, it converges much faster than the value iteration method, saving significant learning time and iterations. Comparing the results of PHIL-simulation-based testing using the proposed method with those of the proportional-resonant controller (the conventional PHIL-IC) and of the robust PHIL-IC based on mu synthesis (the current state-of-the-art PHIL-IC) reveals the effectiveness and practicality of the proposed method. These comparative results are generated with the ideal transformer model (also known as the voltage-type interface) commonly used in PHIL-simulation-based testing and with practical cases of the Thevenin equivalent impedance (resistive, resistive-inductive, and inductive) of the power-network model of interest.
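As a rough illustration of the hybrid iteration idea, the sketch below runs a value-iteration-like phase that needs no admissible initial policy and then switches to a policy-iteration-like phase with quadratic convergence, on a small discrete-time LQR problem. The plant matrices A, B and weights Q, R are illustrative assumptions; the paper's method is data-driven with output feedback, so this model-based sketch only conveys the structure of the iteration, not the proposed PHIL-IC itself.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Toy discrete-time LQR problem (matrices are assumptions chosen for illustration).
A = np.array([[1.1, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def gain(P):
    """Greedy feedback gain for the current value matrix P."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def stabilizing(K):
    return np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0

# Phase 1 (value-iteration-like): no admissible initial policy is needed.
P = np.zeros((2, 2))
for _ in range(500):
    K = gain(P)
    if stabilizing(K):
        break
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Phase 2 (policy-iteration-like): quadratic convergence once a stabilizing gain exists.
for _ in range(20):
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)   # policy evaluation
    K_new = gain(P)                                        # policy improvement
    if np.linalg.norm(K_new - K) < 1e-10:
        break
    K = K_new

print("hybrid-iteration gain:\n", K)
```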
To fully integrate robots into household settings, they must be capable of autonomously planning and executing diverse tasks. However, task and motion planning for multistep manipulation tasks remains an open challeng...
ISBN:
(Print) 1424407060
The proceedings contain 49 papers. The topics discussed include: fitted Q iteration with CMACs; reinforcement-learning-based magneto-hydrodynamic control of hypersonic flows; a novel fuzzy reinforcement-learning approach in two-level intelligent control of 3-DOF robot manipulators; knowledge transfer using local features; particle swarm optimization adaptive dynamic programming; discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof; dual representations for dynamic programming and reinforcement learning; an optimal ADP algorithm for a high-dimensional stochastic control problem; convergence of model-based temporal difference learning for control; the effect of bootstrapping in multi-automata reinforcement learning; and a theoretical analysis of cooperative behavior in multi-agent Q-learning.
ISBN:
(Print) 9781424407064
This paper describes backpropagation through an LSTM recurrent neural network model/critic for reinforcement learning tasks in partially observable domains. This combines LSTM's strength at learning long-term temporal dependencies, used to infer states in partially observable tasks, with backpropagation's focused credit-assignment mechanism, which makes it possible to learn high-dimensional and/or continuous actions.
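A minimal sketch of the idea, assuming a PyTorch implementation: an LSTM critic reads an observation/action history to infer a latent state and predict the return, and backpropagation through the trained critic yields a gradient on a continuous action. The dimensions, data, and simple regression target are placeholders; the paper's model/critic architecture is not reproduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Recurrent critic: the LSTM infers a latent state from the observation/action history
# in a partially observable task; a linear head predicts the return.
class LSTMCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs_seq, act_seq):
        x = torch.cat([obs_seq, act_seq], dim=-1)   # (batch, time, obs+act)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                # predicted return after the sequence

obs_dim, act_dim, T = 3, 2, 10
critic = LSTMCritic(obs_dim, act_dim)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# 1) Train the critic on (history, return) pairs by backpropagation through time.
obs = torch.randn(16, T, obs_dim)
act = torch.randn(16, T, act_dim)
ret = torch.randn(16, 1)                            # stand-in for observed returns
loss = nn.functional.mse_loss(critic(obs, act), ret)
opt.zero_grad(); loss.backward(); opt.step()

# 2) Backpropagate *through* the trained critic to get a gradient on the continuous
#    action, giving a focused improvement direction for a high-dimensional action.
act = act.clone().requires_grad_(True)
critic(obs, act).sum().backward()
improved_last_action = act[:, -1] + 0.1 * act.grad[:, -1]
print(improved_last_action.shape)
```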
ISBN:
(Print) 9781424407064
This paper proposes an approximate dynamic programming strategy for responsive traffic signal control. It is the first attempt to optimize the signal control objective dynamically through adaptive approximation of the value function. The proposed value function approximation is separable and independent of exogenous factors. The algorithm updates the approximated value function progressively during operation, while preserving the structural properties of the control problem. The convergence and performance of the algorithm have been tested in a range of experiments. It is concluded that the new strategy is as good as the best existing control strategies while being efficient and simple in computation. It also has the potential to be extended to multi-phase signal control at isolated junctions and to decentralized network operation.
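The following sketch illustrates the flavor of a separable, exogenous-factor-independent value function approximation for a two-phase junction, updated progressively while the controller operates. The queue dynamics, arrival rates, and the simple averaging update are assumptions made for the example, not the paper's actual model or update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-phase junction: queues[0] is served in phase 0, queues[1] in phase 1.
ARRIVAL = np.array([0.3, 0.4])   # assumed vehicles/step per approach
SERVICE = 1.0                    # vehicles discharged per step on the green approach
SWITCH_LOSS = 2                  # lost-time steps when the phase changes
GAMMA, ALPHA, MAX_Q = 0.95, 0.05, 50

# Separable value approximation: V(q0, q1) ~ v[0][q0] + v[1][q1]
v = [np.zeros(MAX_Q + 1), np.zeros(MAX_Q + 1)]

def step(q, phase, action):
    """One simulation step; action=1 means switch phase (with lost time)."""
    q = q.copy()
    if action == 1:
        phase = 1 - phase
        q += ARRIVAL * SWITCH_LOSS             # arrivals during lost time
    q[phase] = max(q[phase] - SERVICE, 0.0)    # discharge on green
    q += rng.poisson(ARRIVAL)                  # random arrivals
    q = np.minimum(q, MAX_Q)
    return q, phase, q.sum()                   # stage cost = total queue (delay proxy)

def value(q):
    return sum(v[i][int(q[i])] for i in range(2))

q, phase = np.zeros(2), 0
for t in range(20000):
    # One-step lookahead over {keep green, switch} using the separable V.
    best_a, best_est = 0, np.inf
    for a in (0, 1):
        qn, pn, c = step(q, phase, a)
        est = c + GAMMA * value(qn)
        if est < best_est:
            best_a, best_est = a, est
    # Progressive update of the separable components toward the lookahead estimate.
    err = best_est - value(q)
    for i in range(2):
        v[i][int(q[i])] += ALPHA * err / 2
    q, phase, _ = step(q, phase, best_a)

print("sample of learned per-queue values:", v[0][:5].round(2), v[1][:5].round(2))
```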
ISBN:
(Print) 9781424407064
We are interested in finding the most effective combination of off-line and on-line/real-time training in approximate dynamic programming. We introduce our approach of combining proven off-line methods of training for robustness with a group of on-line methods. Training for robustness is carried out on reasonably accurate models with the multi-stream Kalman filter method [1], whereas on-line adaptation is performed either with the help of a critic or by methods resembling reinforcement learning. We also illustrate the importance of using recurrent neural networks for both the controller/actor and the critic.
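A toy sketch of the two-phase workflow on a scalar plant: an off-line phase trains a single gain to be robust across several model "streams", and an on-line phase adapts that gain on the true plant using the measured cost as a critic-like signal. The finite-difference gradient here stands in for the multi-stream Kalman filter training and the recurrent actor/critic used in the paper; the plant, step sizes, and parameter spread are all assumptions for illustration.

```python
import numpy as np

# Toy scalar plant x+ = a*x + u with uncertain parameter a and feedback u = -k*x.
def rollout_cost(k, a, x0=1.0, steps=30):
    x, cost = x0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += x**2 + 0.1 * u**2
        x = a * x + u
    return cost

EPS = 1e-3

# Off-line phase: train one gain to be robust across several model "streams".
streams = [0.8, 1.0, 1.2]            # assumed spread of the uncertain parameter a
k = 0.0
for _ in range(200):
    g = (max(rollout_cost(k + EPS, a) for a in streams)
         - max(rollout_cost(k - EPS, a) for a in streams)) / (2 * EPS)
    k -= 0.05 * np.sign(g)           # worst-case finite-difference descent
print("off-line robust gain:", round(k, 3))

# On-line phase: adapt the same gain on the true plant, using measured cost as the
# critic-like training signal.
a_true = 1.1
for _ in range(150):
    g = (rollout_cost(k + EPS, a_true) - rollout_cost(k - EPS, a_true)) / (2 * EPS)
    k -= 0.02 * np.sign(g)
print("gain after on-line adaptation:", round(k, 3))
```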
ISBN:
(Print) 9781424407064
We describe an approach to reducing the curse of dimensionality for deterministic dynamic programming with continuous actions by randomly sampling actions while computing a steady-state value function and policy. This approach results in globally optimized actions, without searching over a discretized multidimensional grid. We present results on finding time-invariant control laws for two-, four-, and six-dimensional deterministic swing-up problems with up to 480 million discretized states.
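A minimal sketch of the action-sampling idea on a one-dimensional problem: each Bellman backup draws random continuous actions instead of searching a discretized action grid, and the converged value function defines a time-invariant policy. The dynamics, costs, and sample counts are assumptions chosen only to keep the example small; the paper's experiments use multidimensional swing-up tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D deterministic problem: drive x to 0 with bounded continuous control.
DT, GAMMA = 0.1, 0.98
X = np.linspace(-2.0, 2.0, 201)          # discretized state grid
V = np.zeros_like(X)                     # value function on the grid
N_SAMPLES = 32                           # random continuous actions per backup

def dynamics(x, u):
    return np.clip(x + DT * (np.sin(x) + u), X[0], X[-1])

def stage_cost(x, u):
    return DT * (x**2 + 0.1 * u**2)

def interp_V(xq):                        # linear interpolation of V off the grid
    return np.interp(xq, X, V)

for sweep in range(300):
    V_new = np.empty_like(V)
    for i, x in enumerate(X):
        # Instead of a fixed action grid, draw random continuous actions, keep the best.
        u = rng.uniform(-1.0, 1.0, N_SAMPLES)
        q = stage_cost(x, u) + GAMMA * interp_V(dynamics(x, u))
        V_new[i] = q.min()
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

# Greedy (time-invariant) policy read off the converged value function.
def policy(x, n=256):
    u = rng.uniform(-1.0, 1.0, n)
    q = stage_cost(x, u) + GAMMA * interp_V(dynamics(x, u))
    return u[np.argmin(q)]

print("u(1.5) ~", round(policy(1.5), 3))
```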
ISBN:
(Print) 9781424407064
We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
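A small sketch of a knowledge gradient policy for an offline learning (ranking-and-selection) problem with independent normal beliefs and known measurement noise: each round measures the alternative whose one-step knowledge gradient factor is largest, then performs a conjugate normal belief update. The prior parameters, noise level, and true means are assumptions for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

true_means = np.array([0.0, 0.3, 0.5, 0.45, 0.2])   # unknown to the learner (assumed)
noise_sd = 1.0                                       # known measurement noise
mu = np.zeros(5)                                     # prior means
var = np.full(5, 4.0)                                # prior variances

def kg_factors(mu, var, noise_sd):
    """Knowledge-gradient value of one more measurement of each alternative."""
    sigma_tilde = var / np.sqrt(var + noise_sd**2)   # change in the posterior mean
    best_other = np.array([np.max(np.delete(mu, i)) for i in range(len(mu))])
    zeta = -np.abs(mu - best_other) / np.maximum(sigma_tilde, 1e-12)
    f = zeta * norm.cdf(zeta) + norm.pdf(zeta)
    return sigma_tilde * f

for n in range(40):
    x = int(np.argmax(kg_factors(mu, var, noise_sd)))     # measure the highest-KG alternative
    y = true_means[x] + noise_sd * rng.standard_normal()  # noisy observation
    # Conjugate normal update of the belief about alternative x.
    new_var = 1.0 / (1.0 / var[x] + 1.0 / noise_sd**2)
    mu[x] = new_var * (mu[x] / var[x] + y / noise_sd**2)
    var[x] = new_var

print("believed best alternative:", int(np.argmax(mu)))
```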
ISBN:
(Print) 9781424407064
We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well-developed techniques for representing, approximating, and estimating probability distributions, without running the risks associated with divergent value function estimation. A second advantage is that some distinct algorithms for the average-reward and discounted-reward cases in the primal become unified under the dual. In this paper, we present a modified dual of the standard linear program that guarantees a globally normalized state visit distribution is obtained. With this reformulation, we then derive novel dual forms of dynamic programming, including policy evaluation, policy iteration, and value iteration. Moreover, we derive dual formulations of temporal difference learning to obtain new forms of Sarsa and Q-learning. Finally, we scale these techniques up to large domains by introducing approximation, and develop new approximate off-policy learning algorithms that avoid the divergence problems associated with the primal approach. We show that the dual view yields a viable alternative to standard value-function-based techniques and opens new avenues for solving dynamic programming and reinforcement learning problems.
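As a small illustration of the dual view, the sketch below evaluates a fixed policy on a random MDP by computing the normalized discounted state-visit distribution directly, and checks that it recovers the same return as primal (value-function) policy evaluation. The MDP is randomly generated for illustration; the dual forms of policy iteration, Sarsa, and Q-learning developed in the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small random MDP under a fixed policy: P is the induced state-transition matrix.
n, gamma = 6, 0.9
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # row-stochastic
r = rng.random(n)                                           # per-state reward under the policy
mu0 = np.full(n, 1.0 / n)                                   # start-state distribution

# Primal policy evaluation: value function V = (I - gamma P)^{-1} r.
V = np.linalg.solve(np.eye(n) - gamma * P, r)
J_primal = mu0 @ V

# Dual policy evaluation: normalized discounted state-visit distribution
#   d = (1 - gamma) (I - gamma P^T)^{-1} mu0,  which sums to 1,
# with the policy's return recovered as J = d . r / (1 - gamma).
d = (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * P.T, mu0)
J_dual = d @ r / (1 - gamma)

print("visit distribution sums to", round(d.sum(), 6))
print("primal return", round(J_primal, 6), "dual return", round(J_dual, 6))
```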