Search results - Inner Mongolia University Library

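Help

Field codes:

T=title (book title, article title), A=author (responsible party), K=subject term, P=publication name, PU=publisher name, O=organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=subject classification number, U=all fields, Y=year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not containing Visual, years 2011 to 2016)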
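The field codes and Boolean operators described above can be combined programmatically. The following is a minimal sketch, assuming only the rules stated in the help text (uppercase operators with one space on each side, parentheses for grouping). The helper names `field`, `combine`, and `group` are hypothetical and are not part of the library's system.

```python
ALLOWED_OPERATORS = {"AND", "OR", "NOT"}


def field(code: str, value: str) -> str:
    """Render one field condition, e.g. field("K", "图书馆学") -> "K=图书馆学"."""
    return f"{code}={value}"


def combine(operator: str, *terms: str) -> str:
    """Join terms with an uppercase operator padded by single spaces."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"operator must be one of {sorted(ALLOWED_OPERATORS)}")
    return f" {operator} ".join(terms)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild Example 1 from the help text.
query = combine(
    "AND",
    group(combine("OR", field("K", "图书馆学"), field("K", "情报学"))),
    field("A", "范并思"),
    field("Y", "1982-2016"),
)
print(query)
```

Building the string from small helpers keeps the spacing and capitalization rules in one place, which is where hand-typed queries most often go wrong.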


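Refine search results

Document type

299 items: conference papers
8 items: journal articles

Collection scope

307 items: electronic documents
0 titles: print holdings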

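Subject classification

180 items: Engineering
- 158 items: Computer Science and Technology...
- 56 items: Electrical Engineering
- 48 items: Software Engineering
- 47 items: Control Science and Engineering
- 13 items: Information and Communication Engineering
- 10 items: Mechanical Engineering
- 6 items: Instrument Science and Technology
- 4 items: Mechanics (degree may be conferred in engineering, sci...
- 4 items: Bioengineering
- 3 items: Power Engineering and Engineering Thermo...
- 2 items: Transportation Engineering
- 2 items: Nuclear Science and Technology
- 2 items: Biomedical Engineering (degree may be conferred in...
- 1 item: Architecture
- 1 item: Chemical Engineering and Technology
- 1 item: Aeronautical and Astronautical Science and Tech...
- 1 item: Food Science and Engineering (degree may...
40 items: Science
- 35 items: Mathematics
- 9 items: Systems Science
- 8 items: Statistics (degree may be conferred in science, ...
- 4 items: Physics
- 4 items: Biology
- 1 item: Chemistry
- 1 item: Astronomy
- 1 item: Atmospheric Science
- 1 item: Geophysics
- 1 item: Geology
18 items: Management
- 17 items: Management Science and Engineering (degree may...
- 7 items: Business Administration
4 items: Economics
- 4 items: Applied Economics
1 item: Medicine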

Subject

115 items: dynamic programm...
76 items: reinforcement le...
67 items: learning
47 items: optimal control
30 items: neural networks
27 items: control systems
21 items: approximate dyna...
21 items: approximation al...
20 items: function approxi...
20 items: equations
17 items: convergence
16 items: adaptive dynamic...
16 items: state-space meth...
16 items: heuristic algori...
14 items: mathematical mod...
13 items: stochastic proce...
12 items: learning (artifi...
12 items: adaptive control
12 items: cost function
11 items: algorithm design...

Institution

5 items: arizona state un...
4 items: department of el...
4 items: school of inform...
4 items: department of in...
4 items: univ sci & techn...
4 items: chinese acad sci...
4 items: department of el...
3 items: princeton univ d...
3 items: northeastern uni...
3 items: national science...
3 items: robotics institu...
3 items: univ illinois de...
3 items: univ utrecht dep...
2 items: univ groningen i...
2 items: sharif univ tech...
2 items: univ texas autom...
2 items: pengcheng labora...
2 items: guangxi univ sch...
2 items: chinese acad sci...
2 items: cemagref lisc au...

Author

14 items: liu derong
9 items: wei qinglai
8 items: si jennie
7 items: xu xin
5 items: derong liu
4 items: lewis frank l.
4 items: martin riedmille...
4 items: huaguang zhang
4 items: jennie si
4 items: marco a. wiering
4 items: xin xu
4 items: zhang huaguang
4 items: dongbin zhao
4 items: lei yang
4 items: powell warren b.
4 items: riedmiller marti...
3 items: hado van hasselt
3 items: van hasselt hado
3 items: jagannathan s.
3 items: munos remi

Language

305 items: English
1 item: Other
1 item: Chinese

Search query: "Any field = IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning"

307 records in total; showing records 211-220.

A Learning Approach to Multi-robot Task Allocation with Priority Constraints and Uncertainty

2022 IEEE International Conference on Industrial Technology, ICIT 2022

Authors: Deng, Fuqin; Huang, Huanzhao; Fu, Lanhui; Yue, Hongwei; Zhang, Jianmin; Wu, Zexiao; Lam, Tin Lun

Affiliations: Wuyi University, School of Intelligent Manufacturing, Jiangmen, Guangdong 529020, China; The Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, Guangdong 518000, China; The 3irobotix Co. Ltd, Shenzhen, Guangdong 518000, China; Guangdong University of Education, School of Physics and Information Engineering, Guangzhou, Guangdong 510303, China; The Chinese University of Hong Kong, School of Science and Engineering, Shenzhen, Guangdong 518000, China

ISBN: (print) 9781728119489

Multi-robot task allocation has an important impact on the efficiency of multi-robot collaboration. For single-shot allocation without complicated constraints, some exact algorithms and heuristic algorithms can find the optimal solution efficiently. However, considering the priority constraints and uncertain execution time of robots for multiple times of allocation in an approximate dynamic programming environment, traditional methods such as heuristic algorithms have limited performance. To obtain better performance, we propose a method based on deep reinforcement learning. Specifically, we first use the directed acyclic graph to describe the priority relationship between tasks. Then we propose a graph neural network with a hierarchical attention mechanism to extract the characteristics of the task groups. Finally, we design the policy network to solve the approximate dynamic programming problem of multi-robot task allocation. Through training on the dataset of a given environment, the policy network can gradually refine the decision-making process by reinforcement learning. Experiment results show that the proposed modeling and solving method can find better solutions than existing heuristic algorithms. Furthermore, the learned strategy can be directly applied in other untrained environments with superior performance. © 2022 IEEE.
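The abstract's first step, describing task priorities with a directed acyclic graph, can be illustrated with a short sketch. This is only an illustration of the data structure, not the paper's method: the task names are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for whatever representation the authors use.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks that must finish first.
predecessors = {
    "fetch_parts": set(),
    "fetch_tools": set(),
    "assemble": {"fetch_parts", "fetch_tools"},
    "inspect": {"assemble"},
}

# static_order() yields every task after all of its predecessors and raises
# graphlib.CycleError if the priority relation is not acyclic.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Any allocation that respects the priority constraints must assign tasks in an order consistent with such a topological ordering; the learning component described in the abstract then chooses among the valid orderings.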

Keywords: reinforcement learning

ATM: Approximate Task Memoization in the Runtime System

31st IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Brumar, Iulian; Casas, Marc; Moreto, Miguel; Valero, Mateo; Sohi, Gurindar S.

Affiliations: BSC, Barcelona, Spain; Univ Wisconsin, Madison, WI 53706, USA

ISBN: (print) 9781538639146

Redundant computations appear during the execution of real programs. Multiple factors contribute to these unnecessary computations, such as repetitive inputs and patterns, calling functions with the same parameters or bad programming habits. Compilers minimize non-useful code with static analysis. However, redundant execution might be dynamic and there are no current approaches to reduce these inefficiencies. Additionally, many algorithms can be computed with different levels of accuracy. Approximate computing exploits this fact to reduce execution time at the cost of slightly less accurate results. In this case, expert developers determine the desired tradeoff between performance and accuracy for each application. In this paper, we present Approximate Task Memoization (ATM), a novel approach in the runtime system that transparently exploits both dynamic redundancy and approximation at the task granularity of a parallel application. Memoization of previous task executions allows predicting the results of future tasks without having to execute them and without losing accuracy. To further increase performance improvements, the runtime system can memoize similar tasks, which leads to task approximate computing. By defining how to measure task similarity and correctness, we present an adaptive algorithm in the runtime system that automatically decides if task approximation is beneficial or not. When evaluated on a real 8-core processor with applications from different domains (financial analysis, stencil-computation, machine-learning and linear-algebra), ATM achieves a 1.4x average speedup when only applying memoization techniques. When adding task approximation, ATM achieves a 2.5x average speedup with an average 0.7% accuracy loss (maximum of 3.2%).
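The idea of reusing results for similar inputs can be sketched in a few lines. This is not the ATM runtime algorithm: it is a simplified illustration that assumes similarity means "inputs fall into the same bucket of width `tolerance`", and the names `approx_memoize` and `slow_square` are hypothetical.

```python
from typing import Callable, Dict, Tuple


def approx_memoize(func: Callable[..., float], tolerance: float) -> Callable[..., float]:
    """Cache results keyed by inputs quantized to buckets of width `tolerance`."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    cache: Dict[Tuple[int, ...], float] = {}
    stats = {"hits": 0, "misses": 0}

    def wrapper(*args: float) -> float:
        # Inputs that round to the same bucket indices share one cache entry.
        key = tuple(round(a / tolerance) for a in args)
        if key in cache:
            stats["hits"] += 1
            return cache[key]
        stats["misses"] += 1
        cache[key] = func(*args)
        return cache[key]

    wrapper.stats = stats  # expose counters for inspection
    return wrapper


calls = []


def slow_square(x: float) -> float:
    calls.append(x)  # record real executions
    return x * x


fast_square = approx_memoize(slow_square, tolerance=0.1)
a = fast_square(1.00)  # executed
b = fast_square(1.01)  # reuses the result for 1.00, with a small error
c = fast_square(2.00)  # executed
print(a, b, c, fast_square.stats)
```

The tolerance controls the tradeoff the abstract describes: a wider bucket yields more cache hits but a larger error on reused results.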

Keywords: Linear algebra

A Reinforcement Learning Solution to the Nonlinear Spacecraft Pursuit-Evasion Game Problem

14th IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems, CYBER 2024

Authors: Huang, Haoqi; Ran, Guangtao; Lyu, Yueyong; Ma, Guangfu

Affiliation: Harbin Institute of Technology, Department of Control Science and Engineering, Harbin 150001, China

ISBN: (print) 9798331506056

The pursuit-evasion game of non-cooperative spacecraft under nonlinear dynamics is currently a hot topic in orbital gaming. We describe this pursuit-evasion game model using differential game theory, transforming the gaming problem into a bilateral optimal control problem. Using elliptical orbit line-of-sight (LOS) dynamics with simple field-of-view constraints as the system model, we solve for the Nash equilibrium of the two-body pursuit-evasion game under the assumption of complete information. Due to the difficulty in obtaining analytical solutions for the Nash equilibrium, we adopt a reinforcement learning (RL)-based adaptive dynamic programming method. We obtain the approximate Nash equilibrium solution with the RL method and provide a successful simulation example. © 2024 IEEE.

Keywords: Orbits

Online Reinforcement Learning Neural Network Controller Design for Nanomanipulation

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Qinmin Yang; S. Jagannathan

Affiliation: Department of Electrical & Computer Engineering, University of Missouri, Rolla, MO, USA

In this paper, a novel reinforcement learning neural network (NN)-based controller, referred to as an adaptive critic controller, is proposed for affine nonlinear discrete-time systems with applications to nanomanipulation. In the online NN reinforcement learning method, one NN is designated as the critic NN, which approximates the long-term cost function by assuming that the states of the nonlinear system are available for measurement. An action NN is employed to derive an optimal control signal to track a desired system trajectory while minimizing the cost function. Online updating weight tuning schemes for these two NNs are also derived. By using the Lyapunov approach, the uniformly ultimate boundedness (UUB) of the tracking error and weight estimates is shown. Nanomanipulation implies manipulating objects with nanometer size. It takes several hours to perform a simple task in the nanoscale world. To accomplish the task automatically, the proposed online learning control design is evaluated for the task of nanomanipulation and verified in the simulation environment.

Keywords: Learning; Neural networks; Nonlinear control systems; Control systems; Cost function; Programmable control; Adaptive control; Nonlinear systems; Optimal control; Trajectory

Feature discovery in approximate dynamic programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Philippe Preux; Sertan Girgin; Manuel Loth

Affiliation: Laboratoire d'Informatique Fondamentale de Lille (Computer Science Laboratory associated with the CNRS) and INRIA, Université de Lille, France

Feature discovery aims at finding the best representation of data. This is a very important topic in machine learning, and in reinforcement learning in particular. Based on our recent work on feature discovery in the context of reinforcement learning to discover a good, if not the best, representation of states, we report here on the use of the same kind of approach in the context of approximate dynamic programming. The striking difference with the usual approach is that we use a non parametric function approximator to represent the value function, instead of a parametric one. We also argue that the problem of discovering the best state representation and the problem of the value function approximation are just the two faces of the same coin, and that using a non parametric approach provides an elegant solution to both problems at once.

Keywords: Dynamic programming; Function approximation; Games; Machine learning; Acceleration; Computer science; Software tools; Artificial intelligence; Velocity control; Control systems

Evolutionary Adaptive Dynamic Programming Algorithm for Converter Gas Scheduling of Steel Industry

6th International Symposium on Advanced Control of Industrial Processes (AdCONIP)

Authors: Wang, Tianyu; Wang, Linqing; Zhao, Jun; Wang, Wei; Liu, Ying

Affiliation: Dalian Univ Technol, Sch Control Sci & Engn, Dalian 116024, Peoples R China

ISBN: (print) 9781509043972

It is significant to perform an effective scheduling of byproduct gas system in steel industry for reducing cost and protecting environment. The existing studies largely focused on extracting specific knowledge from human experience or directly optimizing the scheduling performance, which failed to provide a dynamic optimization process for making the scheduling scheme updated online. In this study, an action-dependent heuristic dynamic programming (ADHDP) framework is proposed for the Linz Donawitz converter gas (LDG) scheduling, in which the scheduling amount is calculated based on the gas system states by utilizing a Takagi-Sugeno-Kang (TSK) fuzzy model, while a utility function is introduced in the critic network considering the time delay of the gas system to evaluate the scheduling performance over time. For achieving online learning process, the concept of a modified evolutionary algorithm is combined with the ADHDP to obtain the near-optimal scheduling policy at each time instance. To demonstrate the performance of the proposed method, the practical data coming from the energy center of a steel plant are employed. The results show that the proposed method can supply the human operators with effective solution for secure and economically justified optimization of the LDG system.

Keywords: steel industry; energy scheduling; adaptive dynamic programming; reinforcement learning; evolutionary computing

2011 IEEE International Symposium on Intelligent Control, ISIC 2011

ISBN: (print) 9781457711046

The proceedings contain 39 papers. The topics discussed include: optimal network localization by particle swarm optimization; a framework for adaptive tuning of distributed model predictive controllers by Lagrange multipliers; probabilistic fault detection and handling algorithm for testing stability control systems with a drive-by-wire vehicle; distance-based control of cycle-free persistent formations; satellite formation flying with input saturation: an LMI approach; performance information in risk-averse control of model-following systems; an interpolation method of multiple terminal iterative learning control; formation control of mobile agent groups based on localization; image-correlation data association with phase-varying uncertainty techniques; approximate dynamic programming for stochastic systems with additive and multiplicative noise; and iterative learning control for discrete linear systems with zero Markov parameters using repetitive process stability theory.


Continuous-Time ADP for Linear Systems with Partially Unknown Dynamics

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Draguna Vrabie; Murad Abu-Khalaf; Frank L. Lewis; Youyi Wang

Affiliations: Automation and Robotics Research Institute, University of Texas Arlington, Fort Worth, TX, USA; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

Approximate dynamic programming has been formulated and applied mainly to discrete-time systems. Expressing the ADP concept for continuous-time systems raises difficult issues related to sampling time and system model knowledge requirements. This paper presents a novel online adaptive critic (AC) scheme, based on approximate dynamic programming (ADP), to solve the infinite horizon optimal control problem for continuous-time dynamical systems, thus bringing together concepts from the fields of computational intelligence and control theory. Only partial knowledge about the system model is used, as knowledge about the plant internal dynamics is not needed. The method is thus useful to determine the optimal controller for plants with partially unknown dynamics. It is shown that the proposed iterative ADP algorithm is in fact a quasi-Newton method to solve the underlying algebraic Riccati equation (ARE) of the optimal control problem. An initial gain that determines a stabilizing control policy is not required. In control theory terms, this paper develops a direct adaptive control algorithm for obtaining the optimal control solution without knowing the system A matrix.

Keywords: Linear systems; Optimal control; Dynamic programming; Adaptive control; Control theory; Iterative algorithms; Riccati equations; Sampling methods; Programmable control; Infinite horizon

A Scalable Model-Free Recurrent Neural Network Framework for Solving POMDPs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Zhenzhen Liu; Itamar Elhanany

Affiliation: Department of Electrical & Computer Engineering, University of Tennessee, Knoxville, TN, USA

This paper presents a framework for obtaining an optimal policy in model-free partially observable Markov decision problems (POMDPs) using a recurrent neural network (RNN). A Q-function approximation approach is taken, utilizing a novel RNN architecture with computation and storage requirements that are dramatically reduced when compared to existing schemes. A scalable online training algorithm, derived from the real-time recurrent learning (RTRL) algorithm, is employed. Moreover, stochastic meta-descent (SMD), an adaptive step size scheme for stochastic gradient-descent problems, is utilized as means of incorporating curvature information to accelerate the learning process. We consider case studies of POMDPs where state information is not directly available to the agent. Particularly, we investigate scenarios in which the agent receives identical observations for multiple states, thereby relying on temporal dependencies captured by the RNN to obtain the optimal policy. Simulation results illustrate the effectiveness of the approach along with substantial improvement in convergence rate when compared to existing schemes.

Keywords: Recurrent neural networks; Neurons; Stochastic processes; Nonlinear dynamical systems; Computational complexity; Dynamic programming; Learning; Computer networks; Computer architecture; Acceleration

A convergent recursive least squares approximate policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jun Ma; Warren B. Powell

Affiliation: Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA

In this paper, we present a recursive least squares approximate policy iteration (RLSAPI) algorithm for infinite-horizon multi-dimensional Markov decision process in continuous state and action spaces. Under certain problem structure assumptions on value functions and policy spaces, the approximate policy iteration algorithm is provably convergent in the mean. That is to say the mean absolute deviation of the approximate policy value function from the optimal value function goes to zero as successive approximation improves.
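The recursive least squares building block named in the abstract can be sketched as follows. This is the generic textbook RLS update for a linear approximation, not the paper's RLSAPI algorithm: the feature vectors and targets are hypothetical, and the function name `rls_update` is an assumption made for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rls_update(theta: Vector, P: Matrix, phi: Vector, target: float) -> Tuple[Vector, Matrix]:
    """One recursive least squares step for the linear model target ~ theta . phi."""
    n = len(theta)
    # P is kept symmetric, so phi^T P equals (P phi)^T.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = target - sum(theta[i] * phi[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    new_P = [[P[i][j] - gain[i] * P_phi[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P


# Hypothetical data generated by value = 2 * f0 + 3 * f1.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0)]

theta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial P means a weak prior on theta
for features, value in samples:
    theta, P = rls_update(theta, P, features, value)
print(theta)
```

Each update costs a fixed amount of work per sample and never stores past data, which is why RLS suits the online value-function fitting that approximate policy iteration requires.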

Keywords: Least squares approximation; Function approximation; Convergence; Approximation algorithms; Dynamic programming; Infinite horizon; Least squares methods; Acoustic noise; State-space methods


No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
