Search Results - Inner Mongolia University Library

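Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011 to 2016)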
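The field codes and operator rules above can be sketched as a small query builder. This is a minimal illustration, not part of the library's software: the helper names `term`, `group`, and `combine` are hypothetical, and only the field codes, the uppercase operators, and the single-space rule come from the help text.

```python
# Hypothetical helpers; only the field codes and operator rules are from the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build a single FIELD=value search term."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def group(expression: str) -> str:
    """Wrap an expression in parentheses."""
    return f"({expression})"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


# Rebuild the two examples from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
example_two = combine(
    "AND",
    term("P", "计算机应用与软件"),
    combine(
        "NOT",
        group(combine("OR", term("U", "C++"), term("U", "Basic"))),
        term("K", "Visual"),
    ),
    term("Y", "2011-2016"),
)
print(example_one)
print(example_two)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase; how the catalogue itself treats a lowercase operator is not stated in the help text.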

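Refine search results

Document type

746 records: Conference papers
270 records: Journal articles
4 volumes: Books

Holdings

1,020 records: Electronic documents
1 title: Print holdings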

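Subject classification

711 records: Engineering
- 520 records: Computer Science and Technology...
- 380 records: Electrical Engineering
- 278 records: Control Science and Engineering
- 153 records: Software Engineering
- 79 records: Information and Communication Engineering
- 40 records: Transportation Engineering
- 23 records: Instrument Science and Technology
- 20 records: Mechanical Engineering
- 9 records: Bioengineering
- 8 records: Electronic Science and Technology (may...
- 7 records: Mechanics (may be awarded in Engineering, Sci...
- 7 records: Civil Engineering
- 6 records: Power Engineering and Engineering Thermo...
- 6 records: Petroleum and Natural Gas Engineering
- 4 records: Biomedical Engineering (may be awarded...
- 3 records: Materials Science and Engineering (may...
- 3 records: Chemical Engineering and Technology
- 3 records: Aerospace Science and Tech...
- 3 records: Safety Science and Engineering
118 records: Science
- 98 records: Mathematics
- 32 records: Systems Science
- 22 records: Statistics (may be awarded in Science, ...
- 10 records: Biology
- 8 records: Physics
- 4 records: Chemistry
66 records: Management
- 63 records: Management Science and Engineering (may...
- 14 records: Business Administration
- 5 records: Library, Information and Archives Manage...
5 records: Economics
- 4 records: Applied Economics
3 records: Law
- 3 records: Sociology
2 records: Medicine
1 record: Education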

Topic

312 records: reinforcement le...
216 records: dynamic programm...
206 records: optimal control
107 records: adaptive dynamic...
104 records: adaptive dynamic...
97 records: learning
88 records: neural networks
78 records: heuristic algori...
68 records: reinforcement le...
58 records: learning (artifi...
54 records: nonlinear system...
53 records: convergence
51 records: control systems
51 records: mathematical mod...
48 records: approximate dyna...
44 records: approximation al...
43 records: equations
42 records: adaptive control
41 records: artificial neura...
41 records: cost function

Institution

41 records: chinese acad sci...
27 records: univ rhode isl d...
17 records: tianjin univ sch...
16 records: univ sci & techn...
16 records: univ illinois de...
15 records: northeastern uni...
14 records: beijing normal u...
13 records: northeastern uni...
13 records: guangdong univ t...
12 records: northeastern uni...
9 records: natl univ def te...
8 records: ieee
8 records: univ chinese aca...
7 records: univ chinese aca...
7 records: cent south univ ...
7 records: southern univ sc...
7 records: beijing univ tec...
6 records: chinese acad sci...
6 records: missouri univ sc...
5 records: nanjing univ pos...

Author

54 records: liu derong
37 records: wei qinglai
29 records: he haibo
22 records: wang ding
21 records: xu xin
19 records: jiang zhong-ping
17 records: lewis frank l.
17 records: yang xiong
17 records: zhang huaguang
17 records: ni zhen
16 records: zhao bo
15 records: gao weinan
14 records: zhao dongbin
13 records: zhong xiangnan
12 records: si jennie
12 records: derong liu
10 records: jagannathan s.
10 records: dongbin zhao
10 records: song ruizhuo
9 records: abouheaf mohamme...

Language

994 records: English
20 records: Other
6 records: Chinese

Search condition: "Any field = IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning"

1,020 records in total; showing records 601-610
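The record range above is consistent with ten records per page. Under that assumption (the page size is inferred from the range, not stated by the catalogue), the current page index and the total page count follow from simple arithmetic; the function names are illustrative only.

```python
import math


def page_of(first_record: int, page_size: int) -> int:
    """Return the 1-based page that starts at the given 1-based record."""
    return (first_record - 1) // page_size + 1


def total_pages(total_records: int, page_size: int) -> int:
    """Return the number of pages needed to list all records."""
    return math.ceil(total_records / page_size)


first, last, total = 601, 610, 1020
page_size = last - first + 1  # assumed: one full page is shown

current_page = page_of(first, page_size)
page_count = total_pages(total, page_size)
print(current_page, page_count)
```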

Data-driven partially observable dynamic processes using adaptive dynamic programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xiangnan Zhong, Zhen Ni, Yufei Tang, Haibo He; Department of Electrical, University of Rhode Island, Kingston, RI, USA

ISBN: (print) 9781479945511

Adaptive dynamic programming (ADP) has been widely recognized as one of the “core methodologies” to achieve optimal control for intelligent systems in Markov decision process (MDP). Generally, ADP control design requires all the information of the system dynamics. However, in many practical situations, the measured input and output data can only represent part of the system states. This means that complete information about the system is unavailable in many real-world cases, which narrows the range of application of the ADP design. In this paper, we propose a data-driven ADP method to stabilize the system with partially observable dynamics based on neural network techniques. A state network is integrated into the typical actor-critic architecture to provide an estimated state from the measured input/output sequences. The theoretical analysis and the stability discussion of this data-driven ADP method are also provided. Two examples are studied to verify our proposed method.

Keywords: Dynamic programming; Performance analysis; Neural networks; Optimal control; Stability analysis; Equations; Markov processes

Optimal self-learning battery control in smart residential grids by iterative Q-learning algorithm

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Qinglai Wei, Derong Liu, Guang Shi, Yu Liu, Qiang Guan; The State Key Laboratory of Management and Control for Complex Systems, Chinese Academy of Sciences

ISBN: (print) 9781479945511

In this paper, a novel dual iterative Q-learning algorithm is developed to solve the optimal battery management and control problems in smart residential environments. The main idea is to use adaptive dynamic programming (ADP) technique to obtain the optimal battery management and control scheme iteratively for residential energy systems. In the developed dual iterative Q-learning algorithm, two iterations, including external and internal iterations, are introduced, where internal iteration minimizes the total cost of power loads in each period and the external iteration makes the iterative Q function converge to the optimum. For the first time, the convergence property of iterative Q-learning method is proven to guarantee the convergence property of the iterative Q function. Finally, numerical results are given to illustrate the performance of the developed algorithm.

Keywords: battery management systems; convergence of numerical methods; dynamic programming; iterative learning control; optimal control; smart power grids; unsupervised learning; adaptive dynamic programming technique; convergence property; dual iterative Q-learning algorithm; external iterations; internal iterations; iterative Q function; optimal battery control problems; optimal battery management; power loads; residential energy systems; smart residential environments; total cost minimization; Smart grids; convergence of numerical methods; Unsupervised learning; Battery management systems; Battery power supply; dynamic programming; Iterative; Optimal control; iterative methods; Electric loads; Q FUNCTIONS; Iterative learning control; Optimal

Heuristics for multiagent reinforcement learning in decentralized decision problems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Martin W. Allen, David Hahn, Douglas C. MacFarland; Computer Science Department, University of Wisconsin-La Crosse, La Crosse, Wisconsin; Computer Science Department, Worcester Polytechnic Institute, Worcester, Massachusetts

Decentralized partially observable Markov decision processes (Dec-POMDPs) model cooperative multiagent scenarios, providing a powerful general framework for team-based artificial intelligence. While optimal algorithms exist for Dec-POMDPs, theoretical and empirical results demonstrate that they are impractical for many problems of real interest. We examine the use of reinforcement learning (RL) as a means to generate adequate, if not optimal, joint policies for Dec-POMDPs. It is easily demonstrated (and expected) that single-agent RL produces results of little joint utility. We therefore investigate heuristic methods, based upon the dynamics of the Dec-POMDP formulation, that bias the learning process to produce coordinated action. Empirical tests on a benchmark problem show that these heuristics significantly enhance learning performance, even out-performing a hand-crafted heuristic in cases where the learning process converges quickly.

Keywords: Joints; Heuristic algorithms; Learning (artificial intelligence); Equations; Markov processes; Benchmark testing; Complexity theory

Active learning for classification: An optimistic approach

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Timothé Collet, Olivier Pietquin; GeorgiaTech-CNRS UMI, France; Supelec MaLIS Research group, France; LIFL (UMR 8022 CNRS / Lille 1), IUF (Institut Universitaire de France), France; University Lille 1, France

In this paper, we propose to reformulate the active learning problem occurring in classification as a sequential decision making problem. We particularly focus on the problem of dynamically allocating a fixed budget of samples. This raises the problem of the trade-off between exploration and exploitation, which is traditionally addressed in the framework of multi-armed bandit theory. Based on previous work on bandit theory applied to active learning for regression, we introduce four novel algorithms for solving the online allocation of the budget in a classification problem. Experiments on a generic classification problem demonstrate that these new algorithms compare favorably with state-of-the-art methods.

Keywords: Resource management; Uncertainty; Noise measurement; Noise; Algorithm design and analysis; Shape; Partitioning algorithms

Model-free Q-learning over finite horizon for uncertain linear continuous-time systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Hao Xu, S. Jagannathan; College of Science and Engineering, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA; Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, USA

In this paper, a novel finite-horizon optimal control scheme is introduced for linear continuous-time systems by using adaptive dynamic programming (ADP). First, a new time-varying Q-function parameterization and its estimator are introduced. Subsequently, the Q-function estimator is tuned online by using both the Bellman equation in integral form and the terminal cost. Eventually, a near-optimal control gain is obtained by using the Q-function estimator. All the closed-loop signals are shown to be bounded by using Lyapunov stability analysis, where the bounds are functions of the initial conditions and final time, while the estimated control signal converges close to the optimal value. The simulation results illustrate the effectiveness of the proposed scheme.

Keywords: Optimal control; Mathematical model; Vectors; Equations; Integral equations; Parameter estimation; Tuning

Adaptive fault identification for a class of nonlinear dynamic systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Li-Bing Wu, Dan Ye, Xin-Gang Zhao; College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, P. R. China; College of Sciences, University of Science and Technology Liaoning, Anshan, Liaoning, P. R. China; State Key Laboratory of Robotics and Shenyang Institute of Automation, CAS, Shenyang, Liaoning, P. R. China

ISBN: (print) 9781479945511

This paper is concerned with the diagnosis problem of actuator faults for a class of nonlinear systems. It is assumed that the upper bound of the Lipschitz constant of the nonlinearity in the faulty system is unknown. Then, a new nonlinear observer for fault diagnosis based on an adaptive estimator is proposed. Moreover, by making use of the designed adaptive observer with an on-line update control law without the σ-modification condition to approximate the faulty system, it is proved that the estimation error of the adaptive control parameter, the output observation error, and the error between the system fault and its estimate are uniformly ultimately bounded via Lyapunov stability analysis. Finally, simulation examples are provided to illustrate the efficiency of the proposed fault identification approach.

Keywords: Fault diagnosis; Observers; Nonlinear systems; Upper bound; Fault detection; Adaptive systems; Educational institutions

Cognitive control in cognitive dynamic systems: A new way of thinking inspired by the brain

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Simon Haykin, Ashkan Amiri, Mehdi Fatemi; Cognitive Systems Laboratory, McMaster University, Hamilton, Ontario, Canada

Briefly, the main purpose of the paper is fourfold: a) Cognitive perception, which consists of two functional blocks: improved sparse-coding under the influence of perceptual attention for extracting relevant information from the observables and ignoring irrelevant information, followed by a Bayesian algorithm for state estimation. b) Entropic state of the perceptor, which provides feedback information to the controller. c) Cognitive control, which also consists of two functional blocks: an executive learning algorithm computed by processing the entropic state, followed by predictive planning to set the stage for the policy to act on the environment, thereby establishing the global perception-action cycle. d) Experimental results for exploiting the perceptual as well as executive attention in a co-operative manner, aimed at the first demonstration of risk control in the presence of a severe disturbance in the environment.

Keywords: Planning; Mathematical model; Bayes methods; Heuristic algorithms; Prediction algorithms; Equations; Feedforward neural networks

A Multi-Agent Q-Learning-based Framework for Achieving Fairness in HTTP Adaptive Streaming

14th IEEE/IFIP Network Operations and Management Symposium (NOMS)

Authors: Petrangeli, Stefano; Claeys, Maxim; Latre, Steven; Famaey, Jeroen; De Turck, Filip. Univ Ghent, IMinds, Dept Informat Technol INTEC, B-9050 Ghent, Belgium; Univ Antwerp, IMinds, Dept Math & Comp Sci, B-2020 Antwerp, Belgium

ISBN: (print) 9781479909131

HTTP Adaptive Streaming (HAS) is quickly becoming the de facto standard for Over-The-Top video streaming. In HAS, each video is temporally segmented and stored in different quality levels. Quality selection heuristics, deployed at the video player, allow dynamically requesting the most appropriate quality level based on the current network conditions. Today's heuristics are deterministic and static, and thus not able to perform well under highly dynamic network conditions. Moreover, in a multi-client scenario, issues concerning fairness among clients arise, meaning that different clients negatively influence each other as they compete for the same bandwidth. In this article, we propose a reinforcement learning-based quality selection algorithm able to achieve fairness in a multi-client setting. A key element of this approach is a coordination proxy in charge of facilitating the coordination among clients. The strength of this approach is three-fold. First, the algorithm is able to learn and adapt its policy depending on network conditions, unlike current HAS heuristics. Second, fairness is achieved without explicit communication among agents and thus no significant overhead is introduced into the network. Third, no modifications to the standard HAS architecture are required. By evaluating this novel approach through simulations, under mutable network conditions and in several multi-client scenarios, we are able to show how the proposed approach can improve system fairness by up to 60% compared to current HAS heuristics.

Keywords: HTTP

Reinforcement learning-based optimal control considering L computation time delay of linear discrete-time systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Taishi Fujita, Toshimitu Ushio; Department of Cybernetics, Czech Technical University, Prague, Czech Republic

In embedded control systems, the control input is computed in a processor from sensing data of a plant, and there is a delay, called the computation time delay, due to the computation and the data transmission. When we design an optimal controller, we need to take the delay into account to achieve optimality. Moreover, in the case where it is difficult to identify a mathematical model of the plant, a model-free approach is useful. In particular, the reinforcement learning-based approach has attracted much attention in the design of adaptive optimal controllers. In this paper, we assume that the plant is a linear system but the parameters of the plant are unknown. Then, we apply reinforcement learning to the design of an adaptive optimal digital controller, taking the computation time delay into consideration. First, we consider the case where all states of the plant are observed, and it takes L times to update the control input. An optimal feedback gain is learned from sequences of pairs of the state and the control input. Next, we consider the case where the control input is determined from outputs of the plant. We cannot use an observer to estimate the state of the plant since the parameters of the plant are unknown. So, we use a data-based control approach for the estimation. Finally, we apply the proposed adaptive optimal controller to attitude control of a quadrotor at the hovering state and show its efficiency by simulation.

Keywords: Delay effects; State feedback; Optimal control; Output feedback; Adaptation models; Propellers

Closed-loop control of anesthesia and mean arterial pressure using reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Regina Padmanabhan, Nader Meskin, Wassim M. Haddad; Department of Electrical Engineering, Qatar University, Qatar; School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, USA

General anesthesia is required for patients undergoing surgery as well as for some patients in the intensive care units with acute respiratory distress syndrome. However, most anesthetics affect cardiac and respiratory functions. Hence, it is important to monitor and control the infusion of anesthetics to meet sedation requirements while keeping patient vital parameters within safe limits. The critical task of anesthesia administration also necessitates that drug dosing be optimal, patient specific, and robust. In this paper, the concept of reinforcement learning (RL) is used to develop a closed-loop anesthesia controller using the bispectral index (BIS) as a control variable while concurrently accounting for mean arterial pressure (MAP). In particular, the proposed framework uses these two parameters to control propofol infusion rates to regulate the BIS and MAP within a desired range. Specifically, a weighted combination of the error of the BIS and MAP signals is considered in the proposed RL algorithm. This reduces the computational complexity of the RL algorithm and consequently the controller processing time.

Keywords: Drugs; Anesthesia; Learning (artificial intelligence); Blood pressure; Indexes; Biomedical monitoring; Optimal control
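The abstract of the anesthesia record above mentions a weighted combination of the BIS and MAP errors but does not give the formula. The following is only a sketch of what such a scalar error could look like: the normalization by the target value, the weight, and every numeric value are assumptions made for illustration, not the authors' method and not clinical guidance.

```python
def combined_error(
    bis: float,
    mean_arterial_pressure: float,
    bis_target: float,
    map_target: float,
    weight: float,
) -> float:
    """Collapse two tracking errors into one scalar.

    Assumed form: weight * e_BIS + (1 - weight) * e_MAP, where each
    error is the relative deviation from its (hypothetical) target.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    bis_error = (bis - bis_target) / bis_target
    map_error = (mean_arterial_pressure - map_target) / map_target
    return weight * bis_error + (1.0 - weight) * map_error


# Illustrative numbers only: BIS is 20% above its target, MAP is on target.
error = combined_error(
    bis=60.0,
    mean_arterial_pressure=80.0,
    bis_target=50.0,
    map_target=80.0,
    weight=0.5,
)
print(error)
```

A single scalar error means a learner only has to discretize one signal instead of a two-dimensional grid, which is one plausible reading of the abstract's claim about reduced computational complexity.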

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
