Search Results - Inner Mongolia University Library

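Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject "library science" or "information science", author 范并思, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication 计算机应用与软件, any field containing C++ or Basic, subject not Visual, years 2011-2016)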
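
The field codes and operator rules above can be sketched as a small query-string builder. This is only an illustration: the helper names `term`, `group`, and `chain` are hypothetical and are not part of the library's site, and the sketch assumes nothing about how the catalog parses a query beyond the stated rules (uppercase operators, one space on each side).

```python
# Hypothetical helpers for assembling advanced-search expressions.
# Field codes and operators are taken from the help text; everything
# else (function names, validation) is an assumption for illustration.

FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def group(expression: str) -> str:
    """Wrap an expression in parentheses so it is evaluated as a unit."""
    return f"({expression})"


def chain(*tokens: str) -> str:
    """Join alternating terms and operators: term, OP, term, OP, term, ...

    Operators must be uppercase; joining with single spaces leaves exactly
    one space on each side of every operator.
    """
    if len(tokens) % 2 == 0:
        raise ValueError("expected term, operator, term, ...")
    for operator in tokens[1::2]:
        if operator not in OPERATORS:
            raise ValueError(f"operator must be AND, OR or NOT: {operator!r}")
    return " ".join(tokens)


# Rebuild the two examples from the help text.
example_one = chain(
    group(chain(term("K", "图书馆学"), "OR", term("K", "情报学"))),
    "AND", term("A", "范并思"),
    "AND", term("Y", "1982-2016"),
)
example_two = chain(
    term("P", "计算机应用与软件"),
    "AND", group(chain(term("U", "C++"), "OR", term("U", "Basic"))),
    "NOT", term("K", "Visual"),
    "AND", term("Y", "2011-2016"),
)

print(example_one)
print(example_two)
```

The two printed strings reproduce Example 1 and Example 2 character for character. A lowercase operator such as `"and"` raises a `ValueError` rather than producing a query the rules would not accept.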

Refine search results

Document type

746 conference papers
270 journal articles
4 books

Collection scope

1,020 electronic documents
1 print holding

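Subject classification

711 papers: Engineering
- 520 papers: Computer Science and Technology...
- 380 papers: Electrical Engineering
- 278 papers: Control Science and Engineering
- 153 papers: Software Engineering
- 79 papers: Information and Communication Engineering
- 40 papers: Transportation Engineering
- 23 papers: Instrument Science and Technology
- 20 papers: Mechanical Engineering
- 9 papers: Bioengineering
- 8 papers: Electronic Science and Technology (may...
- 7 papers: Mechanics (may confer Engineering, Science...
- 7 papers: Civil Engineering
- 6 papers: Power Engineering and Engineering Thermo...
- 6 papers: Petroleum and Natural Gas Engineering
- 4 papers: Biomedical Engineering (may confer...
- 3 papers: Materials Science and Engineering (may...
- 3 papers: Chemical Engineering and Technology
- 3 papers: Aeronautical and Astronautical Science and Tech...
- 3 papers: Safety Science and Engineering
118 papers: Science
- 98 papers: Mathematics
- 32 papers: Systems Science
- 22 papers: Statistics (may confer Science, ...
- 10 papers: Biology
- 8 papers: Physics
- 4 papers: Chemistry
66 papers: Management
- 63 papers: Management Science and Engineering (may...
- 14 papers: Business Administration
- 5 papers: Library, Information and Archives Manage...
5 papers: Economics
- 4 papers: Applied Economics
3 papers: Law
- 3 papers: Sociology
2 papers: Medicine
1 paper: Education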

Topic

312 papers: reinforcement le...
216 papers: dynamic programm...
206 papers: optimal control
107 papers: adaptive dynamic...
104 papers: adaptive dynamic...
97 papers: learning
88 papers: neural networks
78 papers: heuristic algori...
68 papers: reinforcement le...
58 papers: learning (artifi...
54 papers: nonlinear system...
53 papers: convergence
51 papers: control systems
51 papers: mathematical mod...
48 papers: approximate dyna...
44 papers: approximation al...
43 papers: equations
42 papers: adaptive control
41 papers: artificial neura...
41 papers: cost function

Institution

41 papers: chinese acad sci...
27 papers: univ rhode isl d...
17 papers: tianjin univ sch...
16 papers: univ sci & techn...
16 papers: univ illinois de...
15 papers: northeastern uni...
14 papers: beijing normal u...
13 papers: northeastern uni...
13 papers: guangdong univ t...
12 papers: northeastern uni...
9 papers: natl univ def te...
8 papers: ieee
8 papers: univ chinese aca...
7 papers: univ chinese aca...
7 papers: cent south univ ...
7 papers: southern univ sc...
7 papers: beijing univ tec...
6 papers: chinese acad sci...
6 papers: missouri univ sc...
5 papers: nanjing univ pos...

Author

54 papers: liu derong
37 papers: wei qinglai
29 papers: he haibo
22 papers: wang ding
21 papers: xu xin
19 papers: jiang zhong-ping
17 papers: lewis frank l.
17 papers: yang xiong
17 papers: zhang huaguang
17 papers: ni zhen
16 papers: zhao bo
15 papers: gao weinan
14 papers: zhao dongbin
13 papers: zhong xiangnan
12 papers: si jennie
12 papers: derong liu
10 papers: jagannathan s.
10 papers: dongbin zhao
10 papers: song ruizhuo
9 papers: abouheaf mohamme...

Language

994 papers: English
20 papers: Other
6 papers: Chinese

Search query: "Any field = IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning"

1,020 records in total; showing records 761-770

Safe reinforcement learning in high-risk tasks through policy improvement

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Francisco Javier Garcia Polo, Fernando Fernandez Rebollo (Computer Science Department, Universidad Carlos III de Madrid, Madrid, Spain)

Reinforcement learning (RL) methods are widely used for dynamic control tasks. In many cases, these are high-risk tasks where the trial-and-error process may select actions whose execution from unsafe states can be catastrophic. In addition, many of these tasks have continuous state and action spaces, making the learning problem harder and unapproachable with conventional RL algorithms. So, when the agent begins to interact with a risky and large state-action space environment, an important question arises: how can we prevent the exploration of the state-action space from causing damage to the learning (or other) systems? In this paper, we define the concept of risk and address the problem of safe exploration in the context of RL. Our notion of safety is concerned with states that can lead to damage. Moreover, we introduce an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks, and that learns efficiently from the experience gathered from the environment. We report experimental results using the helicopter hovering task from the RL Competition.

Keywords: Helicopters; Computer crashes; Trajectory; Robots; Safety; Robustness; Mathematical model

Application of reinforcement learning-based algorithms in CO2 allowance and electricity markets

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Author: Vishnuteja Nanduri (Department of Industrial & Manufacturing Engineering, University of Wisconsin Milwaukee, Milwaukee, WI, USA)

Climate change is one of the most important challenges faced by the world this century. In the U.S., the electric power industry is the largest emitter of CO2, contributing to the climate crisis. Federal emissions control bills in the form of cap-and-trade programs are currently idling in the U.S. Congress. In the meantime, ten states in the northeastern U.S. have adopted a regional cap-and-trade program to reduce CO2 levels and also to increase investments in cleaner technologies. Many of the states in which the cap-and-trade programs are active operate under a restructured market paradigm, where generators compete to supply power. This research presents a bi-level game-theoretic model to capture competition between generators in cap-and-trade markets and restructured electricity markets. The solution to the game-theoretic model is obtained using a reinforcement learning-based algorithm.

Keywords: Generators; Electricity supply industry; Games; Electricity; Companies; Meteorology; Power systems

Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Thomas Gabel, Christian Lutz, Martin Riedmiller (Machine Learning Laboratory, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany)

Neural batch reinforcement learning (RL) algorithms have recently been shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of computer games and apply a variant of a neural batch RL algorithm in the scope of this benchmark. Defining the learning problem and appropriately adjusting all relevant parameters is often a tedious task for the researcher who implements and investigates some learning approach. In RL, the suitable choice of the function c of immediate costs is crucial, and, when utilizing multi-layer perceptron neural networks for the purpose of value function approximation, the definition of c must be well aligned with the specific characteristics of this type of function approximator. Determining this alignment is especially tricky when no a priori knowledge about the task and, hence, about optimal policies is available. To this end, we propose a simple but effective dynamic scaling heuristic that can be seamlessly integrated into contemporary neural batch RL algorithms. We evaluate the effectiveness of this heuristic in the context of the well-known pole swing-up benchmark as well as in the context of the novel gaming benchmark we are suggesting.

Keywords: Marine vehicles; Games; Learning systems; Learning; Benchmark testing; Heuristic algorithms; Artificial neural networks

Enhancing the episodic natural actor-critic algorithm by a regularisation term to stabilize learning of control structures

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Andreas Witsch, Roland Reichle, Kurt Geihs, Sascha Lange, Martin Riedmiller (Distributed Systems Group, Universität Kassel, Germany; Machine Learning Laboratory, Albert Ludwigs Universität Freiburg, Germany)

Incomplete or imprecise models of control systems make it difficult to find an appropriate structure and parameter set for a corresponding control policy. These problems are addressed by reinforcement learning algorithms like policy gradient methods. We describe how to stabilise the policy gradient descent by introducing a regularisation term to enhance the episodic natural actor-critic approach. This allows a more policy independent usage. We used the resulting algorithm to optimise a z-transformed rational function representing the control policy. This representation facilitates simultaneous optimisation of the control structure and its parameters in time space and can be analysed in terms of control theory to predict the control behaviour for arbitrary scenarios. Furthermore, we present a solution to the general problem of finding an initial parameter set with the help of a single demonstrated trajectory. The approach is evaluated on a cartpole simulation for demonstrating the expressiveness of the policy. Furthermore, a real soccer robot scenario demonstrates the ability of the proposed approach to deal with real world scenarios.

Keywords: Equations; Estimation; Trajectory; Function approximation; Mathematical model; Transfer functions; Approximation algorithms

Agent self-assessment: Determining policy quality without execution

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Alexander Hans, Siegmund Duell, Steffen Udluft (Neuroinformatics and Cognitive Robotics Laboratory, Ilmenau University of Technology, Ilmenau, Germany; Machine Learning Group, Berlin Institute of Technology, Berlin, Germany; Intelligent Systems and Control, Siemens AG, Munich, Germany)

With the development of data-efficient reinforcement learning (RL) methods, a promising data-driven solution for optimal control of complex technical systems has become available. For the application of RL to a technical system, it is usually required to evaluate a policy before actually applying it to ensure it operates the system safely and within required performance bounds. In benchmark applications one can use the system dynamics directly to measure the policy quality. In real applications, however, this might be too expensive or even impossible. Being unable to evaluate the policy without using the actual system hinders the application of RL to autonomous controllers. As a first step toward agent self-assessment, we deal with discrete MDPs in this paper. We propose to use the value function along with its uncertainty to assess a policy's quality and show that, when dealing with an MDP estimated from observations, the value function itself can be misleading. We address this problem by determining the value function's uncertainty through uncertainty propagation and evaluate the approach using a number of benchmark applications.

Keywords: Uncertainty; Equations; Markov processes; Benchmark testing; Approximation algorithms; Machine learning; Histograms

Active learning for personalizing treatment

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Kun Deng, Joelle Pineau, Susan Murphy (Department of Statistics, University of Michigan, USA; Department of Computer Science, McGill University, Canada)

The personalization of treatment via genetic biomarkers and other risk categories has drawn increasing interest among clinical researchers and scientists. A major challenge here is to construct individualized treatment rules (ITR), which recommend the best treatment for each of the different categories of individuals. In general, ITRs can be constructed using data from clinical trials; however, these are generally very costly to run. In order to reduce the cost of learning an ITR, we explore active learning techniques designed to carefully decide whom to recruit, and which treatment to assign, throughout the online conduct of the clinical trial. As an initial investigation, we focus on simple ITRs that utilize a small number of subpopulation categories to personalize treatment. To minimize the maximal uncertainty regarding the treatment effects for each subpopulation, we propose the use of a minimax bandit model and provide an active learning policy for solving it. We evaluate our active learning policy using simulated data and data modeled after a clinical trial involving treatments for depressed individuals. We contrast this policy with other plausible active learning policies. The techniques presented in the paper may be generalized to tackle problems of efficient exploration in other domains.

Keywords: Recruitment; Clinical trials; Learning systems; Loss measurement; Resource management; Uncertainty; Machine learning

Reinforcement learning in multidimensional continuous action spaces

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jason Pazis, Michail G. Lagoudakis (Department of Computer Science, Duke University, Durham, NC, USA; Department of Electronic and Computer Engineering, Technical University of Crete, Crete, Greece)

The majority of learning algorithms available today focus on approximating the state (V) or state-action (Q) value function, and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major obstacle in successfully applying reinforcement learning to real-world problems. In this paper we present an effective approach to learning and acting in domains with multidimensional and/or continuous control variables where efficient action selection is embedded in the learning process. Instead of learning and representing the state or state-action value function of the MDP, we learn a value function over an implied augmented MDP, where states represent collections of actions in the original MDP and transitions represent choices eliminating parts of the action space at each step. Action selection in the original MDP is reduced to a binary search by the agent in the transformed MDP, with computational complexity logarithmic in the number of actions, or equivalently linear in the number of action dimensions. Our method can be combined with any discrete-action reinforcement learning algorithm for learning multidimensional continuous-action policies using a state value approximator in the transformed MDP. Our preliminary results with two well-known reinforcement learning algorithms (Least-Squares Policy Iteration and Fitted Q-Iteration) on two continuous action domains (1-dimensional inverted pendulum regulator, 2-dimensional bicycle balancing) demonstrate the viability and the potential of the proposed approach.

Keywords: Aerospace electronics; Complexity theory; Binary trees; Learning; Vegetation; Markov processes; Approximation algorithms

IEEE SSCI 2011 - Symposium Series on Computational Intelligence - IEEE ALIFE 2011: 2011 IEEE Symposium on Artificial Life

Symposium Series on Computational Intelligence, IEEE SSCI 2011 - 2011 IEEE Symposium on Artificial Life, IEEE ALIFE 2011

ISBN: 9781612840635 (print)

The proceedings contain 30 papers. The topics discussed include: computation of population spatial distribution in individual-based ecosystem simulation; towards imitation-enhanced reinforcement learning in multi-agent systems; biologically inspired design principles for scalable, robust, adaptive, decentralized search and automated response (RADAR); look-ahead relevant information: reducing cognitive burden over prolonged tasks; information storage and transfer in the synchronization process in locally-connected networks; from babbling towards first words: the emergence of speech in a robot in real-time interaction; evolving robot controllers in PDL using genetic programming; ecosystemic methods for creative domains: niche construction and boundary formation; an interactive electronic art system based on artificial ecosystemics; network representation of cellular automata; and study of inheritable mutations in von Neumann self-reproducing automata using the GOLLY simulator.

Active exploration for robot parameter selection in episodic reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Oliver Kroemer, Jan Peters (Max-Planck Institute, Tubingen, Germany)

As the complexity of robots and other autonomous systems increases, it becomes more important that these systems can adapt and optimize their settings actively. However, such optimization is rarely trivial. Sampling from the system is often expensive in terms of time and other costs, and excessive sampling should therefore be avoided. The parameter space is also usually continuous and multi-dimensional. Given the inherent exploration-exploitation dilemma of the problem, we propose treating it as an episodic reinforcement learning problem. In this reinforcement learning framework, the policy is defined by the system's parameters and the rewards are given by the system's performance. The rewards accumulate during each episode of a task. In this paper, we present a method for efficiently sampling and optimizing in continuous multidimensional spaces. The approach is based on Gaussian process regression, which can represent continuous non-linear mappings from parameters to system performance. We employ an upper confidence bound policy, which explicitly manages the trade-off between exploration and exploitation. Unlike many other policies for this kind of problem, we do not rely on a discretization of the action space. The presented method was evaluated on a real robot. The robot had to learn grasping parameters in order to adapt its grasping execution to different objects. The proposed method was also tested on a more general gain tuning problem. The results of the experiments show that the presented method can quickly determine suitable parameters and is applicable to real online learning applications.

Keywords: Robots; Ground penetrating radar; Tuning; Upper bound; Kernel; Grasping; Convergence

Directed exploration of policy space using support vector classifiers

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Ioannis Rexakis, Michail G. Lagoudakis (Department of Electronic and Computer Engineering, Technical University of Crete, Crete, Greece)

Good policies in reinforcement learning problems typically exhibit significant structure. Several recent learning approaches based on the approximate policy iteration scheme suggest the use of classifiers for capturing this structure and representing policies compactly. Nevertheless, the space of possible policies, even under such structured representations, is huge and needs to be explored carefully to avoid computationally expensive simulations (rollouts) needed to probe the improved policy and obtain training samples at various points over the state space. Regarding rollouts as a scarce resource, we propose a method for directed exploration of policy space using support vector classifiers. We use a collection of binary support vector classifiers to represent policies, whereby each of these classifiers corresponds to a single action and captures the parts of the state space where this action dominates over the other actions. After an initial training phase with rollouts uniformly distributed over the entire state space, we use the support vectors of the classifiers to identify the critical parts of the state space with boundaries between different action choices in the represented policy. The policy is subsequently improved by probing the state space only at points around the support vectors that are distributed perpendicularly to the separating border. This directed focus on critical parts of the state space iteratively leads to the gradual refinement and improvement of the underlying policy and delivers excellent control policies in only a few iterations with a conservative use of rollouts. We demonstrate the proposed approach on three standard reinforcement learning domains: inverted pendulum, mountain car, and acrobot.

Keywords: Support vector machines; Training; Learning; Space exploration; Probes; Training data; Markov processes

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
