Search Results - Inner Mongolia University Library

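Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016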
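
The field codes and Boolean operators in the help text above can also be assembled programmatically. The following is a minimal sketch in Python that only builds the query string; the helper names `term`, `any_of`, and `all_of` are hypothetical and are not part of the library's system, and nothing here submits a search.

```python
def term(field: str, value: str) -> str:
    """Build one FIELD=value clause, e.g. term("Y", "1982-2016")."""
    return f"{field}={value}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR (uppercase, one space each side) in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND (uppercase, one space each side)."""
    return " AND ".join(clauses)


# Rebuild the first search example from the help text.
query = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Keeping the operators inside the helpers makes it harder to violate the two stated rules (uppercase operators, a space on each side).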

Refine search results

Document type

299 conference papers
8 journal articles

Holdings

307 electronic documents
0 print holdings

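Subject classification

180 papers: Engineering
- 158 papers: Computer Science and Technology...
- 56 papers: Electrical Engineering
- 48 papers: Software Engineering
- 47 papers: Control Science and Engineering
- 13 papers: Information and Communication Engineering
- 10 papers: Mechanical Engineering
- 6 papers: Instrument Science and Technology
- 4 papers: Mechanics (degrees awardable in Engineering, ...
- 4 papers: Bioengineering
- 3 papers: Power Engineering and Engineering Thermo...
- 2 papers: Transportation Engineering
- 2 papers: Nuclear Science and Technology
- 2 papers: Biomedical Engineering (degrees awardable...
- 1 paper: Architecture
- 1 paper: Chemical Engineering and Technology
- 1 paper: Aeronautical and Astronautical Science and Tech...
- 1 paper: Food Science and Engineering (degrees...
40 papers: Science
- 35 papers: Mathematics
- 9 papers: Systems Science
- 8 papers: Statistics (degrees awardable in Science, ...
- 4 papers: Physics
- 4 papers: Biology
- 1 paper: Chemistry
- 1 paper: Astronomy
- 1 paper: Atmospheric Science
- 1 paper: Geophysics
- 1 paper: Geology
18 papers: Management
- 17 papers: Management Science and Engineering (degrees...
- 7 papers: Business Administration
4 papers: Economics
- 4 papers: Applied Economics
1 paper: Medicine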

Subject

115 papers: dynamic programm...
76 papers: reinforcement le...
67 papers: learning
47 papers: optimal control
30 papers: neural networks
27 papers: control systems
21 papers: approximate dyna...
21 papers: approximation al...
20 papers: function approxi...
20 papers: equations
17 papers: convergence
16 papers: adaptive dynamic...
16 papers: state-space meth...
16 papers: heuristic algori...
14 papers: mathematical mod...
13 papers: stochastic proce...
12 papers: learning (artifi...
12 papers: adaptive control
12 papers: cost function
11 papers: algorithm design...

Institution

5 papers: arizona state un...
4 papers: department of el...
4 papers: school of inform...
4 papers: department of in...
4 papers: univ sci & techn...
4 papers: chinese acad sci...
4 papers: department of el...
3 papers: princeton univ d...
3 papers: northeastern uni...
3 papers: national science...
3 papers: robotics institu...
3 papers: univ illinois de...
3 papers: univ utrecht dep...
2 papers: univ groningen i...
2 papers: sharif univ tech...
2 papers: univ texas autom...
2 papers: pengcheng labora...
2 papers: guangxi univ sch...
2 papers: chinese acad sci...
2 papers: cemagref lisc au...

Author

14 papers: liu derong
9 papers: wei qinglai
8 papers: si jennie
7 papers: xu xin
5 papers: derong liu
4 papers: lewis frank l.
4 papers: martin riedmille...
4 papers: huaguang zhang
4 papers: jennie si
4 papers: marco a. wiering
4 papers: xin xu
4 papers: zhang huaguang
4 papers: dongbin zhao
4 papers: lei yang
4 papers: powell warren b.
4 papers: riedmiller marti...
3 papers: hado van hasselt
3 papers: van hasselt hado
3 papers: jagannathan s.
3 papers: munos remi

Language

305 papers: English
1 paper: Other
1 paper: Chinese

Search condition: "Any field = IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning"

307 records in total; showing records 171-180

Policy Iteration Algorithm for Constrained Cost Optimal Control of Discrete-Time Nonlinear System

International Joint Conference on Neural Networks (IJCNN)

Authors: Li, Tao; Wei, Qinglai; Li, Hongyang; Song, Ruizhuo

Affiliations: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China; Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China; Univ Sci & Technol Beijing, Sch Automat, Beijing, Peoples R China

ISBN (print): 9780738133669

In this paper, optimal control problems with constraints on the summation of an auxiliary utility function are called constrained cost optimal control problems, and a constrained cost policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve them for discrete-time nonlinear systems. A convergence analysis is developed to guarantee that the iterative value functions converge non-increasingly to the approximate optimal value function. It is also proven that each iterative control policy is feasible and can stabilize the nonlinear system. Finally, a simulation example is given to illustrate the performance of the developed constrained cost policy iteration algorithm.

Keywords: Adaptive dynamic programming (ADP); reinforcement learning; constrained cost optimal control; policy iteration

Efficient Learning in Cellular Simultaneous Recurrent Neural Networks - The Case of Maze Navigation Problem

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Roman Ilin; Robert Kozma; Paul J. Werbos

Affiliations: Department of Mathematical Sciences, University of Memphis, Memphis, TN, USA; National Science Foundation, Arlington, VA, USA

Cellular simultaneous recurrent neural networks (SRNs) show great promise in solving complex function approximation problems. In particular, approximate dynamic programming is an important application area where SRNs have significant potential advantages compared to other approximation methods. Learning in SRNs, however, has proved to be a notoriously difficult problem, which has prevented their broader use. This paper introduces an extended Kalman filter approach to train SRNs. Using the two-dimensional maze navigation problem as a testbed, we illustrate the operation of the method and demonstrate its benefits in generalization and testing performance.

Keywords: Cellular networks; Recurrent neural networks; Motion planning; Function approximation; Dynamic programming; Electronic mail; Testing; Cost function; Equations; Feedforward systems

DATE: Disturbance-Aware Traffic Engineering with Reinforcement Learning in Software-Defined Networks

29th IEEE/ACM International Symposium on Quality of Service (IWQoS)

Authors: Ye, Minghao; Zhang, Junjie; Guo, Zehua; Chao, H. Jonathan

Affiliations: NYU, Dept Elect & Comp Engn, New York, NY 11201, USA; Fortinet Inc, Sunnyvale, CA 94086, USA; Beijing Inst Technol, Beijing 100081, Peoples R China

ISBN (print): 9781665414944

Traffic Engineering (TE) has been applied to optimize network performance by routing/rerouting flows based on traffic loads and network topologies. To cope with network dynamics from emerging applications, it is essential to reroute flows more frequently than today's TE to maintain network performance. However, existing TE solutions may introduce considerable Quality of Service (QoS) degradation and service disruption since they do not take the potential negative impact of flow rerouting into account. In this paper, we apply a new QoS metric named network disturbance to gauge the impact of flow rerouting while optimizing network load balancing in backbone networks. To employ this metric in TE design, we propose a disturbance-aware TE called DATE, which uses reinforcement learning (RL) to intelligently select some critical flows between nodes for each traffic matrix and reroute them using Linear programming (LP) to jointly optimize network performance and disturbance. DATE is equipped with a customized actor-critic architecture and Graph Neural Networks (GNNs) to handle dynamic traffic and single link failures. Extensive evaluations show that DATE can outperform state-of-the-art TE methods with close-to-optimal load balancing performance while effectively mitigating the 99th percentile network disturbance by up to 31.6%.

Keywords: Traffic Engineering; Software-Defined Networking; Reinforcement Learning; Routing; Network Disturbance; Link Failure

Deep reinforcement learning based finite-horizon optimal tracking control for nonlinear system

Joint Meeting of the 2nd IFAC Workshop on Linear Parameter Varying Systems (LPVS) / 9th IFAC Symposium on Robust Control Design (ROCOND)

Authors: Kim, Jong Woo; Park, Byung Jun; Yoo, Haeun; Lee, Jay H.; Lee, Jong Min

Affiliations: Seoul Natl Univ, Inst Chem Proc, Sch Chem & Biol Engn, 1 Gwanak Ro, Seoul 08826, South Korea; Korea Adv Inst Sci & Technol, Chem & Biomol Engn Dept, Daejeon 3041, South Korea

Reinforcement learning (RL) can be used to obtain an approximate numerical solution to the Hamilton-Jacobi-Bellman (HJB) equation. Recent advances in the machine learning community enable the use of deep neural networks (DNNs) to accurately approximate high-dimensional nonlinear functions, such as those that occur in RL, without any domain knowledge. In the standard RL setting, both system and cost structures are unknown, and the amount of data needed to obtain an accurate approximation can be impractically large. Meanwhile, when the structures are known, they can be used to solve the HJB equation efficiently. Herein, model-based globalized dual heuristic programming (GDHP) is proposed, in which the HJB equation is separated into value, costate, and policy functions. A particular class of interest in this research is the finite-horizon optimal tracking control (FHOC) problem. Additional issues that arise, such as time-varying functions, terminal constraints, and delta-input formulation, are addressed in the context of FHOC. The DNN structure and training algorithm suitable for FHOC are presented. A benchmark continuous reactor example is provided to illustrate the proposed approach. © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Reinforcement learning; Approximate dynamic programming; Deep learning; Globalized dual heuristic programming; Optimal control; Optimal tracking

An Optimal ADP Algorithm for a High-Dimensional Stochastic Control Problem

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Juliana Nascimento; Warren Powell

Affiliation: Department of Operations Research and Financial Engineering, Princeton University Engineering, Princeton, NJ, USA

We propose a provably optimal approximate dynamic programming algorithm for a class of multistage stochastic problems, taking into account that the probability distribution of the underlying stochastic process is not known and the state space is too large to be explored entirely. The algorithm and its proof of convergence rely on the fact that the optimal value functions of the problems within the problem class are concave and piecewise linear. The algorithm is a combination of Monte Carlo simulation, pure exploitation, stochastic approximation and a projection operation. Several applications, in areas like energy, control, inventory and finance, fall under the framework

Keywords: Stochastic processes; Optimal control; Piecewise linear approximation; Dynamic programming; Heuristic algorithms; Probability distribution; State-space methods; Convergence; Piecewise linear techniques; Approximation algorithms

***: Power-Aware Traffic Engineering via Deep Reinforcement Learning

29th IEEE/ACM International Symposium on Quality of Service (IWQoS)

Authors: Pan, Tian; Peng, Xiaoyu; Shi, Qianqian; Bian, Zizheng; Lin, Xingchen; Song, Enge; Li, Fuliang; Xu, Yang; Huang, Tao

Affiliations: BUPT, State Key Lab Networking & Switching Technol, Beijing, Peoples R China; Sci & Technol Commun Networks Lab, Shijiazhuang, Hebei, Peoples R China; Northeastern Univ, Shenyang, Liaoning, Peoples R China; Fudan Univ, Shanghai, Peoples R China

ISBN (print): 9781665414944

Power-aware traffic engineering via coordinated sleeping is usually formulated as integer programming problems, which are generally NP-hard with unbounded computation time for large-scale networks. This results in delayed control decision making in dynamic network environments. Motivated by advances in deep reinforcement learning, we consider building intelligent systems that learn to adaptively change the power state of routers and switches according to changing network conditions. A neural network's forward propagation can greatly speed up power on/off decision making. Generally, conducting RL requires a learning agent to iteratively explore and perform the "good" actions based on the feedback from the environment. By coupling Software-Defined Networking for performing centrally calculated actions to the environment and In-band Network Telemetry for collecting feedback from the environment, we develop ***, a closed-loop control/training system to automate power-aware traffic engineering. Furthermore, we propose novel techniques to enhance the learning ability and reduce the learning complexity. With both energy efficiency and traffic load balancing considered, *** can generate reasonable power saving actions within 276 ms on a network testbed of 11 software P4 switches.

Keywords: Green products; Decision making; Reinforcement learning; Telecommunication traffic; Quality of service; Control systems; Energy efficiency

Deep Reinforcement Learning for Perishable Inventory Optimization Problem

2023 IEEE International Conference on Industrial Engineering and Engineering Management, IEEM 2023

Authors: Nomura, Yusuke; Liu, Ziang; Nishi, Tatsushi

Affiliation: Graduate School of Environmental Life Natural Science and Technology, Okayama University, 3-1-1 Tsushima-Naka, Kita-ku, Okayama, Okayama City, Japan

ISBN (print): 9798350323153

While global attention on reducing food waste has increased, the demand for perishable commodities such as food and pharmaceuticals is growing. This emphasizes the need for effective perishable inventory management, which has become increasingly complex due to the perishability of these products. Traditional optimization methods, such as dynamic programming, require significant time and effort to solve these challenges. In this study, we use Deep Q-Network and Proximal Policy Optimization, which are deep reinforcement learning methods that can give numerical and approximate solutions to complex problems. In the inventory problem considering costs such as ordering, storage, lost opportunities, and spoilage, we define the inventory status as the state, the ordering as the action, and the negative total cost as the reward. We conducted a performance comparison of the two methods with an aligned total number of time steps. Furthermore, through numerical experiments, it was confirmed that the application of both methods resulted in a cost reduction of at least approximately 30% compared to the basic stock policy. © 2023 IEEE.
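
The state, action, and reward definitions in this abstract can be illustrated with a single environment step. The sketch below is a rough illustration only, not the authors' model: the unit costs, the oldest-first issuing rule, the immediate arrival of orders, and the names `Costs` and `step` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical unit costs; placeholder values, not taken from the paper."""
    order: float = 1.0   # per unit ordered
    hold: float = 0.1    # per unit carried into the next period
    lost: float = 2.0    # per unit of unmet demand
    spoil: float = 1.5   # per unit that expires


def step(stock, order_qty, demand, costs=Costs()):
    """One period of a perishable-inventory MDP (shelf life = len(stock) >= 1).

    stock[i] holds the units with i periods of life left after this one,
    so stock[0] expires at the end of the period.
    Returns (next_stock, reward) with reward = -(total cost).
    """
    stock = list(stock)
    unmet = demand
    # Serve demand oldest-first (an assumed issuing rule).
    for i in range(len(stock)):
        used = min(stock[i], unmet)
        stock[i] -= used
        unmet -= used
    spoiled = stock[0]                   # unsold units in the oldest bucket expire
    carried = stock[1:]                  # the remaining units age by one period
    next_stock = carried + [order_qty]   # the new order arrives as the freshest bucket
    total_cost = (costs.order * order_qty + costs.hold * sum(carried)
                  + costs.lost * unmet + costs.spoil * spoiled)
    return next_stock, -total_cost


next_stock, reward = step(stock=[2, 3], order_qty=4, demand=4)
```

An agent such as DQN or PPO would call a function of this shape repeatedly, choosing `order_qty` from the observed `stock`.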

Keywords: inventory management; machine learning; perishable inventory; reinforcement learning; supply chain

Model-Based Reinforcement Learning in Factored-State MDPs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Alexander L. Strehl

Affiliation: Department of Computer Science, Rutgers University, Piscataway, NJ, USA

We consider the problem of learning in a factored-state Markov decision process that is structured to allow a compact representation. We show that the well-known algorithm, factored Rmax, performs near-optimally on all but a number of timesteps that is polynomial in the size of the compact representation, which is often exponentially smaller than the number of states. This is equivalent to the result obtained by Kearns and Koller for their DBN-E³ algorithm, except that we've conducted the analysis in a more general setting. We also extend the results to a new algorithm, factored IE, that uses the interval estimation approach to exploration and can be expected to outperform factored Rmax on most domains.

Keywords: Learning; Polynomials; Algorithm design and analysis; State-space methods; Dynamic programming; Computer science; Mathematical model; Performance analysis; Bayesian methods; Linear approximation

Adaptive critic-based neurofuzzy controller for the steam generator water level

15th International Workshop on Room-Temperature Semiconductor X- and Gamma-Ray Detectors / 2006 IEEE Nuclear Science Symposium

Authors: Fakhrazari, Amin; Boroushaki, Mehrdad

Affiliation: Sharif Univ Technol, Dept Mech Engn, Tehran, Iran

In this paper, an adaptive critic-based neurofuzzy controller is presented for water level regulation of nuclear steam generators. The problem has been of great concern for many years, as the steam generator is a highly nonlinear system showing inverse response dynamics, especially at low operating power levels. Fuzzy critic-based learning is a reinforcement learning method based on dynamic programming. The only information available to the critic agent is the system feedback, which is interpreted as the result of the last action the controller performed in the previous state. The signal produced by the critic agent is used alongside the error backpropagation algorithm to tune the conclusion parts of the fuzzy inference rules online. The critic agent here has a proportional-derivative structure and the fuzzy rule base has nine rules. The proposed controller shows satisfactory transient responses, disturbance rejection, and robustness to model uncertainty. Its simple design procedure and structure make it a suitable controller design for steam generator water level control in the nuclear power plant industry.

Keywords: adaptive critic-based design; fuzzy logic; reinforcement learning; vertical U-tube steam generator

An Approximate Dynamic Programming Approach for Job Releasing and Sequencing in a Reentrant Manufacturing Line

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jose A. Ramirez-Hernandez; Emmanuel Fernandez

Affiliation: Department of Electrical & Computer Engineering, University of Cincinnati, OH, USA

This paper presents the application of an approximate dynamic programming (ADP) algorithm to the problem of job releasing and sequencing of a benchmark reentrant manufacturing line (RML). The ADP approach is based on the SARSA(lambda) algorithm with linear approximation structures that are tuned through a gradient-descent approach. The optimization is performed according to a discounted cost criterion that seeks both the minimization of inventory costs and the maximization of throughput. Simulation experiments are performed by using different approximation architectures to compare the performance of optimal strategies against policies obtained with ADP. Results from these experiments showed a statistical match in performance between the optimal and the approximated policies obtained through ADP. Such results also suggest that the applicability of the ADP algorithm presented in this paper may be a promising approach for larger RML systems
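
For orientation, one SARSA(lambda) update with a linear approximation tuned by gradient descent, as named in this abstract, can be sketched as follows. This is a generic textbook form under stated assumptions (accumulating eligibility traces, a cost signal to be minimized, plain Python lists as feature vectors); the function name, step size, discount, and trace-decay values are hypothetical and not taken from the paper.

```python
def sarsa_lambda_update(w, e, phi, phi_next, cost,
                        alpha=0.1, gamma=0.9, lam=0.8):
    """One SARSA(lambda) step for a linear estimate Q(s, a) = w . phi(s, a).

    w        : weight vector
    e        : eligibility trace vector (same length as w)
    phi      : features of the current state-action pair
    phi_next : features of the next state-action pair
    cost     : one-step cost observed on the transition
    Returns the updated (w, e) as new lists.
    """
    q = sum(wi * xi for wi, xi in zip(w, phi))
    q_next = sum(wi * xi for wi, xi in zip(w, phi_next))
    # Temporal-difference error under a discounted cost criterion.
    delta = cost + gamma * q_next - q
    # Accumulating traces: decay old credit, add the current features
    # (the gradient of a linear Q with respect to w is phi).
    e = [gamma * lam * ei + xi for ei, xi in zip(e, phi)]
    # Gradient-descent style weight update along the traces.
    w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w, e


w, e = sarsa_lambda_update(w=[0.0, 0.0], e=[0.0, 0.0],
                           phi=[1.0, 0.0], phi_next=[0.0, 1.0], cost=1.0)
```

Because the criterion is a cost, the acting policy would pick the action with the smallest estimated Q rather than the largest.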

Keywords: Dynamic programming; Control systems; Workstations; Pulp manufacturing; Cost function; Optimal control; Manufacturing industries; Fabrication; Semiconductor devices; Manufacturing processes


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code 010021
