Search Results - Inner Mongolia University Library


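Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016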
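
The field codes and operator rules above can be composed programmatically. The following is a minimal sketch, assuming only the syntax shown in the help text; the helper names `term`, `combine`, and `group` are hypothetical and are not part of any library API.

```python
# Hypothetical helpers for composing a search expression in the syntax
# described by the help text (FIELD=value terms joined by AND / OR / NOT).
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term after checking the field code."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join terms with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError("operator must be AND, OR, or NOT (uppercase)")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild Example 1 from the help text.
query = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Validating the operator up front mirrors the rule that operators must be uppercase; a lowercase `and` is rejected rather than silently treated as a search word.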


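Refine search results

Document type

299 records: conference papers
8 records: journal articles

Collection scope

307 records: electronic documents
0 titles: print holdings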

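Subject classification

180 records: Engineering
- 158 records: Computer Science and Technology...
- 56 records: Electrical Engineering
- 48 records: Software Engineering
- 47 records: Control Science and Engineering
- 13 records: Information and Communication Engineering
- 10 records: Mechanical Engineering
- 6 records: Instrument Science and Technology
- 4 records: Mechanics (degrees awardable in Engineering, Sci...
- 4 records: Bioengineering
- 3 records: Power Engineering and Engineering Thermo...
- 2 records: Transportation Engineering
- 2 records: Nuclear Science and Technology
- 2 records: Biomedical Engineering (degrees awardable in...
- 1 record: Architecture
- 1 record: Chemical Engineering and Technology
- 1 record: Aeronautical and Astronautical Science and Tech...
- 1 record: Food Science and Engineering (degrees...
40 records: Science
- 35 records: Mathematics
- 9 records: Systems Science
- 8 records: Statistics (degrees awardable in Science, ...
- 4 records: Physics
- 4 records: Biology
- 1 record: Chemistry
- 1 record: Astronomy
- 1 record: Atmospheric Science
- 1 record: Geophysics
- 1 record: Geology
18 records: Management
- 17 records: Management Science and Engineering (degrees...
- 7 records: Business Administration
4 records: Economics
- 4 records: Applied Economics
1 record: Medicine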

Subject

115 records: dynamic programm...
76 records: reinforcement le...
67 records: learning
47 records: optimal control
30 records: neural networks
27 records: control systems
21 records: approximate dyna...
21 records: approximation al...
20 records: function approxi...
20 records: equations
17 records: convergence
16 records: adaptive dynamic...
16 records: state-space meth...
16 records: heuristic algori...
14 records: mathematical mod...
13 records: stochastic proce...
12 records: learning (artifi...
12 records: adaptive control
12 records: cost function
11 records: algorithm design...

Institution

5 records: arizona state un...
4 records: department of el...
4 records: school of inform...
4 records: department of in...
4 records: univ sci & techn...
4 records: chinese acad sci...
4 records: department of el...
3 records: princeton univ d...
3 records: northeastern uni...
3 records: national science...
3 records: robotics institu...
3 records: univ illinois de...
3 records: univ utrecht dep...
2 records: univ groningen i...
2 records: sharif univ tech...
2 records: univ texas autom...
2 records: pengcheng labora...
2 records: guangxi univ sch...
2 records: chinese acad sci...
2 records: cemagref lisc au...

Author

14 records: liu derong
9 records: wei qinglai
8 records: si jennie
7 records: xu xin
5 records: derong liu
4 records: lewis frank l.
4 records: martin riedmille...
4 records: huaguang zhang
4 records: jennie si
4 records: marco a. wiering
4 records: xin xu
4 records: zhang huaguang
4 records: dongbin zhao
4 records: lei yang
4 records: powell warren b.
4 records: riedmiller marti...
3 records: hado van hasselt
3 records: van hasselt hado
3 records: jagannathan s.
3 records: munos remi

Language

305 records: English
1 record: Other
1 record: Chinese

Search criteria: "Any field = IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning"

307 records in total; showing 291-300

Multigrid Methods for Policy Evaluation and Reinforcement Learning

IEEE International Symposium on Intelligent Control (ISIC)

Authors: O. Ziv, N. Shimkin
Affiliations: Department of Electrical Engineering, Technion University, Haifa, Israel

We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function related to a stationary policy, within the context of discounted cost Markov decision processes with linear functional approximation. The proposed scheme builds on the multigrid framework, which is used in numerical analysis to enhance the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known model case. We then extend this approach to the learning case, and propose a scheme in which the basic TD(lambda) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.

Keywords: Multigrid methods, Learning, Equations, Convergence, Iterative algorithms, Function approximation, Dynamic programming, Error correction, State-space methods, Computational complexity
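
In the known-model case mentioned in the abstract above, policy evaluation amounts to solving the linear system V = c + γPV for a fixed policy. As a rough illustration only, the sketch below uses plain fixed-point iteration on a single grid. It is not the multigrid scheme of the paper; the function name `evaluate_policy` and the toy transition matrix and costs are assumptions made up for this example.

```python
def evaluate_policy(P, cost, gamma, tol=1e-10, max_sweeps=10_000):
    """Iteratively solve V = cost + gamma * P V for a fixed policy.

    P[s][t] is the probability of moving from state s to state t under
    the policy, and cost[s] is the one-step cost incurred in state s.
    """
    n = len(cost)
    V = [0.0] * n
    for _ in range(max_sweeps):
        V_new = [
            cost[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once a full sweep changes no value by more than tol.
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Toy two-state chain (made-up numbers): state 1 is absorbing and cost-free.
P = [[0.5, 0.5], [0.0, 1.0]]
cost = [1.0, 0.0]
V = evaluate_policy(P, cost, gamma=0.9)
print(V)
```

For this chain the first value can be checked by hand: V[0] = 1 + 0.9 · 0.5 · V[0], so V[0] = 1 / 0.55. Each sweep shrinks the error by a constant factor, which is the slow convergence that multigrid-style methods aim to speed up.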

On using discretized Cohen-Grossberg node dynamics for model-free actor-critic neural learning in non-Markovian domains

IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA)

Authors: E. Mizutani, S.E. Dreyfus
Affiliations: Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan; Department of JEOR, University of California, Berkeley, Berkeley, CA, USA

We describe how multi-stage non-Markovian decision problems can be solved using actor-critic reinforcement learning by assuming that a discrete version of Cohen-Grossberg node dynamics describes the node-activation computations of a neural network (NN). Our NN is capable of rendering the process Markovian implicitly and automatically in a totally model-free fashion without learning by how much the state space must be augmented so that the Markov property holds. This serves as an alternative to using Elman or Jordan-type function as a history memory in order to develop sensitivity to non-Markovian dependencies. We shall demonstrate our concept using a small-scale non-Markovian deterministic path problem, in which our actor-critic NN finds an optimal sequence of actions, although it needs many iterations due to the nature of neural model-free learning. This is, in spirit, a neuro-dynamic programming approach.

Keywords: Neural networks, Learning, Recurrent neural networks, History, Dynamic programming, Neurons, Signal processing, Computer science, Computer networks, State-space methods

A biologically-inspired computational model for transformation invariant target recognition

International Joint Conference on Neural Networks (IJCNN)

Authors: Khan M. Iftekharuddin, Yaqin Li
Affiliations: Intelligence System and Image Processing Lab, Department of Electrical and Computer Engineering, University of Memphis, Memphis, TN, USA

Transformation invariant image recognition has been an active research area due to its widespread applications in a variety of fields such as military operations, robotics, medical practices, geographic scene analysis, and many others. One of the primary challenges is detection and recognition of objects in the presence of transformations such as resolution, rotation, translation, scale and occlusion. In this work, we investigate a biologically-inspired computational modeling approach that exploits reinforcement learning (RL) for transformation-invariant image recognition. The RL is implemented in an adaptive critic design (ACD) framework to approximate neuro-dynamic programming. Two ACD algorithms, heuristic dynamic programming (HDP) and dual heuristic dynamic programming (DHP), are investigated and compared for transformation invariant recognition. The two learning algorithms are evaluated statistically using simulated transformations in 2-D images as well as with a large-scale UMIST 2-D face database with pose variations. Our simulations show promising results for both HDP and DHP for transformation-invariant image recognition as well as face authentication. Comparing the two algorithms, DHP outperforms HDP in learning capability, as DHP takes fewer steps to perform a successful recognition task in general. On the other hand, HDP is more robust than DHP as far as success rate across the database is concerned when applied in a stochastic and uncertain environment, and the computational complexity involved in HDP is much less.

Keywords: Artificial neural networks, Visualization, Biology, Biological system modeling, Image recognition, Dynamic programming, Image resolution

Deep Reinforcement Learning for Perishable Inventory Optimization Problem

IEEE International Conference on Industrial Engineering and Engineering Management

Authors: Yusuke Nomura, Ziang Liu, Tatsushi Nishi
Affiliations: Graduate School of Environmental Life Natural Science and Technology, Okayama University, Okayama City, Okayama, Japan

While global attention on reducing food waste has increased, the demand for perishable commodities such as food and pharmaceuticals is growing. This emphasizes the need for effective perishable inventory management, which has become increasingly complex due to the perishability of these products. Traditional optimization methods, such as dynamic programming, require significant time and effort to solve these challenges. In this study, we use Deep Q-Network and Proximal Policy Optimization, which are deep reinforcement learning methods that can give numerical and approximate solutions to complex problems. In the inventory problem considering costs such as ordering, storage, lost opportunities, and spoilage, we define the inventory status as the state, the ordering as the action, and the negative total cost as the reward. We conducted a performance comparison of the two methods with an aligned total number of time steps. Furthermore, through numerical experiments, it was confirmed that the application of both methods resulted in a cost reduction of at least approximately 30% compared to the basic stock policy.
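
The abstract above defines the inventory status as the state, the order as the action, and the negative total cost as the reward. A minimal sketch of one such transition follows. Everything specific in it is an assumption for illustration and not taken from the paper: the age-bucketed state layout, zero lead time, oldest-first issuing, the cost values, and the names `Costs` and `step`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-unit costs; the paper's values are not given."""
    order: float = 1.0
    holding: float = 0.1
    lost_sale: float = 2.0
    spoilage: float = 1.5


def step(inventory, order_qty, demand, costs=Costs()):
    """Advance one period and return (next_inventory, reward).

    inventory[i] holds the units with i + 1 periods of shelf life left,
    so inventory[0] is the oldest stock. Assumes zero lead time.
    """
    stock = list(inventory)
    stock[-1] += order_qty            # new units arrive with full shelf life
    remaining = demand
    for i in range(len(stock)):       # serve demand from the oldest stock first
        used = min(stock[i], remaining)
        stock[i] -= used
        remaining -= used
    lost = remaining                  # unmet demand is lost
    spoiled = stock[0]                # unsold oldest units expire
    next_inventory = tuple(stock[1:] + [0])   # remaining units age one period
    carried = sum(next_inventory)
    total_cost = (
        costs.order * order_qty
        + costs.holding * carried
        + costs.lost_sale * lost
        + costs.spoilage * spoiled
    )
    return next_inventory, -total_cost   # reward is the negative total cost


next_state, reward = step(inventory=(2, 1), order_qty=3, demand=4)
print(next_state, reward)
```

A function of this shape is what an agent such as Deep Q-Network or Proximal Policy Optimization would interact with; the learning algorithm itself is outside the scope of this sketch.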


Resource Provisioning in Fog Computing through Deep Reinforcement Learning

IFIP/IEEE International Symposium on Integrated Network Management

Authors: José Santos, Tim Wauters, Bruno Volckaert, Filip De Turck
Affiliations: IDLab, Ghent University - imec, Gent, Belgium

The massive growth of connected devices has made traditional cloud systems inadequate to sustain the scalability, mobility, and heterogeneous nature of the Internet of Things (IoT). Distributed clouds have become a potential business opportunity for many service providers enabling the deployment of services on computational resources from the cloud up to the edge. However, challenges persist in fog-cloud infrastructures. One of them is known as Service Function Chaining (SFC), where providers benefit from network softwarization to create virtual chains of connected micro-services. Research has tackled SFC Allocation (SFCA) through theoretical modeling and heuristic algorithms, which often cannot cope with the dynamic behavior of the network. Recent works have addressed these challenges through Machine Learning (ML), which can be capable of dynamically reconfiguring cloud-native service requirements over the continuum of virtual resources in next-generation networks. Thus, in this paper, a Deep Reinforcement Learning (DRL) approach is proposed for SFCA in Fog Computing focused on energy efficiency. Our agent learns about the best resource allocation decisions, focused on reducing costs from a previously presented Mixed-Integer Linear Programming (MILP) formulation. Results show that our agent achieves comparable performance to state-of-the-art MILP formulations during dynamic use cases, obtaining 95% of request acceptance.

Keywords: Training, Radio frequency, Cloud computing, Service function chaining, Heuristic algorithms, Computational modeling, Scalability

An Enhanced Reinforcement Learning Approach for Dynamic Placement of Virtual Network Functions

IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)

Authors: Omar Houidi, Oussama Soualah, Wajdi Louati, Djamal Zeghlache
Affiliations: Telecom SudParis, Samovar-UMR 5157 CNRS, Institut Polytechnique de Paris, France; ReDCAD Lab, University of Sfax, Tunisia

ISBN (digital): 9781728144900

ISBN (print): 9781728144917

This paper addresses Virtualized Network Function Forwarding Graph (VNF-FG) embedding with the objective of realizing long term reward compared to placement algorithms that aim at instantaneous optimal placement. The long term reward is obtained using reinforcement learning (RL), following a Markov Decision Process (MDP) model, enhanced through the injection of expert knowledge in the learning process. A comparison with an Integer Linear Programming (ILP) approach, a reduced candidate set (R-ILP), and an algorithm that treats the requests in batch reveals the potential improvements using the RL approach. The instantaneous and short term reward solutions are efficient only in finding instant solutions as they make decisions only on current infrastructure status for a given request at a time or eventually a batch of requests. They are efficient only for present conditions without anticipating future requests. RL possesses instead the learning and anticipation capabilities lacking in instantaneous and snapshot optimizations. A reinforcement learning based approach, called EQL (Enhanced Q-Learning), aiming at balancing the load on hosting infrastructures is proposed to achieve the desired longer term reward. EQL employs RL to learn the network and control it based on the usage patterns of the physical resources. Results from extensive simulations, based on realistic and large scale topologies, report the superior performance of EQL in terms of acceptance rate, quality, scalability and achieved gains.

Keywords: Bandwidth, Learning (artificial intelligence), Servers, Switches, Optimization, Linear programming, Land mobile radio
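
The abstract above names its method Enhanced Q-Learning (EQL) but does not spell out the enhancement. For orientation only, the sketch below shows the standard tabular Q-learning update that the name refers to. It is not EQL itself, and the state labels, host names, learning rate, and discount factor are hypothetical.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q-value.

    Q maps (state, action) pairs to estimated long-term reward.
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical placement decision: which host receives the next request.
Q = defaultdict(float)
actions = ["host_a", "host_b"]

first = q_update(Q, "s0", "host_a", reward=1.0, next_state="s1", actions=actions)

# A later estimate for the successor state feeds back into the earlier one,
# which is how the update accounts for long-term rather than instant reward.
Q[("s1", "host_b")] = 2.0
second = q_update(Q, "s0", "host_a", reward=0.0, next_state="s1", actions=actions)
print(first, second)
```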

ATM: Approximate Task Memoization in the Runtime System

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Iulian Brumar, Marc Casas, Miquel Moreto, Mateo Valero, Gurindar S. Sohi
Affiliations: Barcelona Supercomputing Center (BSC), Barcelona, Spain; University of Wisconsin-Madison, USA

Redundant computations appear during the execution of real programs. Multiple factors contribute to these unnecessary computations, such as repetitive inputs and patterns, calling functions with the same parameters or bad programming habits. Compilers minimize non-useful code with static analysis. However, redundant execution might be dynamic and there are no current approaches to reduce these inefficiencies. Additionally, many algorithms can be computed with different levels of accuracy. Approximate computing exploits this fact to reduce execution time at the cost of slightly less accurate results. In this case, expert developers determine the desired tradeoff between performance and accuracy for each application. In this paper, we present Approximate Task Memoization (ATM), a novel approach in the runtime system that transparently exploits both dynamic redundancy and approximation at the task granularity of a parallel application. Memoization of previous task executions allows predicting the results of future tasks without having to execute them and without losing accuracy. To further increase performance improvements, the runtime system can memoize similar tasks, which leads to task approximate computing. By defining how to measure task similarity and correctness, we present an adaptive algorithm in the runtime system that automatically decides if task approximation is beneficial or not. When evaluated on a real 8-core processor with applications from different domains (financial analysis, stencil-computation, machine-learning and linear-algebra), ATM achieves a 1.4x average speedup when only applying memoization techniques. When adding task approximation, ATM achieves a 2.5x average speedup with an average 0.7% accuracy loss (maximum of 3.2%).

Keywords: Runtime, Programming, Approximate computing, History, Redundancy, Data structures, Parallel processing
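
The distinction the abstract above draws between exact memoization and memoizing similar tasks can be sketched in a few lines. This is a toy illustration and not the ATM runtime: the class name `TaskMemo` is hypothetical, and treating inputs as similar when they agree after rounding is an assumption standing in for the paper's own similarity measure, which the abstract does not describe.

```python
class TaskMemo:
    """Cache task results keyed by their (optionally rounded) inputs."""

    def __init__(self, task, precision=None):
        self.task = task
        self.precision = precision   # None means exact memoization
        self.cache = {}
        self.executions = 0          # how many times the task really ran

    def _key(self, args):
        if self.precision is None:
            return args
        # Assumption: inputs that agree after rounding count as similar.
        return tuple(round(a, self.precision) for a in args)

    def __call__(self, *args):
        key = self._key(args)
        if key not in self.cache:
            self.executions += 1
            self.cache[key] = self.task(*args)
        return self.cache[key]


def square_sum(x, y):
    """Stand-in for an expensive task."""
    return x * x + y * y


exact = TaskMemo(square_sum)
exact(1.0, 2.0)
exact(1.0, 2.0)        # identical inputs: served from the cache
exact(1.001, 2.0)      # slightly different inputs: executed again

approx = TaskMemo(square_sum, precision=2)
approx(1.0, 2.0)
reused = approx(1.001, 2.0)   # similar inputs: reuses the earlier result
print(exact.executions, approx.executions, reused)
```

The approximate variant skips one execution at the price of returning the result for (1.0, 2.0) instead of the exact value for (1.001, 2.0), which is the performance and accuracy tradeoff the abstract quantifies.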

Cooperative learning and planning for multiple robots

IEEE International Symposium on Intelligent Control (ISIC)

Authors: S. van der Zwaan, J.A.A. Moreira, P.U. Lima
Affiliations: Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisboa, Portugal

The paper deals with the subject of learning and planning for real mobile robots, using Sutton's (1991) Dyna algorithm. The Dyna algorithm integrates reinforcement learning, planning and reactive execution. We present an extension of the Dyna algorithm which includes symmetric and cooperative learning with multiple robots. We applied the extended version of the algorithm to a population of two real robots. Practical problems associated with the implementation of the algorithm on a real setup are solved. Results obtained from simulations and real experiments are presented and discussed.

Keywords: Learning, Mobile robots, Orbital robotics, State-space methods, Density estimation robust algorithm, Robot control, Control systems, Dynamic programming, Costs, Navigation

Mobile-Aware Online Task Offloading Based on Deep Reinforcement Learning in Mobile Edge Computing Networks

IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)

Authors: Yuting Li, Yitong Liu, Xingcheng Liu, Qiang Tu, Yi Xie
Affiliations: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Jiangsu Viscore Technologies Co. Ltd., Suzhou, China

Mobile Edge Computing (MEC) is one of the key enabling technologies for future 6G wireless networks that can provide lower latency service and more efficient resource utilization for future intelligent applications and the Internet of Things (IoT), while also reducing the energy consumption of end devices. In the intricate dynamic edge environment, the task offloading problem is entangled with several factors, such as the uncertainty of online tasks, the heterogeneity of edge servers, and the mobility of devices. In this paper, considering the randomness of online task arrivals, time-varying channels, and mobility of devices, a deep reinforcement learning-based online task offloading (DRL-OTO) algorithm is designed to minimize the energy consumption of all mobile devices. Specifically, by portraying the system model consisting of the communication model, energy consumption model, and node mobility model, the task offloading optimization problem is modeled as a mixed integer nonlinear programming (MINLP) problem. By decomposing this problem, each mobile device first determines the edge server to be offloaded, and then the DRL-OTO algorithm is designed by utilizing the DDPG method, in which each mobile device is able to determine the offloading rate. Simulation results show that the proposed DRL-OTO algorithm can achieve fast convergence and is able to reduce energy consumption, thus increasing the utility of all devices in the dynamic edge environment.


A Budget-aware Incentive Mechanism for Vehicle-to-Grid via Reinforcement Learning

International Workshop on Quality of Service

Authors: Tianxiang Zhu, Xiaoxi Zhang, Jingpu Duan, Zhi Zhou, Xu Chen
Affiliations: Sun Yat-sen University, Guangzhou, China; Southern University of Science and Technology, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China

With the increasing penetration of renewable energy and electric vehicles (EVs), the behavior of EVs' charging and discharging has shown great impact on the Micro Grid power load, motivating the development of Vehicle-to-Grid (V2G) technologies. However, the V2G market is still in its infancy, due to insufficient understanding of EV users' willingness and concerns. While many studies consider direct EV control, it's more realistic to indirectly affect users' behavior through monetary incentives. For better implementation flexibility, we advocate displaying, at charging piles, strategically chosen incentives that are combined with electricity prices. Technically, this is the first model-free learning algorithm that can optimize incentives under unknown EV user reactions, increase the load control effectiveness and users' quality-of-service (QoS) simultaneously under a long-term incentive budget, and provide theoretical performance guarantees. We first construct a bi-level optimization framework to model the time-dependencies across our solutions. We then integrate primal-dual theories and upper-confidence bounds into reinforcement learning to balance power control and incentive consumption. A dynamic programming based algorithm is also proposed to maximize the aggregate user QoS. Finally, we prove bounded sub-optimality of our learning algorithm through theoretical analysis and conduct trace-driven simulations to demonstrate the advantages of our bi-level framework.



235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code: 010021
