Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Parisi, Simone; Pirotta, Matteo; Smacchia, Nicola; Bascetta, Luca; Restelli, Marcello (Politecn Milan, Dept Elect Informat & Bioengn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy)

ISBN: 9781479945528 (print)

This paper investigates the use of policy gradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Despite the popularity of policy-gradient algorithms and the fact that gradient-ascent algorithms have been already proposed to numerically solve multi-objective optimization problems, especially in combination with multi-objective evolutionary algorithms, so far little attention has been paid to the use of gradient information to face multi-objective sequential decision problems. Three different Multi-Objective Reinforcement-Learning (MORL) approaches are here presented. The first two, called radial and Pareto following, start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. Differently, the third approach performs a single gradient-ascent run that, at each step, generates an improved continuous approximation of the Pareto frontier. The parameters of a function that defines a manifold in the policy parameter space are updated following the gradient of some performance criterion so that the sequence of candidate solutions gets as close as possible to the Pareto front. Besides reviewing the three different approaches and discussing their main properties, we empirically compare them with other MORL algorithms on two interesting MOMDPs.

Keywords: Pareto optimisation; approximation theory; decision making; evolutionary computation; gradient methods; learning (artificial intelligence); MOMDPs; MORL approaches; Pareto following; Pareto frontier approximation; gradient-ascent algorithms; gradient-based policy-search procedures; multiobjective Markov decision processes; multiobjective evolutionary algorithms; multiobjective optimization problems; multiobjective reinforcement-learning approaches; multiobjective sequential decision making; nondominated policies; performance criterion; policy gradient approaches; policy-gradient algorithms; radial following; Algorithm design and analysis; Approximation algorithms; Approximation methods; Manifolds; Measurement; Optimization; Water resources; evolutionary algorithm; Performance metrics; Policies
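The radial idea, starting every search from one initial solution and climbing along several different directions by gradient ascent, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: a hypothetical two-objective toy problem with exact analytic gradients stands in for an MOMDP with estimated policy gradients, and each direction is a plain weighted-sum scalarisation w * J1 + (1 - w) * J2. It is not the radial algorithm from the paper, and the names A, B, radial_sweep and non_dominated are illustrative.

```python
# Hypothetical toy problem (not from the paper): two objectives over a 2-D
# parameter vector theta. J1 peaks at A, J2 peaks at B, so the Pareto set is
# the straight segment between A and B.
A = (1.0, 0.0)
B = (0.0, 1.0)


def objectives(theta):
    """Return (J1, J2), the negated squared distances from theta to A and B."""
    j1 = -sum((t - a) ** 2 for t, a in zip(theta, A))
    j2 = -sum((t - b) ** 2 for t, b in zip(theta, B))
    return (j1, j2)


def scalarised_gradient(theta, w):
    """Exact gradient of w * J1 + (1 - w) * J2 with respect to theta."""
    return tuple(-2.0 * w * (t - a) - 2.0 * (1.0 - w) * (t - b)
                 for t, a, b in zip(theta, A, B))


def radial_sweep(theta0, weights, lr=0.1, steps=200):
    """From the same starting point theta0, run one gradient-ascent climb per
    weight w and return the final (theta, objectives) pairs."""
    front = []
    for w in weights:
        theta = tuple(theta0)
        for _ in range(steps):
            grad = scalarised_gradient(theta, w)
            theta = tuple(t + lr * g for t, g in zip(theta, grad))
        front.append((theta, objectives(theta)))
    return front


def dominates(p, q):
    """True if objective vector p is at least as good as q everywhere and
    strictly better somewhere (maximisation)."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def non_dominated(points):
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Calling radial_sweep((0.0, 0.0), [0.0, 0.25, 0.5, 0.75, 1.0]) should end each climb near (w, 1 - w), on the segment between A and B, and non_dominated keeps all five resulting objective vectors. A known limitation of weighted-sum scalarisation is that it can only reach points on convex regions of a Pareto front.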


Fine-tuning text-to-SQL models with reinforcement-learning training objectives

Natural Language Processing Journal, 2025, Vol. 10

Authors: Xuan-Bang Nguyen; Xuan-Hieu Phan; Massimo Piccardi (University of Engineering and Technology, Vietnam National University, Hanoi, Viet Nam; FPT Technology Research Institute, FPT University, Hanoi, Viet Nam; Faculty of Engineering and Information Technology, University of Technology Sydney, Broadway, NSW 2007, Australia)

Text-to-SQL is an important natural language processing task that helps users automatically convert natural language queries into formal SQL code. While transformer-based models have pushed text-to-SQL to unprecedented accuracy levels in recent years, such performance is confined to models of very large size that can only be run in specialised clouds. For this reason, in this paper we explore the use of reinforcement learning to improve the performance of models of more conservative size, which can fit within standard user hardware. As reinforcement learning reward, we propose a novel function which better aligns with the text-to-SQL evaluation metrics, applied in conjunction with two strong policy gradient algorithms, REINFORCE and RELAX. Our experimental results over the popular Spider benchmark show that the proposed approach has been able to outperform a conventionally-trained T5 Small baseline by 6.6 pp (percentage points) of exact-set-match accuracy and 4.6 pp of execution accuracy, and a T5 Base baseline by 2.0 pp and 1.9 pp, respectively. The proposed model has also achieved a remarkable comparative performance against ChatGPT instances.

Keywords: Text-to-SQL; Reinforcement learning; Reward functions; policy-gradient algorithms; Fine-tuning
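To illustrate how a metric-aligned reward plugs into REINFORCE, here is a minimal sketch under heavy simplifying assumptions: the "policy" is a softmax over three fixed candidate queries rather than a T5 decoder, the reward (1.0 for an exact match, otherwise token-level Jaccard overlap with the gold query) is a hypothetical stand-in and not the reward function proposed in the paper, and RELAX is not shown. The names GOLD, CANDIDATES, overlap_reward and reinforce are illustrative.

```python
import math
import random

# Hypothetical gold query and candidate outputs (not from the Spider benchmark).
GOLD = "SELECT name FROM singer WHERE age > 30"
CANDIDATES = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer",
    "SELECT * FROM concert",
]


def overlap_reward(pred, gold):
    """Hypothetical reward: 1.0 for an exact string match, otherwise the
    token-level Jaccard overlap with the gold query (partial credit)."""
    if pred == gold:
        return 1.0
    p, g = set(pred.split()), set(gold.split())
    return len(p & g) / len(p | g)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(steps=300, batch=32, lr=0.5, seed=0):
    """REINFORCE with a batch-mean baseline on a softmax policy over CANDIDATES.
    Returns the final probability of each candidate."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        probs = softmax(logits)
        actions = rng.choices(range(len(CANDIDATES)), weights=probs, k=batch)
        rewards = [overlap_reward(CANDIDATES[a], GOLD) for a in actions]
        baseline = sum(rewards) / batch
        grad = [0.0] * len(logits)
        for a, r in zip(actions, rewards):
            # Gradient of log softmax(logits)[a] w.r.t. logits is onehot(a) - probs.
            for k in range(len(logits)):
                onehot = 1.0 if k == a else 0.0
                grad[k] += (r - baseline) * (onehot - probs[k]) / batch
        # Gradient ascent on expected reward.
        logits = [z + lr * g for z, g in zip(logits, grad)]
    return softmax(logits)
```

With the default settings, reinforce() should return a distribution that puts most of its mass on the first (gold) candidate. The batch-mean baseline reduces variance without changing the expected update direction; because each sample's own reward is included in the mean, it shrinks the expected step by a factor of (1 - 1/batch).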


Cooperative Adaptive Cruise Control: A Reinforcement Learning Approach


IEEE Transactions on Intelligent Transportation Systems, 2011, Vol. 12, No. 4, pp. 1248-1260

Authors: Desjardins, Charles; Chaib-draa, Brahim (Univ Laval, Dept Comp Sci & Software Engn, Quebec City, PQ G1K 7P4, Canada)

Recently, improvements in sensing, communicating, and computing technologies have led to the development of driver-assistance systems (DASs). Such systems aim at helping drivers by either providing a warning to reduce crashes or doing some of the control tasks to relieve a driver from repetitive and boring tasks. Thus, for example, adaptive cruise control (ACC) aims at relieving a driver from manually adjusting his/her speed to maintain a constant speed or a safe distance from the vehicle in front of him/her. Currently, ACC can be improved through vehicle-to-vehicle communication, where the current speed and acceleration of a vehicle can be transmitted to the following vehicles by intervehicle communication. This way, vehicle-to-vehicle communication with ACC can be combined in one single system called cooperative adaptive cruise control (CACC). This paper investigates CACC by proposing a novel approach for the design of autonomous vehicle controllers based on modern machine-learning techniques. More specifically, this paper shows how a reinforcement-learning approach can be used to develop controllers for the secure longitudinal following of a front vehicle. This approach uses function approximation techniques along with gradient-descent learning algorithms as a means of directly modifying a control policy to optimize its performance. The experimental results, through simulation, show that this design approach can result in efficient behavior for CACC.

Keywords: Autonomous vehicle control; cooperative adaptive cruise control (CACC); neural networks; policy-gradient algorithms; reinforcement learning (RL)
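The idea of directly adjusting a parameterised control policy by gradient steps can be sketched on a deliberately tiny, hypothetical car-following problem. The state is the gap error e (distance to the front vehicle minus the desired distance), a linear-Gaussian policy proposes a correction a ~ N(theta * e, sigma^2), the next gap error is assumed to be e + a, and the reward is -(e + a)^2, so ascending expected reward is the same as descending the expected squared gap error. This one-step model, the linear policy, the function name train_gap_controller and all parameter values are assumptions; the paper's simulator and its function approximators are not reproduced here.

```python
import random


def train_gap_controller(steps=300, batch=64, lr=0.1, sigma=0.3, seed=0):
    """REINFORCE for a hypothetical one-step gap-correction task.

    Policy: a ~ Normal(theta * e, sigma**2), where e is the gap error.
    Assumed dynamics: next gap error = e + a; reward = -(next gap error)**2.
    Under this model the best deterministic gain is theta = -1.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        samples = []
        for _ in range(batch):
            e = rng.uniform(-1.0, 1.0)       # hypothetical gap error
            a = rng.gauss(theta * e, sigma)  # sampled correction to the gap
            reward = -(e + a) ** 2
            samples.append((e, a, reward))
        baseline = sum(r for _, _, r in samples) / batch
        # Score function: d/dtheta log N(a; theta*e, sigma^2) = (a - theta*e) * e / sigma^2
        grad = sum((r - baseline) * (a - theta * e) * e / sigma ** 2
                   for e, a, r in samples) / batch
        theta += lr * grad  # gradient ascent on expected reward
    return theta
```

train_gap_controller() should return a gain close to -1, a controller that cancels the gap error in one step. In CACC, the front vehicle's speed and acceleration received over vehicle-to-vehicle communication would be natural additional inputs to the policy; this toy state omits them.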

