This paper investigates the use of policygradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Despite the popularity of policy-gradient algorithms and the fact...
详细信息
ISBN:
(纸本)9781479945528
This paper investigates the use of policygradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Despite the popularity of policy-gradient algorithms and the fact that gradient-ascent algorithms have been already proposed to numerically solve multi-objective optimization problems, especially in combination with multi-objective evolutionary algorithms, so far little attention has been paid to the use of gradient information to face multi-objective sequential decision problems. Three different Multi-Objective Reinforcement-Learning (MORL) approaches are here presented. The first two, called radial and Pareto following, start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. Differently, the third approach performs a single gradient-ascent run that, at each step, generates an improved continuous approximation of the Pareto frontier. The parameters of a function that defines a manifold in the policy parameter space are updated following the gradient of some performance criterion so that the sequence of candidate solutions gets as close as possible to the Pareto front. Besides reviewing the three different approaches and discussing their main properties, we empirically compare them with other MORL algorithms on two interesting MOMDPs.
Text-to-SQL is an important natural language processing task that helps users automatically convert natural language queries into formal SQL code. While transformer-based models have pushed text-to-SQL to unprecedente...
详细信息
Text-to-SQL is an important natural language processing task that helps users automatically convert natural language queries into formal SQL code. While transformer-based models have pushed text-to-SQL to unprecedented accuracy levels in recent years, such performance is confined to models of very large size that can only be run in specialised clouds. For this reason, in this paper we explore the use of reinforcement learning to improve the performance of models of more conservative size, which can fit within standard user hardware. As reinforcement learning reward, we propose a novel function which better aligns with the text-to-SQL evaluation metrics, applied in conjunction with two strong policygradientalgorithms, REINFORCE and RELAX. Our experimental results over the popular Spider benchmark show that the proposed approach has been able to outperform a conventionally-trained T5 Small baseline by 6.6 pp (percentage points) of exact-set-match accuracy and 4.6 pp of execution accuracy, and a T5 Base baseline by 2.0 pp and 1.9 pp, respectively. The proposed model has also achieved a remarkable comparative performance against ChatGPT instances.
Recently, improvements in sensing, communicating, and computing technologies have led to the development of driver-assistance systems (DASs). Such systems aim at helping drivers by either providing a warning to reduce...
详细信息
Recently, improvements in sensing, communicating, and computing technologies have led to the development of driver-assistance systems (DASs). Such systems aim at helping drivers by either providing a warning to reduce crashes or doing some of the control tasks to relieve a driver from repetitive and boring tasks. Thus, for example, adaptive cruise control (ACC) aims at relieving a driver from manually adjusting his/her speed to maintain a constant speed or a safe distance from the vehicle in front of him/her. Currently, ACC can be improved through vehicle-to-vehicle communication, where the current speed and acceleration of a vehicle can be transmitted to the following vehicles by intervehicle communication. This way, vehicle-to-vehicle communication with ACC can be combined in one single system called cooperative adaptive cruise control (CACC). This paper investigates CACC by proposing a novel approach for the design of autonomous vehicle controllers based on modern machine-learning techniques. More specifically, this paper shows how a reinforcement-learning approach can be used to develop controllers for the secure longitudinal following of a front vehicle. This approach uses function approximation techniques along with gradient-descent learning algorithms as a means of directly modifying a control policy to optimize its performance. The experimental results, through simulation, show that this design approach can result in efficient behavior for CACC.
暂无评论