Search Results - Inner Mongolia University Library

2nd Annual Conference on Learning for Dynamics and Control (L4DC)

Authors: Menta, Sandeep; Warrington, Joseph; Lygeros, John; Morari, Manfred (Swiss Fed Inst Technol, Automat Control Lab, Phys Str 3, CH-8092 Zurich, Switzerland; Univ Penn, Elect & Syst Engn, 220 S 33rd St, Philadelphia, PA 19104, USA)

Hybrid control problems are complicated by the need to make a suitable sequence of discrete decisions related to future modes of operation of the system. Model predictive control (MPC) encodes a finite-horizon truncation of such problems as a mixed-integer program, and then imposes a cost and/or constraints on the terminal state intended to reflect all post-horizon behaviour. However, these are often ad hoc choices tuned by hand after empirically observing performance. We present a learning method that sidesteps this problem, in which the so-called N-step Q-function of the problem is approximated from below, based on experience evaluating the policy. The function takes a state and a sequence of N control decisions as arguments, and therefore extends the traditional notion of a Q-function from reinforcement learning. After learning it from a training process exploring the state-input space, we use it in place of the usual MPC objective. We take an example hybrid control task and show that it can be completed successfully with a shorter planning horizon than conventional hybrid MPC thanks to our proposed method. Furthermore, we report that Q-functions trained with long horizons can be truncated to a shorter horizon for online use, yielding simpler control laws with apparently little loss of performance.

Keywords: Hybrid systems; reinforcement learning; approximate dynamic programming

Lambda-Policy Iteration with Randomization for Contractive Models with Infinite Policies: Well-Posedness and Convergence

2nd Annual Conference on Learning for Dynamics and Control (L4DC)

Authors: Li, Yuchao; Johansson, Karl H.; Martensson, Jonas (KTH Royal Inst Technol, Div Decis & Control Syst, Stockholm, Sweden)

Dynamic programming models are used to analyze lambda-policy iteration with randomization algorithms. Particularly, contractive models with infinite policies are considered and it is shown that well-posedness of the lambda-operator plays a central role in the algorithm. The operator is known to be well-posed for problems with finite states, but our analysis shows that it is also well-defined for the contractive models with infinite states studied. Similarly, the algorithm we analyze is known to converge for problems with finite policies, but we identify the conditions required to guarantee convergence with probability one when the policy space is infinite regardless of the number of states. Guided by the analysis, we exemplify a data-driven approximated implementation of the algorithm for estimation of optimal costs of constrained linear and nonlinear control problems. Numerical results indicate the potential of this method in practice.

Keywords: lambda-policy iteration; approximate dynamic programming; reinforcement learning

Advances in Tactical & Operational Planning for Less-than-Truckload Carriers

Author: Baubaid, Ahmad Ali (Georgia Institute of Technology)

Degree level: Doctoral

This thesis explores tactical and operational planning problems in the context of the Less-than-Truckload (LTL) industry. LTL carriers transport shipments that occupy a small fraction of trailer capacity, and, thus, rely on the consolidation of freight from multiple shippers to achieve economies of scale. The first part of this thesis focuses on tactical planning operations of LTL carriers. In particular, in Chapter 2, we study the service network design problem confronted by LTL carriers ahead of an operating season. This problem includes determining: (1) the number of services (trailers) to operate between each pair of terminals, and (2) a load plan which specifies the sequence of transfer terminals that freight with a given origin and destination will visit. Traditionally, for every terminal and every ultimate destination, a load plan specifies a unique next terminal. We introduce the p-alt model, which generalizes traditional load plans by allowing decision-makers to specify a desired number of next terminal options for terminal-destination pairs using a vector p. We compare a number of exact and heuristic approaches for solving a two-stage stochastic variant of the p-alt model. Using this model, we show that by explicitly considering demand uncertainty and by merely allowing up to two next terminal options for terminal-destination pairs in the load plans, carriers can generate substantial cost savings; cost savings that are comparable to those yielded by adopting load plans that allow for any next terminal to be a routing option for terminal-destination pairs. Moreover, by using these more flexible load plans, carriers can generate cost savings in the order of 10% over traditional load plan designs obtained by deterministic models. The second part of the thesis shifts to an operational setting relating to how freight is routed through the carrier's service network. As the daily freight quantities handled by a carrier are uncertain, freight routes are dynamica

Keywords: Freight transportation; Less-than-truckload; Load planning; approximate dynamic programming; Value function approximation

Data-Driven Control of Unknown Systems: A Linear Programming Approach

21st IFAC World Congress on Automatic Control - Meeting Societal Challenges

Authors: Tanzanakis, Alexandros; Lygeros, John (Swiss Fed Inst Technol, Dept Informat Technol & Elect Engn, Zurich, Switzerland)

We consider the problem of discounted optimal state-feedback regulation for general unknown deterministic discrete-time systems. It is well known that open-loop instability of systems, non-quadratic cost functions and complex nonlinear dynamics, as well as the on-policy behavior of many reinforcement learning (RL) algorithms, make the design of model-free optimal adaptive controllers a challenging task. We depart from commonly used least-squares and neural network approximation methods in conventional model-free control theory, and propose a novel family of data-driven optimization algorithms based on linear programming, off-policy Q-learning and randomized experience replay. We develop both policy iteration (PI) and value iteration (VI) methods to compute an approximate optimal feedback controller with high precision and without the knowledge of a system model and stage cost function. Simulation studies confirm the effectiveness of the proposed methods. Copyright (C) 2020 The Authors.

Keywords: linear programming; Q-learning; approximate dynamic programming; data-driven control

Comparison of Manual and Automated Decision-Making with a Logistics Serious Game

11th International Conference on Computational Logistics (ICCL)

Authors: Mes, Martijn; van Heeswijk, Wouter (Univ Twente, Dept Ind Engn & Business Informat Syst, Enschede, Netherlands)

ISBN: (print) 9783030597474; 9783030597467

This paper presents a logistics serious game that describes an anticipatory planning problem for the dispatching of trucks, barges, and trains, considering uncertainty in future container arrivals. The problem setting is conceptually easy to grasp, yet difficult to solve optimally. For this problem, we deploy a variety of benchmark algorithms, including two heuristics and two reinforcement learning implementations. We use the serious game to compare the manual performance of human decision makers with those algorithms. Furthermore, the game allows humans to create their own automated planning rules, which can also be compared with the implemented algorithms and manual game play. To illustrate the potential use of the game, we report the results of three gaming sessions: with students, with job seekers, and with logistics professionals. The experimental results show that reinforcement learning typically outperforms the human decision makers, but that the top tier of humans come very close to this algorithmic performance.

Keywords: Intermodal transport; Serious gaming; Reinforcement learning; approximate dynamic programming; Heuristics

Optimizing Trading Decisions for Hydro Storage Systems Using Approximate Dual Dynamic Programming

Operations Research, 2013, Vol. 61, No. 4, pp. 810-823

Authors: Loehndorf, Nils; Wozabal, David; Minner, Stefan (Vienna Univ Econ & Business, A-1020 Vienna, Austria; Tech Univ Munich, D-80333 Munich, Germany)

We propose a new approach to optimize operations of hydro storage systems with multiple connected reservoirs whose operators participate in wholesale electricity markets. Our formulation integrates short-term intraday with long-term interday decisions. The intraday problem considers bidding decisions as well as storage operation during the day and is formulated as a stochastic program. The interday problem is modeled as a Markov decision process of managing storage operation over time, for which we propose integrating stochastic dual dynamic programming with approximate dynamic programming. We show that the approximate solution converges toward an upper bound of the optimal solution. To demonstrate the efficiency of the solution approach, we fit an econometric model to actual price and inflow data and apply the approach to a case study of an existing hydro storage system. Our results indicate that the approach is tractable for a real-world application and that the gap between theoretical upper and a simulated lower bound decreases sufficiently fast.

Keywords: approximate dynamic programming; Markov decision processes; OR in energy; stochastic programming

The multi-vehicle stochastic-dynamic inventory routing problem for bike sharing systems

Business Research, 2020, Vol. 13, No. 1, pp. 69-92

Authors: Brinkmann, Jan; Ulmer, Marlin W.; Mattfeld, Dirk C. (Institut für Wirtschaftsinformatik, Technische Universität Braunschweig, Mühlenpfordtstraße 23, Braunschweig 38106, Germany)

We address the operational management of station-based bike sharing systems (BSSs). In BSSs, users can spontaneously rent and return bikes at any station in the system. Demand is driven by commuter, shopping, and leisure activities. This demand constitutes a regular pattern of bike usage over the course of the day but also shows a significant short-term uncertainty. Due to the heterogeneity and the uncertainty in demand, stations may run out of bikes or congest during the day. At empty stations, no rental demand can be served. At full stations, no return demand can be served. To avoid unsatisfied demand, providers dynamically relocate bikes between stations in reaction to current shortages or congestion, but also in anticipation of potential future demand. For this real-time decision problem, we present a method that anticipates potential future demands based on historical observations and that coordinates the fleet of vehicles accordingly. We apply our method to two case studies based on real-world data of the BSSs in Minneapolis and San Francisco. We show that our policy outperforms benchmark policies from the literature. Moreover, we analyze how the interplay between anticipation and coordination is essential for the successful operational management of BSSs. Finally, we reveal that the value of coordination and anticipation depends on the demand structure of the BSS under consideration. © 2019, The Author(s).

Keywords: approximate dynamic programming; Bike sharing; dynamic vehicle routing; Inventory routing

Large-Scale Optimization for Green Logistics and Stochastic Resource Allocation for Food Security

Author: Alkhannan Alkaabneh, Faisal M. M. (Cornell University)

Degree level: Ph.D., Doctor of Philosophy

We studied several important management and policy analysis problems in food supply chain systems utilizing large-scale optimization, stochastic resource allocation, and data-analytics methodologies. We focused on three main research questions: 1) How can retailers build green, efficient last-mile logistics system when the objective is to maximize their profit and minimize the costs due to fuel consumption, inventory holding, and greenhouse gas emissions (Chapter 2); 2) what is the best environmental intervention policy to reduce the environmental externalities associated with the production of fruits and vegetables considering environmental and economic dimensions simultaneously (Chapter 3); and (3) How can food banks better manage food supplies distribution to combat food insecurity of underserved population (Chapters 4 & 5). Specifically, we have explored the following four dimensions in food supply chains 1) Benders decomposition for the inventory vehicle routing problem with perishable products and environmental costs. We consider the problem of inventory routing in the context of perishable products and find near-optimal replenishment scheduling and vehicle routes. To solve the problem efficiently, we develop an exact method based on Benders decomposition to find high-quality solutions in reasonable time and a two-stage meta-heuristic. 2) A systems approach to carbon policy for fruit supply chains: carbon tax, technology innovation, or land sparing? Reducing carbon emissions of food supply chains has increasingly received attention from businesses and policymakers. In order to propose sound policies aimed at lowering such emissions, policy makers favor tools that are informative in the economic and environmental dimensions simultaneously. In this study we offer a systems-based approach which is intended to do just that by developing a spatially and temporally disaggregated price equilibrium mathematical model for a food production and distribution system and a

Keywords: approximate dynamic programming; Data analytics; Food banks; Large-scale optimization; Supply chain

Learning Convex Optimization Control Policies

2nd Annual Conference on Learning for Dynamics and Control (L4DC)

Authors: Agrawal, Akshay; Barratt, Shane; Boyd, Stephen; Stellato, Bartolomeo (450 Serra Mall, Stanford, CA 94305, USA; 77 Massachusetts Ave, Cambridge, MA 02139, USA)

Many control policies used in applications compute the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex program with respect to its parameters. A longer version of this paper, which illustrates our method on many examples, is available at https://***/~boyd/papers/learning_***.

Keywords: Stochastic control; convex optimization; approximate dynamic programming

UAV Formation Shape Control via Decentralized Markov Decision Processes

Algorithms, 2021, Vol. 14, No. 3, p. 91

Authors: Azam, Md Ali; Mittelmann, Hans D.; Ragi, Shankarachary (South Dakota Sch Mines & Technol, Elect Engn, Rapid City, SD 57701, USA; Arizona State Univ, Sch Math & Stat Sci, Tempe, AZ 85287, USA)

In this paper, we present a decentralized unmanned aerial vehicle (UAV) swarm formation control approach based on a decision-theoretic formulation. Specifically, we pose the UAV swarm motion control problem as a decentralized Markov decision process (Dec-MDP). Here, the goal is to drive the UAV swarm from an initial geographical region to another geographical region where the swarm must form a three-dimensional shape (e.g., surface of a sphere). As most decision-theoretic formulations suffer from the curse of dimensionality, we adapt an existing fast approximate dynamic programming method called nominal belief-state optimization (NBO) to approximately solve the formation control problem. We perform numerical studies in MATLAB to validate the performance of the above control algorithms.

Keywords: swarm intelligence; formation control; decentralized Markov decision process; approximate dynamic programming
