Hybrid control problems are complicated by the need to make a suitable sequence of discrete decisions related to future modes of operation of the system. Model predictive control (MPC) encodes a finite-horizon truncation of such problems as a mixed-integer program, and then imposes a cost and/or constraints on the terminal state intended to reflect all post-horizon behaviour. However, these are often ad hoc choices tuned by hand after empirically observing performance. We present a learning method that sidesteps this problem, in which the so-called N-step Q-function of the problem is approximated from below, based on experience evaluating the policy. The function takes a state and a sequence of N control decisions as arguments, and therefore extends the traditional notion of a Q-function from reinforcement learning. After learning it from a training process exploring the state-input space, we use it in place of the usual MPC objective. We take an example hybrid control task and show that it can be completed successfully with a shorter planning horizon than conventional hybrid MPC thanks to our proposed method. Furthermore, we report that Q-functions trained with long horizons can be truncated to a shorter horizon for online use, yielding simpler control laws with apparently little loss of performance.
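As a rough illustration of the object described in this abstract, an N-step Q-function can be written as follows. This is a sketch under assumptions not stated in the abstract: deterministic dynamics $f$, a stage cost $\ell$, no discounting, and an optimal cost-to-go $V^\star$; all of these symbols are introduced here for illustration only.

```latex
% Assumed notation: f = dynamics, \ell = stage cost, V^\star = optimal cost-to-go.
Q_N(x, u_0, \dots, u_{N-1})
  = \sum_{k=0}^{N-1} \ell(x_k, u_k) + V^\star(x_N),
\qquad x_0 = x, \quad x_{k+1} = f(x_k, u_k).
```

Under these assumptions, setting $N = 1$ gives $Q_1(x, u) = \ell(x, u) + V^\star(f(x, u))$, which is the familiar one-step Q-function, and a controller would minimize $Q_N$ over the input sequence $(u_0, \dots, u_{N-1})$ in place of the usual MPC objective.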
Dynamic programming models are used to analyze lambda-policy iteration with randomization algorithms. In particular, contractive models with infinite policies are considered, and it is shown that well-posedness of the lambda-operator plays a central role in the algorithm. The operator is known to be well-posed for problems with finite states, but our analysis shows that it is also well-defined for the contractive models with infinite states studied. Similarly, the algorithm we analyze is known to converge for problems with finite policies, but we identify the conditions required to guarantee convergence with probability one when the policy space is infinite, regardless of the number of states. Guided by the analysis, we illustrate a data-driven approximate implementation of the algorithm for estimating the optimal costs of constrained linear and nonlinear control problems. Numerical results indicate the potential of this method in practice.
This thesis explores tactical and operational planning problems in the context of the Less-than-Truckload (LTL) industry. LTL carriers transport shipments that occupy a small fraction of trailer capacity, and, thus, rely on the consolidation of freight from multiple shippers to achieve economies of scale. The first part of this thesis focuses on tactical planning operations of LTL carriers. In particular, in Chapter 2, we study the service network design problem confronted by LTL carriers ahead of an operating season. This problem includes determining: (1) the number of services (trailers) to operate between each pair of terminals, and (2) a load plan which specifies the sequence of transfer terminals that freight with a given origin and destination will visit. Traditionally, for every terminal and every ultimate destination, a load plan specifies a unique next terminal. We introduce the p-alt model, which generalizes traditional load plans by allowing decision-makers to specify a desired number of next terminal options for terminal-destination pairs using a vector p. We compare a number of exact and heuristic approaches for solving a two-stage stochastic variant of the p-alt model. Using this model, we show that by explicitly considering demand uncertainty and by merely allowing up to two next terminal options for terminal-destination pairs in the load plans, carriers can generate substantial cost savings; cost savings that are comparable to those yielded by adopting load plans that allow for any next terminal to be a routing option for terminal-destination pairs. Moreover, by using these more flexible load plans, carriers can generate cost savings in the order of 10% over traditional load plan designs obtained by deterministic models. The second part of the thesis shifts to an operational setting relating to how freight is routed through the carrier's service network. As the daily freight quantities handled by a carrier are uncertain, freight routes are dynamica
We consider the problem of discounted optimal state-feedback regulation for general unknown deterministic discrete-time systems. It is well known that open-loop instability of systems, non-quadratic cost functions and complex nonlinear dynamics, as well as the on-policy behavior of many reinforcement learning (RL) algorithms, make the design of model-free optimal adaptive controllers a challenging task. We depart from commonly used least-squares and neural network approximation methods in conventional model-free control theory, and propose a novel family of data-driven optimization algorithms based on linear programming, off-policy Q-learning and randomized experience replay. We develop both policy iteration (PI) and value iteration (VI) methods to compute an approximate optimal feedback controller with high precision and without the knowledge of a system model and stage cost function. Simulation studies confirm the effectiveness of the proposed methods. Copyright (C) 2020 The Authors.
ISBN: (print) 9783030597474; 9783030597467
This paper presents a logistics serious game that describes an anticipatory planning problem for the dispatching of trucks, barges, and trains, considering uncertainty in future container arrivals. The problem setting is conceptually easy to grasp, yet difficult to solve optimally. For this problem, we deploy a variety of benchmark algorithms, including two heuristics and two reinforcement learning implementations. We use the serious game to compare the manual performance of human decision makers with those algorithms. Furthermore, the game allows humans to create their own automated planning rules, which can also be compared with the implemented algorithms and manual game play. To illustrate the potential use of the game, we report the results of three gaming sessions: with students, with job seekers, and with logistics professionals. The experimental results show that reinforcement learning typically outperforms the human decision makers, but that the top tier of humans come very close to this algorithmic performance.
We propose a new approach to optimize operations of hydro storage systems with multiple connected reservoirs whose operators participate in wholesale electricity markets. Our formulation integrates short-term intraday with long-term interday decisions. The intraday problem considers bidding decisions as well as storage operation during the day and is formulated as a stochastic program. The interday problem is modeled as a Markov decision process of managing storage operation over time, for which we propose integrating stochastic dual dynamic programming with approximate dynamic programming. We show that the approximate solution converges toward an upper bound of the optimal solution. To demonstrate the efficiency of the solution approach, we fit an econometric model to actual price and inflow data and apply the approach to a case study of an existing hydro storage system. Our results indicate that the approach is tractable for a real-world application and that the gap between a theoretical upper bound and a simulated lower bound decreases sufficiently fast.
We address the operational management of station-based bike sharing systems (BSSs). In BSSs, users can spontaneously rent and return bikes at any station in the system. Demand is driven by commuter, shopping, an...
We studied several important management and policy analysis problems in food supply chain systems utilizing large-scale optimization, stochastic resource allocation, and data-analytics methodologies. We focused on three main research questions: (1) How can retailers build green, efficient last-mile logistics systems when the objective is to maximize their profit and minimize the costs due to fuel consumption, inventory holding, and greenhouse gas emissions (Chapter 2)? (2) What is the best environmental intervention policy to reduce the environmental externalities associated with the production of fruits and vegetables, considering environmental and economic dimensions simultaneously (Chapter 3)? (3) How can food banks better manage the distribution of food supplies to combat food insecurity in underserved populations (Chapters 4 & 5)? Specifically, we have explored the following four dimensions in food supply chains: 1) Benders decomposition for the inventory vehicle routing problem with perishable products and environmental costs. We consider the problem of inventory routing in the context of perishable products and find near-optimal replenishment schedules and vehicle routes. To solve the problem efficiently, we develop an exact method based on Benders decomposition to find high-quality solutions in reasonable time, and a two-stage meta-heuristic. 2) A systems approach to carbon policy for fruit supply chains: carbon tax, technology innovation, or land sparing? Reducing carbon emissions of food supply chains has increasingly received attention from businesses and policymakers. In order to propose sound policies aimed at lowering such emissions, policymakers favor tools that are informative in the economic and environmental dimensions simultaneously. In this study we offer a systems-based approach intended to do just that, by developing a spatially and temporally disaggregated price equilibrium mathematical model for a food production and distribution system and a
Many control policies used in applications compute the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex program with respect to its parameters. A longer version of this paper, which illustrates our method on many examples, is available at https://***/~boyd/papers/learning_***.
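The tuning loop described in this abstract can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it tunes a single LQR state weight `q` for a scalar system, and it swaps in a central finite-difference gradient in place of the convex-program derivatives the abstract refers to. The system constants, the performance metric, and the function names `lqr_gain`, `closed_loop_cost`, and `tune` are all hypothetical.

```python
import random


def lqr_gain(a, b, q, r, iters=500):
    """Scalar discrete-time LQR gain from fixed-point iteration on the Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def closed_loop_cost(q, a=1.2, b=1.0, r=1.0, steps=200, seed=0):
    """Average of an assumed application metric x^2 + 0.1 u^2 under u = -k(q) x."""
    rng = random.Random(seed)  # fixed seed: same noise sequence for every evaluation
    k = lqr_gain(a, b, q, r)
    x, total = 1.0, 0.0
    for _ in range(steps):
        u = -k * x
        total += x * x + 0.1 * u * u
        x = a * x + b * u + 0.1 * rng.gauss(0.0, 1.0)
    return total / steps


def tune(q0=0.1, lr=2.0, iters=100, eps=1e-3):
    """Gradient descent on q using a central finite-difference gradient estimate."""
    q = q0
    for _ in range(iters):
        grad = (closed_loop_cost(q + eps) - closed_loop_cost(q - eps)) / (2 * eps)
        q = max(1e-2, q - lr * grad)  # keep the weight strictly positive
    return q
```

Calling `tune()` returns an adjusted weight, and `closed_loop_cost` can be evaluated before and after to compare performance. Finite differences scale poorly with the number of parameters, which is one reason a method based on differentiating through the optimization problem would be preferable in practice.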
In this paper, we present a decentralized, decision-theoretic approach to formation control of unmanned aerial vehicle (UAV) swarms. Specifically, we pose the UAV swarm motion control problem as a decentralized Markov decision process (Dec-MDP). Here, the goal is to drive the UAV swarm from an initial geographical region to another geographical region where the swarm must form a three-dimensional shape (e.g., the surface of a sphere). As most decision-theoretic formulations suffer from the curse of dimensionality, we adapt an existing fast approximate dynamic programming method called nominal belief-state optimization (NBO) to approximately solve the formation control problem. We perform numerical studies in MATLAB to validate the performance of the resulting control algorithms.