This paper aims to present improvable and computable approximation approaches for solving the two-stage robust optimization problem, which arises in various applications such as optimal energy management and production planning. Based on sampling a finite number of scenarios of the uncertainty, we obtain a lower-bound approximation and show that the corresponding solution is at least ε-level feasible. Moreover, piecewise linear decision rules (PLDRs) are introduced to improve the upper bound obtained by the widely used linear decision rule. Furthermore, we show that both the lower-bound and upper-bound approximation problems can be reformulated into solvable saddle point problems and consequently solved by the mirror descent method.
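To make the saddle point reformulation concrete, here is a minimal sketch (our own toy instance, not the paper's formulation) of mirror descent-ascent on a bilinear saddle point problem min_x max_y x^T A y over two probability simplices, using entropic mirror steps:

```python
# Illustrative sketch only: entropic mirror descent-ascent for a bilinear
# saddle point problem over two simplices, the kind of reformulation the
# abstract says the bound problems admit. A, sizes, and schedules are assumed.
import numpy as np

def mirror_descent_saddle(A, n_iter=2000, step=0.1):
    m, n = A.shape
    x = np.full(m, 1.0 / m)          # primal iterate on the simplex
    y = np.full(n, 1.0 / n)          # dual iterate on the simplex
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(n_iter):
        gx = A @ y                   # partial gradient w.r.t. x
        gy = A.T @ x                 # partial gradient w.r.t. y
        # Entropic mirror steps: multiplicative updates, renormalized.
        x = x * np.exp(-step * gx); x /= x.sum()   # descent in x
        y = y * np.exp(step * gy);  y /= y.sum()   # ascent in y
        x_avg += x; y_avg += y
    return x_avg / n_iter, y_avg / n_iter          # averaged iterates

x_bar, y_bar = mirror_descent_saddle(np.random.default_rng(0).standard_normal((5, 4)))
```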
This paper uses the mirror descent algorithm with periodic dynamic quantization to solve constrained distributed optimization problems over limited communication channels. Because the network environment is imperfect, obtaining accurate information is impractical, and thus a communication scheme under quantization needs to be considered. A periodic dynamic quantizer with finitely many quantization levels is proposed in this paper to achieve exact optimization. Moreover, a time-varying control parameter in the mirror descent algorithm is designed to control the quantization error. A comprehensive analysis shows that the proposed algorithm converges to an optimal value, with convergence rate O(1/T^0.25).
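As a rough illustration of the two ingredients named above, the sketch below implements a finite-level uniform quantizer whose cell width shrinks over time and a time-varying step size that damps the quantization error; the function names and schedules are our assumptions, not the paper's exact design:

```python
import numpy as np

def dynamic_quantize(x, center, delta, levels=16):
    """Quantize x onto `levels` uniform cells of width `delta` around `center`."""
    idx = np.clip(np.round((x - center) / delta), -(levels // 2), levels // 2)
    return center + idx * delta

# One agent's update using a neighbor's quantized state (Euclidean mirror map);
# the neighbor is held fixed here purely to keep the toy self-contained.
x, x_nb = np.zeros(3), np.ones(3)
for t in range(1, 1001):
    delta = 1.0 / t                      # cell width rescaled as time advances
    q_nb = dynamic_quantize(x_nb, center=x, delta=delta)
    step = 1.0 / t ** 0.75               # time-varying control parameter (assumed)
    x = 0.5 * (x + q_nb) - step * 2 * x  # consensus + gradient step on ||x||^2
```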
In this paper we consider optimization problems where the objective function is given in the form of an expectation. A basic difficulty of solving such stochastic optimization problems is that the involved multidimensional integrals (expectations) cannot be computed with high accuracy. The aim of this paper is to compare two computational approaches based on Monte Carlo sampling techniques, namely, the stochastic approximation (SA) and the sample average approximation (SAA) methods. Both approaches have a long history. Current opinion is that the SAA method can efficiently exploit a specific (say, linear) structure of the considered problem, while the SA approach is a crude subgradient method that often performs poorly in practice. We intend to demonstrate that a properly modified SA approach can be competitive with, and even significantly outperform, the SAA method for a certain class of convex stochastic problems. We extend the analysis to the case of convex-concave stochastic saddle point problems and present (in our opinion highly encouraging) results of numerical experiments.
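The contrast between the two methods can be sketched on a toy problem min_x E[(x − ξ)^2] with ξ ~ N(μ, 1), whose solution is x* = μ; the constant ~1/√N step with iterate averaging follows the robust-SA policy discussed in this line of work, while everything else is our illustrative scaffolding:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 2.0

# Robust SA: one stochastic subgradient per sample, plus iterate averaging.
x, x_bar, N = 0.0, 0.0, 5000
for t in range(1, N + 1):
    xi = rng.normal(mu)
    g = 2.0 * (x - xi)            # stochastic gradient of (x - xi)^2
    x -= g / np.sqrt(N)           # constant step ~ 1/sqrt(N)
    x_bar += (x - x_bar) / t      # running average of iterates

# SAA: draw the whole sample first, then minimize the deterministic average.
sample = rng.normal(mu, size=N)
x_saa = sample.mean()             # closed-form minimizer of the sample average

print(x_bar, x_saa)               # both should be close to mu = 2.0
```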
This paper is concerned with the constrained distributed multi-agent convex optimization problem over a time-varying network. We assume that the bit rate of the considered communication is limited, such that a uniform quantizer is applied in the process of exchanging information over the multi-agent network. Then a quantizer-based distributed mirror-descent (QDMD) algorithm, which utilizes the Bregman divergence as the distance-measuring function, is developed for such optimization problems. The convergence result of the developed algorithm is also provided. By choosing the iteration step-size ■ and the quantization interval v_t = λ/t with a prescribed parameter λ, it is shown that the convergence rate of the QDMD algorithm can achieve ■, where T is the number of iterations.
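As a hedged illustration of the Bregman divergence acting as the distance-measuring function, the sketch below combines a uniform quantizer with an entropic mirror step on the simplex, where the Bregman projection reduces to a multiplicative update; all names and constants are ours, not the QDMD specification:

```python
import numpy as np

def uniform_quantize(x, v):
    """Uniform quantizer with cell width v (the quantization interval)."""
    return v * np.round(x / v)

def entropic_mirror_step(x, grad, step):
    """argmin_z <grad, z> + D_h(z, x) / step with h = negative entropy."""
    z = x * np.exp(-step * grad)   # multiplicative update from the Bregman step
    return z / z.sum()             # renormalize back onto the simplex

x = np.array([0.25, 0.25, 0.5])
neighbor = uniform_quantize(np.array([0.4, 0.3, 0.3]), v=0.05)
mix = 0.5 * x + 0.5 * neighbor     # consensus with the quantized neighbor state
x = entropic_mirror_step(mix, grad=np.array([1.0, -0.5, 0.2]), step=0.1)
```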
ISBN (print): 9781510601123
Stochastic optimization is a fundamental problem that finds applications in many areas, including the biological and cognitive sciences. The classical stochastic approximation algorithm for iterative stochastic optimization requires gradient information about the sample objective function, which is typically difficult to obtain in practice. Recently there has been renewed interest in derivative-free approaches to stochastic optimization. In this paper, we examine the rates of convergence of the Kiefer-Wolfowitz algorithm and the mirror descent algorithm under various updating schemes that use finite differences as gradient approximations. The analysis is carried out under a general framework covering a wide range of updating scenarios. It is shown that the convergence of these algorithms can be accelerated by controlling the implementation of the finite differences.
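A minimal sketch of the derivative-free idea, assuming Kiefer-Wolfowitz-style schedules a_t = 1/t and c_t = 1/t^(1/4): noisy function evaluations replace the unavailable gradient via central finite differences, and the decay of the difference width c_t controls the bias:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_f(x):
    """Toy noisy objective; only noisy evaluations are available."""
    return np.sum((x - 1.0) ** 2) + rng.normal(scale=0.01)

def fd_gradient(f, x, c):
    """Central finite-difference estimate of grad f at x with width c."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = c
        g[i] = (f(x + e) - f(x - e)) / (2 * c)
    return g

x = np.zeros(3)
for t in range(1, 500):
    a_t = 1.0 / t            # step-size schedule
    c_t = 1.0 / t ** 0.25    # difference width; its decay controls the bias
    x -= a_t * fd_gradient(noisy_f, x, c_t)
# x should approach (1, 1, 1), the minimizer of the noiseless objective.
```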
ISBN (print): 9783031222153; 9783031222160
In reinforcement learning, an agent in an environment improves its skill based on a reward, the feedback from the environment. In practice, reinforcement learning faces several important challenges. First, reinforcement learning algorithms often rely on assumptions about the environment, such as Markov decision processes; however, real-world environments often cannot be represented under these assumptions. In particular, we focus on environments with non-Markovian rewards, which allow the reward to depend on past experiences. To handle non-Markovian rewards, researchers have used a reward machine, which decomposes the original task into sub-tasks. In those works, the sub-tasks are usually assumed to be representable by a Markov decision process. Second, safety is another challenge in reinforcement learning. G-CoMDS is a safe reinforcement learning algorithm based on the CoMirror algorithm, an algorithm for constrained optimization problems. We developed the G-CoMDS algorithm to learn safely in environments that are not Markov decision processes. A promising approach in complex situations would therefore be to decompose the original task as a reward machine does and then solve the sub-tasks with G-CoMDS. In this paper, we provide additional experimental results and discussion of G-CoMDS as a preliminary step toward combining G-CoMDS with a reward machine. We evaluate G-CoMDS and an existing reinforcement learning algorithm in a mobile robot simulation with a kind of non-Markovian reward. The experimental results show that G-CoMDS suppresses cost spikes and slightly exceeds the performance of the existing safe reinforcement learning algorithm.
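For context, the CoMirror-style switching rule that such constrained algorithms build on can be sketched as follows (a Euclidean toy instance under our own assumptions, not the G-CoMDS implementation): take a step along the objective's subgradient while the constraint is approximately satisfied, and along the constraint's subgradient otherwise:

```python
import numpy as np

def comirror_step(x, f_grad, g_val, g_grad, step, tol=1e-3):
    """One switching (Euclidean) mirror step for min f(x) s.t. g(x) <= 0."""
    direction = f_grad if g_val <= tol else g_grad   # switch on feasibility
    return x - step * direction

# Toy problem: minimize f(x) = ||x||^2 subject to g(x) = 1 - x[0] <= 0.
x = np.array([3.0, 3.0])
for t in range(1, 2000):
    g_val = 1.0 - x[0]
    x = comirror_step(x, f_grad=2 * x, g_val=g_val,
                      g_grad=np.array([-1.0, 0.0]), step=1.0 / np.sqrt(t))
# x should approach (1, 0), the constrained minimizer.
```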
This paper is concerned with an online distributed convex-constrained optimization problem over a multi-agent network, where the limited network bandwidth and the potential feedback delay caused by network communication are considered. To cope with the limited network bandwidth, an event-triggered communication scheme is introduced for information exchange. Then, based on the delayed (i.e., single-point and two-point) bandit feedback, two event-triggered distributed online convex optimization algorithms are developed that utilize the Bregman divergence in the projection step. The convergence of the two algorithms is analyzed through the static regret bounds they achieve. The results show that a sublinear static regret with respect to the time horizon T can be ensured if the triggering threshold gradually approaches zero; in this case, the order of the regret bounds is determined by choosing suitable triggering thresholds. Finally, a distributed online regularized linear regression problem is provided as an example to illustrate the effectiveness of the two proposed algorithms.
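Two of the ingredients named above admit a short sketch: an event-triggered broadcast that communicates only when the local state drifts past a (vanishing) threshold, and a two-point bandit gradient estimate built from function values; all names, the toy loss, and the schedules are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def should_broadcast(x, last_sent, threshold):
    """Trigger communication only when the deviation exceeds the threshold."""
    return np.linalg.norm(x - last_sent) > threshold

def two_point_bandit_grad(f, x, delta):
    """Gradient estimate from two function queries along a random direction."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    return len(x) * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

f = lambda x: np.sum((x - 1.0) ** 2)      # toy local loss
x = np.zeros(4); last_sent = x.copy()
for t in range(1, 500):
    g = two_point_bandit_grad(f, x, delta=1.0 / t)
    x = x - g / np.sqrt(t)                # Bregman step with Euclidean divergence
    if should_broadcast(x, last_sent, threshold=1.0 / t):  # vanishing threshold
        last_sent = x.copy()              # stand-in for broadcasting to neighbors
```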