Reinforcement learning algorithms are central to the cognition and decision-making of embodied intelligent agents. A bilevel optimization (BO) modeling approach, along with a host of efficient BO algorithms, has proven to be an effective means of addressing actor-critic (AC) policy optimization problems. In this work, based on a bilevel-structured AC problem model, an implicit zeroth-order stochastic algorithm is developed. A locally randomized spherical smoothing technique is introduced that can be applied to nonsmooth nonconvex implicit AC formulations and avoids the need for a closed-form lower-level mapping. In the proposed zeroth-order scheme, the gradient of the implicit function is approximated through inexact lower-level value estimations that are practically available. Under suitable assumptions, the algorithmic framework designed for the bilevel AC method is characterized by convergence guarantees under a fixed stepsize and smoothing parameter. Moreover, the proposed algorithm attains an overall iteration complexity of O(n^2 L_0^2 ε^{-1}). The convergence performance of the proposed algorithm is verified through numerical simulations.
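The locally randomized spherical smoothing technique described above can be sketched as a two-point gradient estimator that queries only function values along a random unit direction. The quadratic objective below is a hypothetical stand-in for the implicit function F(x, y*(x)); in the bilevel AC setting, each evaluation of f would come from an inexact lower-level (critic) solve rather than a closed form.

```python
import numpy as np

def spherical_grad_estimate(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate with a direction drawn
    uniformly from the unit sphere (randomized spherical smoothing)."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Hypothetical test objective: f(x) = 0.5*||x||^2, so grad f(x) = x.
f = lambda x: 0.5 * np.dot(x, x)

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
# Averaging many single-sample estimates recovers the (smoothed) gradient.
est = np.mean(
    [spherical_grad_estimate(f, x, 1e-3, rng) for _ in range(20000)], axis=0
)
```

Each estimate costs only two function evaluations, which is why the scheme can run on inexact lower-level value estimations alone.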
This article focuses on distributed nonconvex optimization, in which agents exchange information to minimize the average of local nonconvex cost functions. The communication channel between agents is normally constrained by limited bandwidth, and the gradient information is typically unavailable. To overcome these limitations, we propose a quantized distributed zeroth-order algorithm, which integrates a deterministic gradient estimator, a standard uniform quantizer, and the distributed gradient tracking algorithm. We establish linear convergence to a global optimal point for the proposed algorithm by assuming the Polyak-Lojasiewicz condition for the global cost function and a smoothness condition for the local cost functions. Moreover, the proposed algorithm maintains linear convergence at low data rates with a proper selection of algorithm parameters. Numerical simulations validate the theoretical results.
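A minimal sketch of the three ingredients named above, on an assumed two-agent toy instance (the quadratic costs, mixing matrix, and parameter values are illustrative, not from the paper): exchanged states are passed through a uniform quantizer, local gradients come from a deterministic two-point estimator, and consensus is driven by gradient tracking.

```python
import numpy as np

def quantize(v, step):
    """Standard uniform quantizer: round each entry to the nearest
    multiple of `step` (the knob controlling the data rate)."""
    return step * np.round(np.asarray(v) / step)

def fd_grad(f, x, h=1e-5):
    """Deterministic two-point (central-difference) gradient estimator
    for a scalar decision variable."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Toy instance: f_i(x) = 0.5*(x - b_i)^2, global minimizer = mean(b) = 2.
b = np.array([1.0, 3.0])
W = np.array([[0.5, 0.5], [0.5, 0.5]])     # doubly stochastic mixing matrix
fs = [lambda x, bi=bi: 0.5 * (x - bi) ** 2 for bi in b]

alpha, step = 0.1, 1e-6                    # stepsize, quantization step
x = np.zeros(2)
g = np.array([fd_grad(fs[i], x[i]) for i in range(2)])
y = g.copy()                               # gradient trackers, y^0 = g^0

for _ in range(300):
    x = W @ quantize(x, step) - alpha * y          # consensus + descent
    g_new = np.array([fd_grad(fs[i], x[i]) for i in range(2)])
    y = W @ quantize(y, step) + g_new - g          # track average gradient
    g = g_new
```

With a coarser quantization step the iterates settle into a correspondingly larger neighborhood of the optimum, which is the trade-off the low-data-rate analysis quantifies.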
This article considers the distributed nonconvex optimization problem of minimizing a global cost function formed by a sum of local cost functions by using local information exchange. We first consider a distributed first-order primal-dual algorithm. We show that it converges sublinearly to a stationary point if each local cost function is smooth, and linearly to a global optimum under the additional condition that the global cost function satisfies the Polyak-Lojasiewicz condition. This condition is weaker than strong convexity, which is a standard condition for proving linear convergence of distributed optimization algorithms, and the global minimizer is not necessarily unique. Motivated by the situations where the gradients are unavailable, we then propose a distributed zeroth-order algorithm, derived from the considered first-order algorithm by using a deterministic gradient estimator, and show that it has the same convergence properties as the considered first-order algorithm under the same conditions. The theoretical results are illustrated by numerical simulations.
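The derivation "first-order primal-dual algorithm with the gradient replaced by a deterministic estimator" can be sketched as follows, on an assumed three-agent path graph with quadratic local costs (strong convexity of the global cost implies the Polyak-Lojasiewicz condition here); the update form and all parameter values are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def fd_grad(f, x, h=1e-5):
    """Deterministic central-difference gradient estimator (scalar case)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Assumed instance: local costs f_i(x) = 0.5*(x - b_i)^2 on the path 1-2-3.
b = np.array([0.0, 2.0, 7.0])              # global minimizer = mean(b) = 3
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])          # graph Laplacian of the path
fs = [lambda x, bi=bi: 0.5 * (x - bi) ** 2 for bi in b]

alpha, beta = 0.05, 1.0
x = np.zeros(3)                             # primal variables (one per agent)
v = np.zeros(3)                             # dual variables; their sum stays 0

for _ in range(2000):
    g = np.array([fd_grad(fs[i], x[i]) for i in range(3)])
    x_new = x - alpha * (g + beta * L @ x + v)   # primal descent step
    v = v + alpha * beta * L @ x                 # dual ascent on consensus
    x = x_new
```

At a fixed point, L @ x = 0 forces consensus, and since the dual variables sum to zero the consensus value must make the estimated gradients sum to zero, i.e. it is a stationary point of the global cost.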
In this paper, we study zeroth-order algorithms for nonconvex-concave minimax problems, which have attracted much attention in machine learning, signal processing, and many other fields in recent years. We propose a zeroth-order alternating randomized gradient projection (ZO-AGP) algorithm for smooth nonconvex-concave minimax problems; its iteration complexity to obtain an ε-stationary point is bounded by O(ε^{-4}), and the number of function value estimates is bounded by O(d_x + d_y) per iteration. Moreover, we propose a zeroth-order block alternating randomized proximal gradient algorithm (ZO-BAPG) for solving blockwise nonsmooth nonconvex-concave minimax optimization problems; its iteration complexity to obtain an ε-stationary point is bounded by O(ε^{-4}), and the number of function value estimates per iteration is bounded by O(Kd_x + d_y). To the best of our knowledge, this is the first time zeroth-order algorithms with iteration complexity guarantees have been developed for solving both general smooth and blockwise nonsmooth nonconvex-concave minimax problems. Numerical results on the data poisoning attack problem and the distributed nonconvex sparse principal component analysis problem validate the efficiency of the proposed algorithms.
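The alternating structure can be sketched as follows: a projected zeroth-order descent step in x, then a projected zeroth-order ascent step in y, each costing two function evaluations. The objective, box constraints, and stepsize below are illustrative assumptions (a simple quadratic saddle problem, not one of the paper's applications).

```python
import numpy as np

def proj_box(z, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d."""
    return np.clip(z, lo, hi)

def zo_grad(f, z, delta, rng):
    """Randomized two-point spherical estimate of grad f at z."""
    d = z.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / (2.0 * delta)) * (f(z + delta * u) - f(z - delta * u)) * u

# Toy objective with stationary point (0, 0): strongly concave in y.
f = lambda x, y: 0.5 * x[0] ** 2 + x[0] * y[0] - 0.5 * y[0] ** 2

rng = np.random.default_rng(0)
x, y = np.array([1.0]), np.array([1.0])
eta = 0.1
for _ in range(300):
    gx = zo_grad(lambda z: f(z, y), x, 1e-4, rng)   # 2 function values
    x = proj_box(x - eta * gx, -2.0, 2.0)           # projected descent in x
    gy = zo_grad(lambda z: f(x, z), y, 1e-4, rng)   # 2 function values
    y = proj_box(y + eta * gy, -2.0, 2.0)           # projected ascent in y
```

Per iteration the x-estimate needs O(d_x) and the y-estimate O(d_y) function values when the estimator is averaged over that many random directions, matching the stated per-iteration count.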
Nonconvex minimax problems have attracted significant interest in machine learning and many other fields in recent years. In this paper, we propose a new zeroth-order alternating randomized gradient projection algorithm to solve smooth nonconvex-linear problems; its iteration complexity to find an ε-first-order Nash equilibrium is O(ε^{-3}), and the number of function value estimates per iteration is bounded by O(d_x ε^{-2}). Furthermore, we propose a zeroth-order alternating randomized proximal gradient algorithm for blockwise nonsmooth nonconvex-linear minimax problems; its corresponding iteration complexity is O(K^{3/2} ε^{-3}), and the number of function value estimates is bounded by O(d_x ε^{-2}) per iteration. The numerical results indicate the efficiency of the proposed algorithms.
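For the nonsmooth blockwise variant, the proximal gradient step reduces to a closed-form proximal operator when the nonsmooth term is an l1 norm; the sketch below shows that building block (the l1 regularizer, stepsize, and variable names are illustrative assumptions, not the paper's specific formulation).

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau*||.||_1 (soft-thresholding), the
    closed-form prox used in each nonsmooth-block update."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# One proximal gradient step on a block x with zeroth-order gradient
# estimate g, stepsize eta, and l1 weight lam:
#   x <- prox_{eta*lam*||.||_1}(x - eta * g)
x = np.array([0.9, -0.2, 1.5])
g = np.array([0.5, 0.1, -0.3])   # stand-in for a zeroth-order estimate
eta, lam = 0.2, 1.0
x_new = soft_threshold(x - eta * g, eta * lam)
# x_new == [0.6, -0.02, 1.36]: small entries are shrunk toward zero
```

Because the prox is elementwise and closed-form, the per-iteration cost remains dominated by the O(d_x ε^{-2}) function value estimates.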