ISBN (digital): 9798350316339
ISBN (print): 9798350316346
Reinforcement learning (RL) has produced significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods based on parametrized Q-learning. In these methods, the Q-function is chosen to be linear in the parameters and quadratic in selected basis functions in the state and control deviations from a base policy. A cost penalizing the $\ell_{1}$-norm of Bellman errors is minimized. The two methods, Linear Matrix Inequality Q-Learning (LMI-QL) and its iterative variant (LMI-QLi), solve the resulting episodic optimization problem through convex optimization. LMI-QL relies on a convex relaxation that yields a semidefinite programming (SDP) problem with linear matrix inequalities (LMIs). LMI-QLi solves a sequence of SDP problems. Both methods combine convex optimization with direct Q-function learning, significantly improving learning speed. A numerical case study demonstrates their advantages over existing parametrized Q-learning methods.
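To make the $\ell_{1}$ Bellman-error cost in the abstract above concrete, here is a minimal sketch for a scalar linear-quadratic problem. Everything in it is an assumption chosen for illustration and not taken from the paper: the dynamics `x_next = a*x + b*u`, the stage cost `q*x**2 + r*u**2`, a zero base policy, the basis functions `x` and `u` themselves, and the function names. It only evaluates the cost; it does not implement LMI-QL or LMI-QLi.

```python
def riccati_scalar(a, b, q, r, iters=500):
    """Fixed-point iteration of the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def q_value(theta, x, u):
    """Quadratic Q-function, linear in the parameters theta = (p11, p12, p22)."""
    p11, p12, p22 = theta
    return p11 * x * x + 2.0 * p12 * x * u + p22 * u * u


def bellman_l1(theta, samples, a, b, q, r):
    """Sum of absolute Bellman errors over (x, u) samples (undiscounted)."""
    p11, p12, p22 = theta
    total = 0.0
    for x, u in samples:
        x_next = a * x + b * u
        # Minimizing the quadratic Q over the next control gives a Schur complement.
        v_next = (p11 - p12 * p12 / p22) * x_next * x_next
        target = q * x * x + r * u * u + v_next
        total += abs(q_value(theta, x, u) - target)
    return total


def optimal_theta(a, b, q, r):
    """Q-function parameters implied by the Riccati solution."""
    p = riccati_scalar(a, b, q, r)
    return (q + a * a * p, a * b * p, r + b * b * p)
```

In this toy setting the parameters returned by `optimal_theta` drive the cost to zero (up to rounding), while perturbed parameters leave a positive residual, which is the quantity a parametrized Q-learning method would try to minimize.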
In this paper, we revisit the strong duality of the quadratically constrained quadratic programming (QCQP) problem. We first generalize a known result on the rank-one decomposition of matrices and then apply it to study strong duality for more general QCQP scenarios, including the cases with one constraint, with two constraints of which at least one is inactive at the optimal solution, with multiple constraints, and with an interval constraint. A sufficient condition ensuring strong duality for more general QCQP problems is studied as well. We also extend our results to QCQP problems with complex variables.
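For orientation, the one-constraint case can be written out explicitly. The notation below (symmetric matrices $A_0, A_1$, vectors $b_0, b_1$, scalars $c_0, c_1$, multiplier $\lambda$, bound $\gamma$) is assumed for illustration and follows the standard textbook form; it is not taken from the paper.

```latex
\begin{aligned}
\text{(P)}\quad & \min_{x \in \mathbb{R}^n} \; x^{T} A_0 x + 2 b_0^{T} x + c_0
  \quad \text{s.t.} \quad x^{T} A_1 x + 2 b_1^{T} x + c_1 \le 0, \\[4pt]
\text{(D)}\quad & \max_{\lambda \ge 0,\ \gamma} \; \gamma
  \quad \text{s.t.} \quad
  \begin{bmatrix}
    A_0 + \lambda A_1 & b_0 + \lambda b_1 \\
    (b_0 + \lambda b_1)^{T} & c_0 + \lambda c_1 - \gamma
  \end{bmatrix} \succeq 0 .
\end{aligned}
```

Weak duality always gives $\operatorname{val}(\mathrm{D}) \le \operatorname{val}(\mathrm{P})$, and strong duality means the two values coincide. In this single-constraint setting the classical S-lemma yields equality whenever a strictly feasible point exists, even when $A_0$ and $A_1$ are indefinite.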
ISBN (print): 9781467369985
This paper investigates distributionally robust minimum variance beamforming under first-order moment uncertainty. In contrast to deterministic modeling of the array response, our approach employs a distributional set to describe the uncertainty. The distributional set we introduce consists of two constraints: a probability measure constraint and a first-order moment constraint. The weights are selected to minimize the combined output power, subject to the modified distortionless response constraint that the expected real part of the array gain exceeds unity for all distributions in the uncertainty set. We begin our discussion by revealing the intrinsic connection between the distributionally robust minimum variance beamformer (DRMVB) and the robust minimum variance beamformer (RMVB). Then, for a sample space described by a union of ellipsoids, the DRMVB is reformulated as the optimal solution of a semidefinite program (SDP). Finally, we demonstrate the performance of the DRMVB via several numerical examples.
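As a point of comparison, the classical minimum variance beamformer and a distributionally robust constraint of the kind described above can be written side by side. The symbols are assumptions for illustration ($w$ for the weights, $R$ for the output covariance, $a$ for the array response, $\mathcal{P}$ for the distributional set); the abstract does not give the paper's exact formulation.

```latex
\begin{aligned}
\text{Classical:}\quad & \min_{w} \; w^{H} R\, w
  \quad \text{s.t.} \quad w^{H} a = 1,
  \qquad w^{\star} = \frac{R^{-1} a}{a^{H} R^{-1} a}, \\[4pt]
\text{Distributionally robust:}\quad & \min_{w} \; w^{H} R\, w
  \quad \text{s.t.} \quad
  \inf_{P \in \mathcal{P}} \; \mathbb{E}_{a \sim P}\!\left[ \operatorname{Re}\!\left( w^{H} a \right) \right] \ge 1 .
\end{aligned}
```

The first problem treats $a$ as known exactly and has the familiar closed-form solution. The second only requires the expected real part of the gain to stay at or above one for every distribution in $\mathcal{P}$, which is why it no longer has a closed form in general and is instead recast as an SDP.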
ISBN (digital): 9798350384574
ISBN (print): 9798350384581
This work validates through flight tests a previously developed wide-envelope, singularity-free aerodynamic framework, called ϕ-theory, for modeling dual-engine tail-sitting flying-wing vehicles for optimization-based control. The ϕ-theory methodology imposes a specific geometry on aerodynamic coefficients that leads to polynomial differential equations of motion amenable to semidefinite programming optimization. Through ϕ-theory, we illustrate a typical predicted longitudinal and lateral flight envelope of a tail-sitting vehicle, which, while commonplace for fixed-wing aircraft in performance textbooks, is a novel figure that generalizes fixed-wing doghouse plots to tail-sitting vehicles. This flight envelope figure suggests a novel, natural, and intuitive remote piloting interface that we validate in flight tests. Furthermore, we validate ϕ-theory by computing flight features in simulation and subsequently observing them in flight tests.
ISBN (print): 9781479934119
The Optimal Power Flow (OPF) problem is nonconvex and, for generic network structures, is NP-hard. A recent flurry of work has explored the use of semidefinite relaxations to solve the OPF problem. For general network structures, however, this approach may fail to yield solutions that are physically meaningful, in the sense that they are high rank, which precludes their efficient mapping back to the original feasible set. In certain cases, however, there may exist a hidden rank-one optimal solution. In this paper, an iterative linearization-minimization algorithm is proposed to uncover rank-one solutions for the relaxation. The iterates are shown to converge to a stationary point. A simple bisection method is also proposed to address problems for which the linearization-minimization procedure fails to yield a rank-one optimal solution. The algorithms are tested on representative power system examples. In many cases, the linearization-minimization procedure obtains a rank-one optimal solution where the naive semidefinite relaxation fails. Furthermore, a 14-bus example is provided for which the linearization-minimization algorithm achieves a rank-one solution with a cost strictly lower than that obtained by a conventional solver. We close by discussing some rank monotonicity properties of the proposed methodology.
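The reason rank matters in the abstract above is that a rank-one positive semidefinite matrix `W = v v^T` can be mapped back to a vector `v`, whereas a higher-rank matrix cannot. The sketch below illustrates only that check and mapping for small real symmetric matrices. It is not the linearization-minimization algorithm, the function names are hypothetical, and real OPF relaxations use complex Hermitian matrices.

```python
import math


def is_rank_one(w, tol=1e-9):
    """True if the square matrix w is nonzero and all of its 2x2 minors vanish."""
    n = len(w)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    if abs(w[i][j] * w[k][l] - w[i][l] * w[k][j]) > tol:
                        return False
    return any(abs(w[i][i]) > tol for i in range(n))


def recover_vector(w):
    """Recover v (up to sign) from a rank-one PSD matrix w = v v^T."""
    # Use the largest diagonal entry as pivot for numerical safety.
    pivot = max(range(len(w)), key=lambda i: w[i][i])
    scale = math.sqrt(w[pivot][pivot])
    return [row[pivot] / scale for row in w]


def outer(v):
    """Outer product v v^T as a list of lists."""
    return [[vi * vj for vj in v] for vi in v]
```

The minor-based test costs O(n^4) and uses an absolute tolerance, so it is only suitable for toy examples; in practice one would typically inspect the ratio of the two largest eigenvalues of the solver's output instead.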
ISBN (digital): 9798350363951
ISBN (print): 9798350363968
In this paper, a joint transmit and receive beampattern design (JTRBD) strategy is developed for multi-target tracking (MT) in a colocated multiple-input multiple-output (C-MIMO) radar system in a hostile environment. The key mechanism of the JTRBD strategy can be divided into two successive iterative optimization stages. First, the C-MIMO radar optimizes the transmit waveform correlation matrix (WCM) to form a resource-aware beampattern with a low peak sidelobe level (PSL), thereby minimizing the MT error and the probability of interception. Second, the radar system designs the weights of the spatial filters and employs digital beamforming techniques to generate the corresponding receive beampatterns, so as to reduce the gain in the direction of oppressive interference. Because the adaptable parameters are coupled in both the objective function and the constraints, the resulting problem is nonconvex and NP-hard; an iterative solution scheme based on a semidefinite programming (SDP) algorithm and the nonmonotone spectral projected gradient (NSPG) method is therefore devised. Simulation results illustrate the effectiveness of the JTRBD strategy.
A novel gridless sparse method (GSM) is proposed to estimate two-dimensional (2-D) direction-of-arrival (DOA) using L-shaped *** angular space is transformed to a frequency one and a new model is constructed in the frequency *** on the new model, the covariance matrix is reparameterized by a positive semidefinite Toeplitz *** fitting criterion and semidefinite programming are used to estimate DOAs in the continuous *** with traditional 2-D DOA estimation methods, there is no need to discretize the whole angular space which can cause modeling error and increase computation ***, the proposed method can get pair-matching automatically.
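The positive semidefinite Toeplitz reparameterization mentioned above rests on a structural fact that is easy to verify numerically: for a uniform linear array with uncorrelated sources and white noise, the covariance depends only on the sensor lag. The sketch below checks this for a one-dimensional array. The 1-D setting, the normalized frequencies, and the function names are assumptions for illustration; the paper's L-shaped 2-D model is not reproduced here.

```python
import cmath


def ula_covariance(freqs, powers, noise_power, m):
    """Covariance of an m-sensor uniform linear array with uncorrelated sources.

    freqs are normalized spatial frequencies, powers the source powers.
    """
    cov = [[0j] * m for _ in range(m)]
    for row in range(m):
        for col in range(m):
            lag = row - col
            value = sum(
                p * cmath.exp(2j * cmath.pi * f * lag)
                for f, p in zip(freqs, powers)
            )
            if lag == 0:
                value += noise_power  # white noise only adds to the diagonal
            cov[row][col] = value
    return cov


def is_hermitian_toeplitz(cov, tol=1e-12):
    """True if cov equals its conjugate transpose and is constant along diagonals."""
    m = len(cov)
    for row in range(m):
        for col in range(m):
            if abs(cov[row][col] - cov[col][row].conjugate()) > tol:
                return False
            if row > 0 and col > 0 and abs(cov[row][col] - cov[row - 1][col - 1]) > tol:
                return False
    return True
```

Because such a covariance is a nonnegative combination of rank-one terms plus a scaled identity, it is also positive semidefinite by construction, which is what allows the unknown frequencies to be encoded in a PSD Toeplitz matrix variable of an SDP.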
ISBN (print): 9781424474264
This paper considers the problem of reducing the computational complexity associated with the Sum-of-Squares approach to stability analysis of time-delay systems. Specifically, this paper considers systems with a large state-space but relatively few delays, which is the most common situation in practice. The paper uses the general framework of coupled differential-difference equations with delays in low-dimensional feedback channels. This framework includes both the standard delayed and neutral-type systems. The approach is based on recent results that introduced a new type of Lyapunov-Krasovskii form, which was shown to be necessary and sufficient for stability of this class of systems. This paper shows how exploiting the structure of the new functional can yield dramatic improvements in computational complexity. Numerical examples are given to illustrate this improvement.
ISBN (print): 9781605580470
Semidefinite programming (SDP) is one of the strongest algorithmic techniques used in the design of approximation algorithms. In recent years, Unique Games Conjecture (UGC) has proved to be intimately connected to the limitations of semidefinite *** this connection precise, we show the following result: If UGC is true, then for every constraint satisfaction problem (CSP) the best approximation ratio is given by a certain simple SDP. Specifically, we show a generic conversion from SDP integrality gaps to UGC hardness results for every CSP. This result holds both for maximization and minimization problems over arbitrary finite *** this connection between integrality gaps and hardness results we obtain a generic polynomial-time algorithm for all CSPs. Assuming the Unique Games Conjecture, this algorithm achieves the optimal approximation ratio for every ***, for all 2-CSPs the algorithm achieves an approximation ratio equal to the integrality gap of a natural SDP used in literature. Further the algorithm achieves at least as good an approximation ratio as the best known algorithms for several problems like MaxCut, Max2Sat, MaxDiCut and Unique Games.
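As a concrete instance of an SDP relaxation and its integrality gap, consider MaxCut, one of the problems named above. The formulation below is the well-known textbook relaxation, given here for illustration only; the abstract does not spell out the specific SDP used in the paper, and the symbols ($w_{ij}$ for edge weights, $x_i$ for side labels, $X$ for the matrix variable) are assumed notation.

```latex
\begin{aligned}
\text{MaxCut:}\quad & \max_{x \in \{-1,+1\}^n} \; \sum_{i<j} w_{ij}\, \frac{1 - x_i x_j}{2}, \\[4pt]
\text{SDP relaxation:}\quad & \max_{X \succeq 0} \; \sum_{i<j} w_{ij}\, \frac{1 - X_{ij}}{2}
  \quad \text{s.t.} \quad X_{ii} = 1,\ i = 1,\dots,n, \\[4pt]
\text{Integrality gap:}\quad & \inf_{\text{instances}} \; \frac{\mathrm{OPT}_{\text{MaxCut}}}{\mathrm{OPT}_{\text{SDP}}} .
\end{aligned}
```

Every cut $x$ gives a feasible $X = x x^{T}$, so the SDP value is an upper bound on the best cut, and the integrality gap measures how loose that bound can be in the worst case. The abstract's claim is that, assuming UGC, this kind of gap also marks the limit of what any polynomial-time approximation algorithm can achieve.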
In recent years, the scheme based on Received Signal Strength (RSS) has attracted wide attention in sensor nodes positioning due to its advantage of low cost and lack of *** this paper, a positioning model is built based on RSS for underwater wireless sensor networks, and the Least Square Relative Error (LSRE) estimation method is adopted to solve the problem of semidefinite programming of prior constraints when the transmission power is ***, the formula of underwater acoustic path loss is approximate to pseudo linear multiplication model by mathematical *** a nonconvex LSRE problem with the node position and transmission power as variables is established with this ***, a matrix containing compound variables is constructed by using the ascending dimension relaxation technique of semidefinite *** on the external characteristics of compound variables, prior constraints are added to solve the convex optimization *** results show that this algorithm has higher estimation accuracy than that based on absolute error estimation.