This paper introduces an algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.
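The cross-entropy step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a purely continuous parameter vector with independent Gaussian sampling, and it replaces the Monte Carlo return with a hypothetical `toy_score` function. The names `cross_entropy_maximize`, `toy_score`, and all default values are illustrative assumptions.

```python
import random
import statistics


def cross_entropy_maximize(score, dim, n_samples=50, n_elite=10, n_iters=30, seed=0):
    """Maximize score(x) over R^dim with a Gaussian cross-entropy method.

    Assumption: one independent Gaussian per dimension. The paper also
    optimizes discrete action assignments, which this sketch omits.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [5.0] * dim
    for _ in range(n_iters):
        # Draw candidate parameter vectors from the current sampling density.
        samples = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)]
            for _ in range(n_samples)
        ]
        # Keep the best-scoring candidates (the elite set).
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        # Refit the density to the elite set; the small constant keeps std > 0.
        for d in range(dim):
            column = [x[d] for x in elite]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-6
    return mean


def toy_score(x):
    # Hypothetical stand-in for the empirical return of a policy;
    # its maximum is at x = (3, 3, ...).
    return -sum((xi - 3.0) ** 2 for xi in x)


best = cross_entropy_maximize(toy_score, dim=2)
```

In a policy-search setting, `score` would instead simulate the policy encoded by `x` from each representative initial state and average the returns.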
In this paper, we present the application of specially constructed adaptive basis functions that generate a diagonal matrix in the method of moments solution procedure for the calculation of scattered electromagnetic fields from arbitrarily shaped conducting bodies excited by a plane electromagnetic wave. The arbitrary body is modeled using planar triangular patches. The crucial step in the solution procedure is the construction of the adaptive basis functions to generate the diagonal matrix. This task is accomplished with the help of the well-known RWG basis functions. The solution thus obtained is very efficient, accurate, and applicable to truly arbitrary bodies. Several numerical examples are presented to validate the new method.
An efficient algorithm is proposed for the analysis of large finite arrays using the adaptive basis functions/diagonal moment matrix technique. Adaptive basis functions constructed using clusters spanning an array element are used to generate a highly diagonally dominant moment matrix. The physical interpretation of the constructed set of adaptive basis functions is discussed. The new matrix equation is solved iteratively in such a way that only the significant mutual impedances are considered. The proposed algorithm is applied to a linear array of bow-tie antennas, and the results show very good agreement with those obtained using the direct moment method solution. Relative to the conventional moment method solution, the computational time improves as the array size increases. A speedup factor of more than 100 is achieved for an array of 32 elements.
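The abstract does not say which iterative scheme is used. As a rough illustration of why diagonal dominance helps, the sketch below uses plain Jacobi iteration as a stand-in and drops off-diagonal couplings whose magnitude falls below a threshold. The function name `jacobi_solve`, the `threshold` parameter, and the small test matrix are all hypothetical.

```python
def jacobi_solve(Z, v, threshold=0.0, n_iters=100):
    """Approximately solve Z x = v by Jacobi iteration.

    Assumption: Z is strongly diagonally dominant, so the iteration converges.
    Off-diagonal entries with magnitude <= threshold are ignored, mimicking
    the idea of keeping only significant mutual impedances.
    """
    n = len(v)
    # Precompute the list of retained couplings for each row.
    kept = [
        [(j, Z[i][j]) for j in range(n) if j != i and abs(Z[i][j]) > threshold]
        for i in range(n)
    ]
    # Start from the solution of the purely diagonal system.
    x = [v[i] / Z[i][i] for i in range(n)]
    for _ in range(n_iters):
        x = [
            (v[i] - sum(z * x[j] for j, z in kept[i])) / Z[i][i]
            for i in range(n)
        ]
    return x


# Hypothetical 3x3 diagonally dominant system (entries may also be complex).
Z = [[10.0, 1.0, 0.5], [1.0, 10.0, 1.0], [0.5, 1.0, 10.0]]
v = [1.0, 2.0, 3.0]
x = jacobi_solve(Z, v)
residual = max(
    abs(sum(Z[i][j] * x[j] for j in range(3)) - v[i]) for i in range(3)
)
```

With `threshold=0.0` every coupling is retained and the residual shrinks toward zero. A positive threshold reduces the work per iteration but yields the solution of the truncated system only.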
An efficient numerical technique is proposed for determining the buckling load of two-dimensional skeletal structures. The key formulation is based upon the principle of stationary total potential energy, and the solution procedure follows the concept of Rayleigh-Ritz approximation. A crucial aspect of the proposed technique is to make the solution space adaptive, allowing an accurate representation of the buckled shape via a simple iterative scheme. The basis functions of this solution space are constructed in an elementwise fashion using the exact, closed-form buckled shape. An element axial force contained in the element shape functions is chosen as an adaptive parameter, and the exact buckled shape of each element is achieved when this adaptive parameter converges to the element buckling load. In this study, various effects including lateral restraints, shear deformation, and material nonlinearity are taken into account. As a result, plane frames with or without lateral bracing, columns resting on elastic foundations, inelastic columns, and columns with shear deformation can all be treated. Results from an extensive numerical study indicate that the proposed technique yields highly accurate buckling loads, comparable to the analytical and reference solutions, without mesh refinement. In addition, a relatively low number of iterations is required to achieve the converged buckling load.
Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Because exact RL can only be applied to very simple problems, approximate algorithms are usually necessary in practice. Many algorithms for approximate RL rely on basis-function representations of the value function (or of the Q-function). Designing a good set of basis functions without any prior knowledge of the value function (or of the Q-function) can be a difficult task. In this paper, we propose instead a technique to optimize the shape of a constant number of basis functions for the approximate, fuzzy Q-iteration algorithm. In contrast to other approaches to adapt basis functions for RL, our optimization criterion measures the actual performance of the computed policies in the task, using simulation from a representative set of initial states. A complete algorithm, using cross-entropy optimization of triangular membership functions, is given and applied to the car-on-the-hill example.
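To make the triangular membership functions concrete, here is a minimal one-dimensional sketch. It assumes a common parametrization in which each function peaks at its own core and falls linearly to zero at the neighboring cores, so the degrees always sum to one. The abstract does not spell out the parametrization, so the function name `triangular_memberships` and the `cores` argument are assumptions.

```python
def triangular_memberships(x, cores):
    """Membership degrees of x in triangular fuzzy sets with sorted cores.

    Assumption: cores are strictly increasing; points outside the range
    belong fully to the nearest boundary set.
    """
    n = len(cores)
    mu = [0.0] * n
    if x <= cores[0]:
        mu[0] = 1.0
        return mu
    if x >= cores[-1]:
        mu[-1] = 1.0
        return mu
    for i in range(n - 1):
        left, right = cores[i], cores[i + 1]
        if left <= x <= right:
            # Linear interpolation between the two neighboring cores.
            w = (x - left) / (right - left)
            mu[i] = 1.0 - w
            mu[i + 1] = w
            return mu
    return mu


degrees = triangular_memberships(0.25, [0.0, 0.5, 1.0])
# degrees == [0.5, 0.5, 0.0]
```

Under this parametrization, adapting the basis functions amounts to moving the interior entries of `cores`, which is the kind of continuous parameter vector a cross-entropy optimizer can search over.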