This paper describes two novel on-policy reinforcement learning algorithms, QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state-value function using TD(λ) methods. They differ in how they use that value function: QV-learning combines it with a form of Q-learning to learn Q-values, whereas ACLA combines it with a learning-automaton-like update rule to update the actor. We describe several possible advantages of these methods over other value-function-based reinforcement learning algorithms such as Q-learning, Sarsa, and conventional actor-critic methods. Experiments are performed on (1) small, (2) large, (3) partially observable, and (4) dynamic maze problems with tabular and neural network value-function representations, and on the mountain car problem. The overall results show that the two novel algorithms can outperform previously known reinforcement learning algorithms.
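To make the QV-learning idea concrete, here is a minimal tabular sketch, assuming λ = 0 (plain TD(0) instead of TD(λ)) and hypothetical step sizes `alpha`, `beta` and discount `gamma`. The key point it illustrates is that both the state-value function V and the Q-values are trained toward the same target r + γV(s'), rather than toward r + γ max_a Q(s', a) as in ordinary Q-learning. This is an illustrative simplification, not the paper's exact implementation.

```python
from collections import defaultdict


def qv_update(V, Q, s, a, r, s_next, done, alpha=0.1, beta=0.1, gamma=0.95):
    """One QV-learning step, simplified to lambda = 0.

    V: mapping state -> value; Q: mapping (state, action) -> value.
    Both updates bootstrap from V(s'), not from max_a Q(s', a).
    """
    v_next = 0.0 if done else V[s_next]
    target = r + gamma * v_next
    # Critic: TD(0) update of the state-value function.
    V[s] += beta * (target - V[s])
    # Q-values are trained toward the same V-based target.
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return target


# Usage: tables default to 0.0 for unseen states and state-action pairs.
V = defaultdict(float)
Q = defaultdict(float)
qv_update(V, Q, "s0", "left", 1.0, "s1", False)
```

Because V is learned independently of the greedy action, it can be updated with TD(λ) traces in the full algorithm while the Q-values simply track a V-based target.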
ISBN: 142440584X (print)
Phase-change Random Access Memory (PRAM) has drawn much attention as a promising candidate for next-generation nonvolatile memory. This is because PRAM has great potential not only to solve the scaling issues that other conventional nonvolatile memories may face in the near future, but also to enable new functions and applications of its own, thanks to its fast write (programming) speed and direct-overwrite capability. As a result, PRAM has been the fastest-evolving memory technology and is close to commercialization. This paper discusses recent progress in PRAM technologies and proposes future directions.
ISBN: 9783540714095 (print)
A major impediment to more widespread use of offline partial evaluation is the difficulty of obtaining and maintaining annotations for larger, realistic programs. Existing automatic binding-time analyses still have only limited applicability, so annotations often have to be created, improved, and maintained by hand, which leads to errors. We present a technique that helps overcome this problem by using online control techniques to supervise the specialisation process and detect such errors. We discuss an implementation in the LOGEN system and show on a series of examples that this approach is effective: very few false alarms were raised, while infinite loops were detected quickly. We also present the integration of this technique into a web interface that highlights problematic annotations directly in the source code. We further present a method to automatically fix incorrect annotations, which allows the approach also to be used as a pragmatic binding-time analysis. Finally, we show how our method can be used to efficiently locate errors involving built-ins in Prolog source code.
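Online control in partial evaluation typically relies on a "whistle" that flags potential non-termination when a new call looks like a growing version of an earlier one. A widely used whistle is the homeomorphic embedding relation; the sketch below assumes that relation and is not necessarily the exact check LOGEN performs. The term encoding is hypothetical: Prolog variables are strings starting with an uppercase letter or `_`, and every other term is a tuple `(functor, arg1, ..., argN)`.

```python
def is_var(t):
    # Hypothetical encoding: variables are str starting with uppercase or "_";
    # all other terms are tuples (functor, arg1, ..., argN), e.g. ("[]",) or ("f", "X").
    return isinstance(t, str) and (t[:1].isupper() or t.startswith("_"))


def embeds(s, t):
    """Homeomorphic embedding: True if s is embedded in t."""
    if is_var(t):
        return is_var(s)  # a variable embeds only variables
    # Diving: s is embedded in some argument of t.
    if any(embeds(s, arg) for arg in t[1:]):
        return True
    if is_var(s):
        return False
    # Coupling: same functor and arity, arguments embedded pairwise.
    return (s[0] == t[0] and len(s) == len(t)
            and all(embeds(a, b) for a, b in zip(s[1:], t[1:])))


def whistle(history, call):
    """Flag possible non-termination: the new call embeds an earlier call."""
    return any(embeds(old, call) for old in history)


# rev(X, []) followed by rev(T, [H]) looks like a growing accumulator.
history = [("rev", "X", ("[]",))]
print(whistle(history, ("rev", "T", (".", "H", ("[]",)))))  # True
```

An online supervisor could apply such a check to each call the specialiser is about to unfold under the offline annotations; when it fires, the annotation that allowed the unfolding is a candidate for being marked unsafe.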
The ability to recalibrate a system is important for maintaining optimal performance even when environmental changes, or even changing consumer demands, place new requirements on it. Floating-gate (FG) transistors make analog circuitry programmable and hence make it possible to recalibrate an analog system. If the FG transistors are programmed indirectly, using a second transistor to perform hot-electron injection, an analog system can be recalibrated and reprogrammed without ever taking the circuit out of operation. In this paper, we present a technique for adjusting circuit properties while the circuit remains in operation, together with example circuits in which this run-time programming is performed.
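Floating-gate devices are commonly programmed with a program-and-verify loop: apply a short injection pulse, measure, and repeat until the target is reached. The sketch below shows that generic loop, assuming a hypothetical device model in which each hot-electron injection pulse raises the output current by a fixed fraction and injection can only move the value in one direction. In the indirect scheme described above, the pulses would go to the separate injection transistor while `read()` observes the operating circuit; both are abstracted here as callbacks.

```python
class ToyFloatingGate:
    """Hypothetical device model: each injection pulse raises the output current by a fixed fraction."""

    def __init__(self, current=1.0, step=0.02):
        self.current = current
        self.step = step

    def read(self):
        return self.current

    def inject_pulse(self):
        self.current *= 1.0 + self.step


def program_and_verify(read, inject_pulse, target, tol, max_pulses=1000):
    """Alternate measure and pulse until read() reaches target - tol.

    Injection is assumed to be one-directional, so the loop stops at the first
    value inside the band rather than trying to correct overshoot.
    Returns (pulses_applied, final_value).
    """
    for pulses in range(max_pulses):
        value = read()
        if value >= target - tol:
            return pulses, value
        inject_pulse()
    return max_pulses, read()


device = ToyFloatingGate(current=1.0, step=0.02)
pulses, value = program_and_verify(device.read, device.inject_pulse, target=1.5, tol=0.01)
```

Because overshoot cannot be undone with injection alone, the pulse strength (here `step`) bounds the final error; smaller pulses trade programming time for precision.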
Considerable research has been done on reinforcement learning in continuous environments, but research on problems in which actions are also chosen from a continuous space is much more limited. We present a new class of algorithms, the continuous actor critic learning automaton (CACLA), that can handle continuous states and actions. The resulting algorithm is straightforward to implement. An experimental comparison between this algorithm and other algorithms that can handle continuous action spaces shows that CACLA performs much better than the others, especially when combined with a Gaussian exploration method.
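The core CACLA rule can be sketched as follows, assuming linear function approximators for both critic and actor and a hypothetical feature vector `phi`: the critic is trained with the TD error δ, and the actor's output is moved toward the action actually taken only when δ > 0, i.e. when that action turned out better than expected. Exploration adds Gaussian noise with a hypothetical width `sigma` to the actor's output. The step sizes and the use of TD(0) are illustrative choices.

```python
import numpy as np


class LinearCACLA:
    """Minimal CACLA sketch with linear critic V(s) = w . phi(s) and linear actor A(s) = W phi(s).

    Feature vectors, step sizes and exploration width are illustrative assumptions.
    """

    def __init__(self, n_features, action_dim, alpha=0.1, beta=0.1, gamma=0.99, sigma=0.1, seed=0):
        self.w = np.zeros(n_features)                # critic weights
        self.W = np.zeros((action_dim, n_features))  # actor weights
        self.alpha, self.beta = alpha, beta          # actor / critic step sizes
        self.gamma, self.sigma = gamma, sigma
        self.rng = np.random.default_rng(seed)

    def act(self, phi):
        # Gaussian exploration: sample around the actor's deterministic output.
        mean = self.W @ phi
        return mean + self.rng.normal(0.0, self.sigma, size=mean.shape)

    def update(self, phi, action, reward, phi_next, done):
        v_next = 0.0 if done else self.w @ phi_next
        delta = reward + self.gamma * v_next - self.w @ phi  # TD error
        self.w += self.beta * delta * phi                    # critic: TD(0) step
        if delta > 0:
            # Move the actor toward the taken action only if it beat the critic's estimate.
            self.W += self.alpha * np.outer(action - self.W @ phi, phi)
        return delta
```

Note that the actor update uses only the sign of δ, not its magnitude; this is what distinguishes the rule from gradient-style actor-critic updates.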
We propose using kernel-based methods as the underlying function approximator in the least-squares-based policy evaluation framework of LSPE(λ) and LSTD(λ). In particular, we present the 'kernelization' of model-free LSPE(λ). The kernelization is made computationally feasible by the subset of regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well suited to optimistic policy iteration and can thus be used in the context of online reinforcement learning. We demonstrate this on the high-dimensional Octopus benchmark.
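As a rough illustration of least-squares policy evaluation with kernel features, the sketch below uses a subset-of-regressors style feature map (Gaussian kernel evaluations against a small, hand-picked dictionary of centers) inside a batch LSTD(λ) solve. It is not the paper's recursive KLSPE: the dictionary is fixed by hand rather than selected automatically, the solve is batch LSTD(λ) rather than recursive LSPE(λ), and `centers`, `width` and the ridge term `reg` are assumptions.

```python
import numpy as np


def kernel_features(x, centers, width):
    """Subset-of-regressors style features: Gaussian kernel between state x and a small dictionary."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    centers = np.asarray(centers, dtype=float).reshape(len(centers), -1)
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def lstd_lambda(episodes, features, gamma, lam, reg=1e-8):
    """Batch LSTD(lambda) on (s, r, s_next, done) transitions.

    Returns theta such that V(s) is approximated by features(s) @ theta.
    """
    k = len(features(episodes[0][0][0]))
    A = reg * np.eye(k)  # small ridge term keeps A invertible
    b = np.zeros(k)
    for episode in episodes:
        z = np.zeros(k)  # eligibility trace, reset at the start of each episode
        for s, r, s_next, done in episode:
            phi = features(s)
            phi_next = np.zeros(k) if done else features(s_next)
            z = gamma * lam * z + phi
            A += np.outer(z, phi - gamma * phi_next)
            b += r * z
    return np.linalg.solve(A, b)


# A 3-state chain with reward 1 per step and termination after state 2.
centers = [0.0, 1.0, 2.0]
feat = lambda s: kernel_features(s, centers, 1.0)
episode = [(0.0, 1.0, 1.0, False), (1.0, 1.0, 2.0, False), (2.0, 1.0, None, True)]
theta = lstd_lambda([episode], feat, gamma=0.9, lam=0.5)
```

With one center per visited state, the Gaussian kernel matrix is full rank, so the solution reproduces the exact values 2.71, 1.9 and 1.0 up to the small ridge term; with far fewer centers than states, the same code gives the reduced-basis approximation that the subset of regressors method relies on.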
Using domain knowledge to decompose difficult control problems is a widely used technique in robotics. Previous work has automated the process of identifying some qualitative behaviors of a system, finding a decomposition of the system based on that behavior, and constructing a control policy based on that decomposition. We introduce a novel method for automatically finding decompositions of a task by observing the behavior of a preexisting controller. Unlike in previous work, these decompositions define reparameterizations of the state space that can permit simplified control of the system.
ISBN: 1424407982 (print)
The goal of operating system (OS) discovery is to learn which OS is running on a remote computer. There are two main strategies for OS discovery, active and passive, each with its own advantages and drawbacks. This paper discusses how answer set programming, a new logic programming paradigm, can address OS discovery in computer networks simply and elegantly by specifying the problem logically and deriving solutions through automated reasoning. Using this knowledge representation framework makes it possible to unify the active and passive methods of OS discovery in a single hybrid approach that has the advantages of both strategies while being much more versatile. Moreover, this paper presents a proof-of-concept prototype for hybrid operating system discovery.
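The hybrid idea can be illustrated with a generate-and-test reading of an answer set program: each candidate OS is "generated", and any candidate that violates a constraint derived from passive or active observations is rejected. The Python sketch below only mimics that reading; it is not the paper's encoding, and the knowledge base (OS names and attribute values) is entirely hypothetical. The passive step uses a common heuristic that infers a sender's initial TTL as the smallest common default at or above the observed TTL.

```python
# Hypothetical knowledge base: OS names and attribute values are placeholders, not real fingerprints.
KNOWLEDGE_BASE = {
    "os_a": {"initial_ttl": 64, "probe_reply": "reply"},
    "os_b": {"initial_ttl": 64, "probe_reply": "silent"},
    "os_c": {"initial_ttl": 128, "probe_reply": "reply"},
}


def infer_initial_ttl(observed_ttl, defaults=(32, 64, 128, 255)):
    """Passive evidence: guess the initial TTL as the smallest common default >= the observed TTL."""
    return min(t for t in defaults if t >= observed_ttl)


def candidates(kb, facts):
    """Keep every OS consistent with all observed facts (each fact acts like an integrity constraint)."""
    return {name for name, attrs in kb.items()
            if all(attrs.get(key) == value for key, value in facts.items())}


passive = {"initial_ttl": infer_initial_ttl(57)}  # from sniffed traffic
active = {"probe_reply": "reply"}                 # from a hypothetical active probe
hybrid = candidates(KNOWLEDGE_BASE, {**passive, **active})
```

Each strategy alone leaves two candidates here, while their combination narrows the answer to `os_a`. In an actual ASP encoding, the same facts and constraints would be written as rules and integrity constraints and the candidate sets would be computed by an ASP solver.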
In this paper, a novel method for designing IIR variable fractional delay (VFD) digital filters with variable and fixed denominators is presented. First, a peak-constrained weighted least-squares (PCWLS) method is employed to design a set of FIR fixed fractional delay (FD) filters according to given specifications; the PCWLS FIR filters are obtained using the projected least-squares (PLS) algorithm. An iterative WLS model reduction technique is then used to design the denominators, which guarantees the stability of the designed IIR VFD filter if the iteration converges. The numerators of the IIR fixed FD filters can be designed by either of two approaches: Approach 1 solves linear equations based on the orthogonality principle, whereas Approach 2 formulates numerator design as a standard quadratic programming (QP) problem. Finally, the coefficients of the IIR fixed FD filters are approximated by polynomial functions of the fractional delay. Three sets of examples are given to demonstrate the effectiveness of the proposed method.
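The final step, approximating each filter coefficient as a polynomial in the fractional delay p, can be sketched independently of how the fixed FD filters are designed. For simplicity, the sketch below swaps in closed-form Lagrange-interpolation FIR FD filters (not the paper's PCWLS FIR or IIR designs), designs one filter per grid value of p, and fits each tap with `np.polyfit`, which accepts a 2-D `y` and returns one polynomial per column. The filter order `N`, the delay offset `N // 2` and the grid are assumptions.

```python
import numpy as np


def lagrange_fd(N, D):
    """Order-N Lagrange interpolation FIR fractional-delay filter with total delay D."""
    h = np.ones(N + 1)
    for n in range(N + 1):
        for k in range(N + 1):
            if k != n:
                h[n] *= (D - k) / (n - k)
    return h


def fit_coefficient_polynomials(N, p_grid, degree):
    """Design one fixed FD filter per fractional delay p and fit each tap as a polynomial in p."""
    H = np.array([lagrange_fd(N, N // 2 + p) for p in p_grid])  # shape (len(p_grid), N + 1)
    return np.polyfit(p_grid, H, degree)  # shape (degree + 1, N + 1), highest power first


def evaluate(coeffs, p):
    """Reconstruct the filter taps for an arbitrary fractional delay p."""
    return np.array([np.polyval(coeffs[:, n], p) for n in range(coeffs.shape[1])])


N = 3
p_grid = np.linspace(-0.5, 0.5, 11)
coeffs = fit_coefficient_polynomials(N, p_grid, degree=3)
h = evaluate(coeffs, 0.3)
```

Since Lagrange taps are exactly degree-N polynomials in the delay, the fit here is exact; for least-squares or IIR designs, the polynomial degree becomes a trade-off between approximation error and implementation cost, and the resulting structure is the familiar Farrow-type variable filter.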