For many industrial processes it is important to model the cure kinetics of phenol-formaldehyde resoles, yet the applicability of common model-free kinetic algorithms to the cure of phenolic resins is not known. In this study the abilities of the Friedman, Vyazovkin and Kissinger-Akahira-Sunose (KAS) model-free-kinetics algorithms to model and predict the cure kinetics of commercial resoles are compared. The Friedman and Vyazovkin methods yield activation energy dependences on conversion that are consistent with each other, and this dependence has a higher amplitude than the one obtained with the KAS method. Hence, the Friedman and Vyazovkin methods are better suited for revealing the cure steps of commercial PF resoles. Conversely, the KAS algorithm lends itself more readily to dynamic cure predictions than the Friedman and Vyazovkin methods, while isothermal cure is equally well predicted by all three. As a result, the KAS algorithm is the method of choice for modeling and predicting the cure kinetics of commercial phenolic resoles under various temperature programs. (C) 2005 Elsevier B.V. All rights reserved.
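To make the comparison concrete, here is a minimal sketch, not taken from the paper, of how the Friedman and KAS isoconversional fits are typically computed from DSC data recorded at several heating rates; the numerical values, array names and function names are illustrative assumptions.

```python
# Minimal sketch of Friedman and KAS isoconversional fits (illustrative only).
# At a fixed conversion alpha, the Friedman method fits ln(da/dt) vs 1/T and the
# KAS method fits ln(beta/T^2) vs 1/T; in both cases the slope is -Ea/R.
import numpy as np

R = 8.314  # gas constant, J/(mol K)

def friedman_Ea(T_alpha, dadt_alpha):
    """Activation energy (J/mol) at one conversion level.
    T_alpha, dadt_alpha: one entry per heating rate, interpolated to that conversion."""
    slope, _ = np.polyfit(1.0 / T_alpha, np.log(dadt_alpha), 1)
    return -slope * R

def kas_Ea(T_alpha, beta):
    """Activation energy (J/mol) at one conversion level via the KAS linearization."""
    slope, _ = np.polyfit(1.0 / T_alpha, np.log(beta / T_alpha**2), 1)
    return -slope * R

# Hypothetical data at three heating rates for a single conversion level (alpha = 0.5).
beta = np.array([5.0, 10.0, 20.0])               # heating rates, K/min
T_alpha = np.array([410.0, 420.0, 431.0])        # temperatures at alpha = 0.5, K
dadt_alpha = np.array([1.2e-3, 2.3e-3, 4.5e-3])  # cure rates at alpha = 0.5, 1/s
print(friedman_Ea(T_alpha, dadt_alpha), kas_Ea(T_alpha, beta))
```

Repeating such a fit over a grid of conversion levels yields the activation energy dependence on conversion that the abstract compares across methods.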
We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted Optimistic Q-Learning (VOQL), based on Q-learning, and bound its regret assuming that the regression function class is closed under Bellman backups and has bounded Eluder dimension. As a special case, VOQL achieves $\widetilde{O}(d\sqrt{TH} + d^{6}H^{5})$ regret over T episodes for a horizon-H MDP under (d-dimensional) linear function approximation, which is asymptotically optimal. Our algorithm incorporates weighted regression-based upper and lower bounds on the optimal value function to obtain this improved regret. The algorithm is computationally efficient given a regression oracle over the function class, making this the first computationally tractable and statistically optimal approach for linear MDPs.
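As a rough illustration of the two ingredients the abstract highlights, here is a generic sketch of variance-weighted ridge regression combined with an elliptical-potential optimism bonus; it is not the authors' VOQL implementation, and the parameter names (lam, sigma_min2, beta_bonus) are assumptions.

```python
# Generic sketch: variance-weighted ridge regression plus an optimism bonus (illustrative).
import numpy as np

def weighted_ridge(Phi, y, sigma2, lam=1.0, sigma_min2=1e-2):
    """Phi: (n, d) features, y: (n,) regression targets, sigma2: (n,) variance estimates."""
    w = 1.0 / np.maximum(sigma2, sigma_min2)                      # down-weight high-variance targets
    Lambda = Phi.T @ (w[:, None] * Phi) + lam * np.eye(Phi.shape[1])
    theta = np.linalg.solve(Lambda, Phi.T @ (w * y))
    return theta, Lambda

def optimistic_q(phi, theta, Lambda, beta_bonus=1.0):
    """Optimistic value estimate: point prediction plus an elliptical-potential bonus."""
    bonus = beta_bonus * np.sqrt(phi @ np.linalg.solve(Lambda, phi))
    return phi @ theta + bonus

# Usage with random data, just to show the shapes involved.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 5)); y = rng.normal(size=100); s2 = rng.uniform(0.1, 2.0, 100)
theta, Lam = weighted_ridge(Phi, y, s2)
print(optimistic_q(Phi[0], theta, Lam))
```

Subtracting the same bonus gives a matching lower bound, in the spirit of the paper's upper and lower confidence sequences on the optimal value function.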
This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by recent advances in offline reinforcement learning, we develop an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning, penalizing infrequently visited state-action pairs based on suitable lower confidence bounds (LCBs). This framework leads to, among other things, improved sample efficiency and enhanced adaptivity in the presence of near-expert data. In some important scenarios, our approach permits the observed data to cover only part of the state-action space, in stark contrast to prior theory that requires uniform coverage of all state-action pairs. When coupled with the idea of variance reduction, asynchronous Q-learning with LCB penalization achieves near-optimal sample complexity, provided that the target accuracy level is small enough. In comparison, prior works were suboptimal in terms of the dependency on the effective horizon even when i.i.d. sampling is permitted. Our results deliver the first theoretical support for the use of the pessimism principle in the presence of Markovian non-i.i.d. data.
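The following is a simplified, tabular sketch of the core idea, not the paper's exact algorithm: an asynchronous Q-learning update whose target is penalized by an LCB-style bonus that shrinks with the visit count of the state-action pair. The step-size schedule and the bonus constant c_b are illustrative assumptions.

```python
# Simplified sketch of asynchronous Q-learning with LCB-style pessimism (illustrative).
import numpy as np

def lcb_q_learning(trajectory, n_states, n_actions, gamma=0.99, c_b=1.0):
    """trajectory: iterable of (s, a, r, s_next) tuples collected along a Markovian sample path."""
    Q = np.zeros((n_states, n_actions))
    N = np.zeros((n_states, n_actions))               # visit counts per (s, a)
    for s, a, r, s_next in trajectory:
        N[s, a] += 1
        eta = 1.0 / (1.0 + (1.0 - gamma) * N[s, a])   # illustrative step-size schedule
        penalty = c_b / np.sqrt(N[s, a])              # pessimism: larger for rarely visited pairs
        target = r - penalty + gamma * Q[s_next].max()
        Q[s, a] += eta * (target - Q[s, a])
    return np.clip(Q, 0.0, None)                      # keep estimates nonnegative for rewards in [0, 1]
```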
ISBN: 9781728198095 (Print)
The current paper combines the main features of first-order Active Disturbance Rejection Control (ADRC) with Virtual Reference Feedback Tuning (VRFT) to automatically determine the parameters of the controller without the process model. The development of the resulting data-driven algorithm, called the first-order ADRC-VRFT algorithm, is exemplified in the control of cart position, arm angular position and payload position of three-degree-of-freedom tower crane systems (TCSs) using three Single Input-Single Output (SISO) loops running in parallel. The three SISO first-order ADRC-VRFT algorithms benefit from a twofold validation through experiments on real-time TCS equipment: model-free (without the process model) and model-based (making use of the process model). The algorithms are also compared on the experimental results using an objective function as a performance index.
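For context, here is a minimal sketch of the kind of discrete-time first-order ADRC loop that each SISO channel would run; the controller gain kp, input gain b0 and observer bandwidth wo are exactly the sort of parameters a VRFT-style data-driven step would deliver. The class, signal names and sampling period Ts are assumptions, not the paper's implementation.

```python
# Minimal sketch of a discrete-time first-order ADRC controller (illustrative).
class FirstOrderADRC:
    def __init__(self, kp, b0, wo, Ts):
        self.kp, self.b0, self.Ts = kp, b0, Ts
        self.l1, self.l2 = 2.0 * wo, wo * wo     # extended state observer gains from bandwidth wo
        self.z1, self.z2 = 0.0, 0.0              # estimated output and estimated total disturbance
        self.u = 0.0                             # previous control input

    def step(self, r, y):
        # Extended state observer update using the previous control input.
        e = y - self.z1
        self.z1 += self.Ts * (self.z2 + self.b0 * self.u + self.l1 * e)
        self.z2 += self.Ts * (self.l2 * e)
        # Disturbance-rejecting control law.
        u0 = self.kp * (r - self.z1)
        self.u = (u0 - self.z2) / self.b0
        return self.u
```

In the paper's setup, three such loops (cart, arm and payload positions) would run in parallel, each with its own data-driven parameter set.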
ISBN: 9781509004614 (Print)
This paper addresses the development of general-purpose game agents able to learn a vast number of games using the same architecture. The article analyzes the main existing approaches to general game playing, reviews their performance and proposes future research directions. Methods such as deep learning, reinforcement learning and evolutionary algorithms are considered for this problem. The testing platform is the popular video game console Atari 2600. Research into developing general-purpose agents for games is closely related to achieving artificial general intelligence (AGI).
This article examines the problem of developing a simple, model-free algorithm for detecting and identifying the time instant when a Lithium-Sulfur (Li-S) cell passes through its "dip point" during discharge. The dip point marks a sharp transition between two different sets of redox reactions involved in Li-S battery discharge, and is characterized by a significant change in the slope of battery potential with respect to charge processed. This makes it possible to detect the dip point accurately with a simple algorithm that estimates this slope using a moving-horizon least-squares method and then flags the dip point when the slope changes. We validate this algorithm both with a physics-based battery simulation and experimentally, using custom-fabricated Li-S coin cells. One potential benefit of this algorithm is that it makes it possible to accurately pinpoint the cell's arrival in the low-plateau region, which is important in light of the well-recognized difficulties associated with Li-S battery state estimation in this region. This opens the door to potential innovations in Li-S battery pack balancing that rely on dip-point detection as a simpler and more reliable alternative to full battery state estimation.
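An illustrative sketch of the detection idea described in the abstract, assuming streaming samples of charge processed and cell potential: the dV/dQ slope is estimated by least squares over a moving horizon, and the dip point is flagged when the slope over the most recent horizon differs sharply from the slope over the preceding one. The window length and threshold are hypothetical tuning parameters, not values from the paper.

```python
# Illustrative sketch of moving-horizon least-squares dip-point detection.
import numpy as np
from collections import deque

class DipPointDetector:
    """Flags a dip-point-like event by comparing the dV/dQ slope over the most
    recent moving horizon with the slope over the preceding horizon."""
    def __init__(self, window=50, slope_jump=0.5):
        self.recent = deque(maxlen=window)       # (charge, voltage) samples, newest horizon
        self.lagged = deque(maxlen=window)       # samples from the preceding horizon
        self.slope_jump = slope_jump             # threshold on slope change, V/Ah (assumed)

    @staticmethod
    def _slope(buf):
        q = np.array([p[0] for p in buf]); v = np.array([p[1] for p in buf])
        return np.polyfit(q, v, 1)[0]            # least-squares estimate of dV/dQ

    def update(self, charge, voltage):
        if len(self.recent) == self.recent.maxlen:
            self.lagged.append(self.recent[0])   # oldest recent sample rolls into the lag buffer
        self.recent.append((charge, voltage))
        if len(self.lagged) < self.lagged.maxlen:
            return False                         # not enough history yet
        return abs(self._slope(self.recent) - self._slope(self.lagged)) > self.slope_jump
```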