The paradigm of inverse reinforcement learning (IRL) is used to specify the reward function of an agent purely from its actions and is critical for value alignment and AI safety. While IRL is successful in practice, theoretical guarantees remain nascent. Motivated by the need for IRL in large action spaces with limited data, we consider as a first step the problem of learning from a single sequence of actions (i.e., a demonstration) of a stochastic linear bandit algorithm. When the demonstrator employs the Phased Elimination algorithm, we develop a simple inverse learning procedure that estimates the linear reward function consistently in the time horizon with just a single demonstration. In particular, we show that our inverse learner approximates the true reward parameter within an error of $\mathcal{O}(T^{-\frac{\omega - 1}{2\omega}})$, where $T$ is the length of the demonstrator's trajectory and $\omega$ is a constant that depends on the geometry of the action set. We complement this result with an information-theoretic lower bound for any inverse learning procedure. We corroborate our theoretical results with simulations on synthetic data and a demonstration constructed from the MovieLens dataset.
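To get a feel for the stated rate, the short sketch below evaluates $T^{-\frac{\omega - 1}{2\omega}}$ for a few values of $T$ and $\omega$. It is only an arithmetic illustration: the abstract does not give the constant hidden by the $\mathcal{O}(\cdot)$ notation or the admissible range of $\omega$, so the `constant=1.0` default, the assumption $\omega > 1$, and the function names are hypothetical choices made here.

```python
def error_rate_exponent(omega: float) -> float:
    """Return the exponent (omega - 1) / (2 * omega) from the stated bound.

    Assumption: omega > 1, so that the exponent is positive and the
    bound shrinks as T grows. The abstract does not state this range.
    """
    if omega <= 1.0:
        raise ValueError("this sketch assumes omega > 1")
    return (omega - 1.0) / (2.0 * omega)


def error_bound(T: int, omega: float, constant: float = 1.0) -> float:
    """Evaluate constant * T ** (-(omega - 1) / (2 * omega)).

    `constant` stands in for the unknown factor hidden by the O(.)
    notation; 1.0 is an arbitrary placeholder, not a value from the paper.
    """
    if T < 1:
        raise ValueError("T must be a positive trajectory length")
    return constant * T ** (-error_rate_exponent(omega))


if __name__ == "__main__":
    for omega in (2.0, 4.0, 10.0):
        exponent = error_rate_exponent(omega)
        values = [error_bound(T, omega) for T in (10**2, 10**4, 10**6)]
        print(f"omega={omega}: exponent={exponent:.3f}, bounds={values}")
```

Under these assumptions the exponent is 1/4 at $\omega = 2$ and approaches 1/2 as $\omega$ grows, so a larger $\omega$ corresponds to a faster decay of the bound in $T$.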
Spatial modulation (SM) is a low-complexity multiple-input/multiple-output transmission technique that combines index modulation and quadrature amplitude modulation for wireless communications. In this work, we consid...
In modern healthcare, cloud-based e-health technology offers substantial benefits but faces significant security challenges. Sensitive patient data is vulnerable to cyber threats during transmission and storage, poten...
This paper presents a novel approach for the online calculation of Linear Quadratic Regulator (LQR) gains using the Tabular Dyna-Q algorithm. By leveraging Q-learning, this technique enables the determination of gains...
The task of the energy management system is to create conditions for maximizing the efficiency of electricity and heat consumption in RTU buildings, while ensuring a comfortable indoor climate and enabling continuous ...
Public opinion has always affected businesses and individuals. People react on social media and spread information incompletely, and the result is then accepted as public opinion. There are three categ...
The research project focuses on prototyping an IoT (Internet of Things) system for measuring and monitoring the water quality of the Wang River in Lampang Municipality. The system utilizes EC (electrical conductivity), pH, ...
Sample adaptive offset (SAO) is applied for reducing sample distortion and attenuating ringing artifacts in both HEVC and VVC standards. The rate-distortion optimization process is used to select the best SAO paramete...
The report presents a hands-on learning approach that can be implemented in computer architecture labs. A model of a pipelined microarchitecture RISC-V processor core developed using the high-level hardware descr...
The effectiveness of behavior change support systems (BCSS) in promoting health and well-being is unflinching. However, its long-term effectiveness is hindered by non-compliance. Research in BCSS that focuses on compl...