ISBN: 9781713871088 (print)
The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not, with a distinct probability for each edge. We prove nearly optimal regret bounds of order $\min\bigl\{\min_{\varepsilon} \sqrt{(\alpha_\varepsilon/\varepsilon) T},\, \min_{\varepsilon} (\delta_\varepsilon/\varepsilon)^{1/3} T^{2/3}\bigr\}$ (ignoring logarithmic factors), where $\alpha_\varepsilon$ and $\delta_\varepsilon$ are graph-theoretic quantities measured on the support of the stochastic feedback graph $G$ with edge probabilities thresholded at $\varepsilon$. Our result, which holds without any preliminary knowledge about $G$, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.
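The two graph operations mentioned in this abstract, thresholding the edge probabilities at $\varepsilon$ and observing a realized out-neighborhood, can be illustrated with a minimal sketch. The function names, the example probability matrix, and the choice to keep edges with probability at least $\varepsilon$ (rather than strictly greater) are assumptions made for illustration; this is not the authors' algorithm.

```python
import random


def threshold_support(probs, eps):
    """Return the edges (i, j) whose probability is at least eps.

    Assumption: "thresholded at eps" is read as keeping p >= eps.
    """
    n = len(probs)
    return {(i, j) for i in range(n) for j in range(n) if probs[i][j] >= eps}


def sample_out_neighborhood(probs, action, rng):
    """Sample which out-edges of `action` are realized in one round.

    Each edge (action, j) is realized independently with probability
    probs[action][j], in the spirit of an Erdos-Renyi model.
    """
    return [j for j, p in enumerate(probs[action]) if rng.random() < p]


# Hypothetical 3-action example: probs[i][j] is the probability that
# playing action i reveals the loss of action j.
probs = [
    [1.0, 0.5, 0.05],
    [0.0, 1.0, 0.3],
    [0.2, 0.0, 1.0],
]

support = threshold_support(probs, eps=0.25)
rng = random.Random(0)
observed = sample_out_neighborhood(probs, action=0, rng=rng)
```

Raising `eps` removes low-probability edges from the support, which is the trade-off the $\min_{\varepsilon}$ in the bound optimizes over: a sparser support graph, but edges that are realized more reliably.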
ISBN: 9798350316339 (digital)
ISBN: 9798350316346 (print)
In planar pursuit-evasion differential games with a faster pursuer and a slower evader, the interception points resulting from equilibrium strategies lie on the Apollonius circle. This property is instrumental for leveraging geometric approaches to solve multiple pursuit-evasion scenarios in the plane. Here, we study a pursuit-evasion differential game on a sphere and generalize the planar Apollonius set to the spherical domain. We find that the interception point resulting from the equilibrium strategies can leave the boundary of the Apollonius set, and we present a condition that keeps the interception point on the boundary. This condition allows for generalizing planar pursuit-evasion strategies to the sphere.
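For reference, the planar Apollonius circle mentioned above can be computed in closed form. The sketch below covers only the standard planar case, assuming constant speeds and a speed ratio $\mu = v_E / v_P < 1$; the function and variable names are hypothetical, and the spherical generalization studied in the paper is not reproduced here.

```python
import math


def apollonius_circle(pursuer, evader, speed_ratio):
    """Return (center, radius) of the planar Apollonius circle.

    The circle is the set of points X with |X - E| = mu * |X - P|,
    i.e. the points both players reach at the same time when moving
    in straight lines. Requires 0 < mu < 1 (faster pursuer).
    """
    mu = speed_ratio
    if not 0.0 < mu < 1.0:
        raise ValueError("speed_ratio must lie strictly between 0 and 1")
    px, py = pursuer
    ex, ey = evader
    denom = 1.0 - mu * mu
    # Center: (E - mu^2 * P) / (1 - mu^2)
    center = ((ex - mu * mu * px) / denom, (ey - mu * mu * py) / denom)
    # Radius: mu * |P - E| / (1 - mu^2)
    radius = mu * math.hypot(px - ex, py - ey) / denom
    return center, radius


# Hypothetical example: pursuer at the origin, evader one unit away,
# evader moving at half the pursuer's speed.
center, radius = apollonius_circle((0.0, 0.0), (1.0, 0.0), 0.5)

# Check one boundary point: its distance ratio equals the speed ratio.
x = (center[0] + radius, center[1])
ratio = math.hypot(x[0] - 1.0, x[1]) / math.hypot(x[0], x[1])
```

The evader lies inside the circle and the pursuer outside it, so any straight-line escape by the evader crosses the boundary at a point the pursuer can reach simultaneously.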
This paper deals with the design of an Android mobile application for visualizing measured values of particulate matter and meteorological factors from measurement stations. The application can in principle be used effectively in any field of data visualization. The architecture of the mobile application, its security, the use of AWS services to access the data in the InfluxDB databases, as well as the user interface and graphical visualizations of the measured data are described and illustrated. The application was tested by users, and the paper documents their first experiences with it.
The study proposes and tests a technique for automated emotion recognition through mouth detection via Convolutional Neural Networks (CNN), meant to be applied for supporting people with health disorders with communic...
This article examines the prerequisites for development and the expected results of the application of a new systematic approach to testing Industrial Control Systems of Nuclear Power Plants (NPP ICS) in the fi...
Obtaining models that can be used for flight control is of utmost importance to ensure reliable guidance and navigation of spacecraft such as a Generic Parafoil Return Vehicle (GPRV). In this paper, we convert an existing, high-fidelity nonlinear model of the atmospheric flight dynamics of a GPRV to a Linear Parameter-Varying (LPV) form that enables high-performance navigation control design. Applying existing systematic conversion methods to such complicated nonlinear models often results in complex LPV representations, which are not suitable for controller synthesis. We apply and compare state-of-the-art conversion techniques on the GPRV model, including learning-based approaches, to optimize the complexity and conservatism of the resulting LPV embedding. The results show that we can obtain an LPV embedding that approximates the complex nonlinear dynamics sufficiently well, with an efficiently chosen balance between complexity, conservatism, and model performance.
This paper aims to demonstrate how model predictive control (MPC) strategies can be used to determine optimal intervention policies against the COVID-19 pandemic. In particular, in the period after a first wave of infection and before a vaccine can be safely distributed to a sufficient extent, the intervention experience from the first outbreak can be used to guide policy decisions. The MPC problem in this paper takes into account the pandemic in different regions of a country and in its neighboring countries, while policies such as wearing masks or social distancing are selected as inputs to be optimized. The optimized policy balances the risk of a second outbreak against socio-economic costs, while taking into account that the measures should not be so severe that they are rejected by the population. The effectiveness of this policy relative to standard intervention policies is evaluated through numerical simulations.
Reinforcement learning yields a feedback controller that achieves a specific control goal (which is often expressed as a reward function). However, it often suffers from the Sim2Real gap, and domain randomization is kn...
Chest radiography is one of the main medical imaging modalities for diagnosing lung diseases. To assist radiologists during interventional procedures, this paper aims at proposing a transfer learning-based class...