Bilevel optimization has recently been applied to many machine learning tasks. However, its applications have been restricted to the supervised learning setting, where static objective functions with benign structures are considered. Yet bilevel problems such as incentive design, inverse reinforcement learning (RL), and RL from human feedback (RLHF) are often modeled with dynamic objective functions that go beyond these simple static structures, which poses significant challenges for existing bilevel solutions. To tackle this new class of bilevel problems, we introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation. We provide theoretical studies of the problem landscape and its penalty-based (policy) gradient algorithms. We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg game and RLHF. Copyright 2024 by the author(s)
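As a rough illustration of what a penalty formulation means for a generic bilevel problem, the lower-level optimality constraint can be moved into the objective. The symbols $f$, $g$, $x$, $y$, and $\lambda$ are assumed here for illustration and are not taken from the paper:

```latex
\begin{aligned}
\text{Bilevel problem:}\quad & \min_{x,\,y}\; f(x, y) \quad \text{s.t.}\quad y \in \arg\min_{y'} g(x, y'), \\
\text{Penalized problem:}\quad & \min_{x,\,y}\; f(x, y) + \lambda \Bigl( g(x, y) - \min_{y'} g(x, y') \Bigr), \qquad \lambda > 0.
\end{aligned}
```

The penalty term is nonnegative and vanishes exactly when $y$ is lower-level optimal. In the bilevel RL setting the lower-level variable would be a policy, and the specific penalty functions studied in the paper may differ from this generic form.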
It is needless to mention that the proven capability of cloud computing and digital twin-based monitoring and control systems will play a major role in the implementation of digital twin-based lithium-ion battery mana...
Visual localization and object detection both play important roles in various applications. In many indoor application scenarios where some detected objects have fixed positions, the two techniques work closely together. However, few researchers consider these two tasks simultaneously, because of a lack of datasets and the little attention paid to such *** this paper, we explore multi-task network design and joint refinement of detection and localization. To address the dataset problem, we construct a medium indoor scene of an aviation exhibition hall through a semi-automatic *** dataset provides localization and detection information, and is publicly available at https://***/drive/folders/1U28zk0N4_I0db zkqyIAK1A15k9oUKOjI?usp=sharing for benchmarking localization and object detection *** this dataset, we have designed a multi-task network, JLDNet, based on YOLO v3, that outputs a target point cloud and object bounding boxes. In dynamic environments, the detection branch also promotes the perception of *** includes image feature learning, point feature learning, feature fusion, detection construction, and point cloud ***, object-level bundle adjustment is used to further improve localization and detection *** test JLDNet and compare it to other methods, we have conducted experiments on 7 static scenes, our constructed dataset, and the dynamic TUM RGB-D and Bonn *** results show state-of-the-art accuracy for both tasks, and the benefit of jointly working on both tasks is demonstrated.
This paper introduces design guidance for zero-current detection (ZCD) circuits in gallium nitride (GaN) device-based critical conduction mode (CRM) pulse-width-modulation (PWM) converters to reduce the sensing delay...
This paper presents an extended-conversion-ratio modulation for a two-phase symmetric series-capacitor buck (SSCB) converter. With the proposed scheme, highly efficient and regulated 48V-to-12V conversion can be reali...
This paper presents a comprehensive survey of expert opinions on integrating semantic Artificial Intelligence (AI) in digital education, focusing on the conversion and composition of Learning Objects (LOs) to support ...
Controlled islanding plays an essential role in preventing the blackout of power systems. Although there have been several studies on this topic in the past, not enough attention has been paid to the uncertainty brought by renewable energy sources (RESs) that may cause unpredictable unbalanced power and to the observability of power systems after islanding that is essential for back-up black-start ***, a novel controlled islanding model based on mixed-integer second-order cone and chance-constrained programming (MISOCCP) is proposed to address these ***, the uncertainty of RESs is characterized by their possibility distribution models with chance constraints, and the requirements, e.g., system observability, for rapid back-up black-start measures are also ***, a law of large numbers (LLN) based method is employed for converting the chance constraints into deterministic ones and reformulating the non-convex model into convex ***, case studies on the revised IEEE 39-bus and 118-bus power systems as well as the comparisons among different models are given to demonstrate the effectiveness of the proposed *** results show that the proposed model can result in less unbalanced power and better observability after islanding compared with other models.
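The idea of replacing a chance constraint with a deterministic check via the law of large numbers can be sketched with a toy sample-average test. This is only an illustration under stated assumptions, not the paper's MISOCCP model: the function names, the single-bus power balance, and the uniform RES output distribution are all hypothetical.

```python
import random


def empirical_satisfaction(load_mw, res_capacity_mw, limit_mw,
                           n_samples=10_000, seed=0):
    """Estimate P(|RES output - load| <= limit) by a sample average.

    By the law of large numbers, the fraction of sampled scenarios that
    satisfy the constraint converges to the true probability.
    """
    rng = random.Random(seed)
    satisfied = 0
    for _ in range(n_samples):
        # Assumption: RES output is uniform on [0, capacity]; a real study
        # would sample from a fitted uncertainty model instead.
        res_output_mw = rng.uniform(0.0, res_capacity_mw)
        if abs(res_output_mw - load_mw) <= limit_mw:
            satisfied += 1
    return satisfied / n_samples


def chance_constraint_holds(load_mw, res_capacity_mw, limit_mw, epsilon=0.05):
    """Deterministic surrogate: sample frequency must be at least 1 - epsilon."""
    frequency = empirical_satisfaction(load_mw, res_capacity_mw, limit_mw)
    return frequency >= 1.0 - epsilon
```

For example, `chance_constraint_holds(50.0, 100.0, 50.0)` accepts an island whose imbalance limit covers every sampled scenario, while a tighter limit of 25 MW is satisfied in only about half of the scenarios and is rejected at `epsilon=0.05`.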
In this paper, a distributed super-twisting sliding mode (DSTSM) protocol is designed to achieve formation and tracking for a nonlinear model of multiple quadcopters using the multi-agent system (MAS) concept. The und...
Recently, multimodal multiobjective optimization problems (MMOPs) have received increasing attention. Their goal is to find a Pareto front and as many equivalent Pareto optimal solutions as possible. Although some evolutionary algorithms for them have been proposed, they mainly focus on the convergence rate in the decision space while ignoring solutions *** this paper, we propose a new multiobjective fireworks algorithm for them, which is able to balance exploitation and exploration in the decision space. We first extend a recent single-objective fireworks algorithm to handle *** we make improvements by incorporating an adaptive strategy and special archive guidance into it, where special archives are established for each firework, and two strategies (i.e., explosion and random strategies) are adaptively selected to update the positions of sparks generated by fireworks with the guidance of special ***, we compare the proposed algorithm with eight state-of-the-art multimodal multiobjective algorithms on all 22 MMOPs from CEC2019 and several imbalanced distance minimization *** results show that the proposed algorithm is superior to compared algorithms in solving ***, its runtime is less than its peers'.
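To make "equivalent Pareto optimal solutions" concrete, the following minimal sketch filters a candidate set down to its non-dominated members on a toy problem. The helper names and `toy_objective` are hypothetical and chosen only for illustration; this is a brute-force filter, not the fireworks algorithm proposed in the paper.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_optimal(solutions, objective):
    """Return the candidates whose objective vectors are non-dominated."""
    evaluated = [(s, objective(s)) for s in solutions]
    return [
        s for s, f in evaluated
        if not any(dominates(g, f) for _, g in evaluated)
    ]


def toy_objective(x):
    # Hypothetical bi-objective function that is symmetric in x, so x and -x
    # are distinct decisions that map to the same point on the Pareto front.
    return (abs(x), (abs(x) - 2) ** 2)


front = pareto_optimal(range(-3, 4), toy_objective)
```

Here `front` contains both `-2` and `2` (and both `-1` and `1`): each pair shares one objective vector, which is the kind of equivalent solution a multimodal multiobjective algorithm tries to keep rather than collapse into a single representative.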
This paper presents a distributed discrete-time exponential sliding mode consensus (DDESMC) protocol for a class of multi-agent systems. A leader-following approach of a homogeneous multi-agent system (MAS) is exploite...