Search Results - Inner Mongolia University Library

6Search: A reinforcement learning-based traceroute approach for efficient IPv6 topology discovery

COMPUTER NETWORKS, 2023, Vol. 235, No. 1

Authors: Liu, Ning; Jia, Chunbo; Hou, Bingnan; Hou, Changsheng; Chen, Yingwen; Cai, Zhiping. Affiliation: Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China

Topology discovery can infer the interconnection relationships between network entities. A complete network topology is of great significance for network security analysis, application research, and similar tasks. However, because of the huge address space and the uneven distribution of active addresses in the IPv6 Internet, it is infeasible to use brute-force traceroute to discover the entire topology. To address this problem, we propose 6Search, a target generation method for IPv6 topology discovery based on a reinforcement learning algorithm. 6Search first obtains the routable BGP prefixes and then runs traceroute within each /32 prefix. The number of probes allocated to each prefix is adjusted dynamically based on the results of previous scans: using the reinforcement learning algorithm, 6Search allocates more probes in each scan iteration to the prefixes that yielded more discovered addresses. Real-world experiments demonstrate that 6Search achieves higher discovery efficiency than existing methods, an improvement of 24.1%-139.8%.

Keywords: IPv6; Topology discovery; reinforcement learning algorithm
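
The abstract's core mechanism, shifting probes toward prefixes that produced more new addresses in the previous round, can be illustrated with a small allocation sketch. This is not 6Search's actual reinforcement learning algorithm: the reward definition (new addresses per probe), the uniform exploration share `explore_frac`, and the function name `allocate_probes` are hypothetical choices made only for illustration.

```python
from typing import Dict


def allocate_probes(rewards: Dict[str, float], budget: int, explore_frac: float = 0.1) -> Dict[str, int]:
    """Split a probe budget across prefixes for the next scan round (hypothetical helper).

    rewards: assumed per-prefix reward from the previous round,
             e.g. newly discovered addresses per probe sent.
    A fraction `explore_frac` of the budget is spread evenly so that every prefix
    keeps being probed; the rest is shared in proportion to the rewards.
    """
    if not rewards:
        return {}
    prefixes = sorted(rewards)
    n = len(prefixes)
    base = int(budget * explore_frac) // n  # uniform exploration share per prefix
    alloc = {p: base for p in prefixes}
    remaining = budget - base * n

    total = sum(max(rewards[p], 0.0) for p in prefixes)
    if total == 0:
        weights = {p: 1.0 / n for p in prefixes}  # no signal yet: split evenly
    else:
        weights = {p: max(rewards[p], 0.0) / total for p in prefixes}

    # Floor the proportional shares, then give leftover probes to the largest remainders.
    shares = {p: remaining * weights[p] for p in prefixes}
    for p in prefixes:
        alloc[p] += int(shares[p])
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(prefixes, key=lambda p: shares[p] - int(shares[p]), reverse=True)
    for p in by_remainder[:leftover]:
        alloc[p] += 1
    return alloc
```

For example, rewards of `{"a": 3.0, "b": 1.0, "c": 0.0}` with a budget of 100 probes give `{"a": 71, "b": 26, "c": 3}`; the exploration floor keeps the unproductive prefix `c` from being starved entirely, which a purely greedy split would do.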

Automatic voltage control considering demand response: Approximatively completed observed Markov decision process-based reinforcement learning scheme

INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2024, Vol. 161

Authors: Gu, Yaru; Huang, Xueliang. Affiliation: Southeast Univ, Sch Elect Engn, Nanjing, Peoples R China

To fully utilize the voltage regulation capacity of flexible loads and distributed generations (DGs), we propose a novel Approximatively Completed Observed Markov Decision Process-based (ACOMDP-based) reinforcement learning (RL) scheme, namely ACMRL, for a multi-objective Automatic Voltage Control (AVC) problem that considers Incentive-Based Demand Response (IBDR) based on a Differential Increment Incentive Mechanism (DIIM). First, we propose a DIIM to motivate high-flexibility consumers to realize their maximum potential in real-time voltage control while ensuring the best economy. Second, we characterize the multi-objective AVC problem as an ACOMDP model, transformed from the Partially Observable Markov Decision Process (POMDP) model by introducing a novel hidden system state vector that incorporates the belief state and the high-confidence probability vector. The belief state and the high-confidence probability vector describe the probability distribution extracted from the historical observed states, portraying both the precise state and the uncertainty in the state update process. Then, the ACOMDP block is fed into the RL block, which adopts a modified underlying network architecture with the Asynchronous Advantage Actor-Critic (MA3C) algorithm embedded with the Shared Modular Policies (SMP) module. The MA3C-based RL block, characterized by enhanced communication efficiency, enables fast generation of optimal decision-making actions even under substantial uncertainty. Case studies are conducted in a practical district in Suzhou, China, and simulation results validate the superior performance of the proposed methodology.

Keywords: Automatic voltage control; Partially observable system; Uncertainty; Differential increment incentive mechanism; reinforcement learning algorithm
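
The ACOMDP construction above starts from the belief state of a POMDP, i.e. a probability distribution over hidden states inferred from past observations. As background only, the standard discrete Bayes-filter belief update, b'(s') proportional to O(o | s') times the sum over s of T(s' | s) b(s), can be sketched as follows. The matrix `T`, the likelihood vector `O`, and the name `belief_update` are hypothetical; the sketch does not reproduce the authors' high-confidence probability vector or their ACOMDP transformation.

```python
from typing import List, Sequence


def belief_update(belief: Sequence[float],
                  T: Sequence[Sequence[float]],
                  O: Sequence[float]) -> List[float]:
    """One Bayes-filter step for a discrete POMDP (illustrative, hypothetical helper).

    belief[s] : current probability of hidden state s
    T[s][s2]  : probability of moving from s to s2 under the action taken
    O[s2]     : likelihood of the received observation in state s2
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(belief[s] * T[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction step: weight by the observation likelihood, then normalise.
    unnorm = [O[s2] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    if z == 0:
        raise ValueError("observation has zero probability under the current belief")
    return [p / z for p in unnorm]
```

Starting from a uniform belief `[0.5, 0.5]` with an identity transition matrix and observation likelihoods `[0.9, 0.1]`, the updated belief is `[0.9, 0.1]`: the observation alone sharpens the estimate of the hidden state.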

Burden Control Strategy Based on Reinforcement Learning for Gas Utilization Rate in Blast Furnace

IFAC-PapersOnLine, 2020, Vol. 53, No. 2, pp. 11704-11709

Authors: Xiaoling Shen; Jianqi An; Min Wu; Jinhua She. Affiliations: School of Automation, China University of Geosciences, Wuhan 430074, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China; School of Engineering, Tokyo University of Technology, Hachioji, Tokyo 192-0982, Japan

Gas utilization rate (GUR) is an important state parameter that reflects the energy consumption, the quality and output of pig iron, and the gas-flow distribution in a blast furnace. The GUR is mainly adjusted by burden distribution and hot-blast supply. Analysis of the mechanism and of data shows that burden distribution and hot-blast supply affect the GUR on a long time scale and a short time scale, respectively. However, few previous studies have proposed control methods for the GUR, and those did not consider its multi-time-scale characteristics. It is therefore necessary to design a control strategy or system for the GUR that considers these multi-time-scale characteristics and keeps the GUR on a reasonable development trend. This paper presents a burden control strategy for the GUR based on a reinforcement learning algorithm. The method improves the development trend of the GUR on a long time scale. Experimental results demonstrate that the sequence of burden distribution parameters given by the presented method ensures a reasonable development trend of the GUR on a long time scale.

Keywords: Blast furnace; gas utilization rate; burden control strategy; reinforcement learning algorithm; long-time scale

Two-level decision-making model for a distribution company in day-ahead market

IET GENERATION TRANSMISSION & DISTRIBUTION, 2015, Vol. 9, No. 12, pp. 1308-1315

Authors: Khazaei, Hossein; Vahidi, Behrooz; Hosseinian, Seyed Hossein; Rastegar, Hasan. Affiliation: Amirkabir Univ Technol, Dept Elect Engn, Tehran *** Iran

This study presents a two-level decision-making (TLDM) model for a distribution company (Disco) in the day-ahead market (DAM), where the Disco has two additional resources: interruptible load (IL) and distribution generation (DG). At the upper level of the model, the competition among Discos for purchasing power from the DAM is modelled as a matrix game, assuming that the cost information of generators and Discos is common knowledge. At the lower level, each Disco's strategy for its ILs and DGs is derived by solving an optimisation problem. The TLDM model significantly reduces the size of the matrix game and thus lowers the computational barrier. Because mixed strategies are difficult to implement, a reinforcement learning algorithm is used to derive the Discos' strategies from the matrix game. This algorithm always provides pure strategies for the Discos, even if the matrix game has no pure Nash equilibrium. An 8-bus system is used to illustrate the efficiency of the proposed model and solution method, and the results are compared with those obtained using a bi-level optimisation method reported in the literature.

Keywords: power distribution economics; decision making; distributed power generation; matrix algebra; game theory; optimisation; learning (artificial intelligence); power markets; two-level decision-making model; distribution company; day-ahead market; TLDM model; DAM; interruptible load; IL; distribution generation; DG; power purchasing; matrix game; Disco strategy; reinforcement learning algorithm; Nash equilibrium; 8-bus system; bilevel optimisation method

Learning to construct a solution for UAV path planning problem with positioning error correction

KNOWLEDGE-BASED SYSTEMS, 2024, Vol. 304

Authors: Chun, Jie; Chen, Ming; Liu, Xiaolu; Xiang, Shang; Du, Yonghao; Wu, Guohua; Xing, Lining. Affiliations: Natl Univ Def Technol, Coll Syst Engn, Changsha 410073, Hunan, Peoples R China; XiangTan Univ, Sch Publ Adm, Xiangtan 411100, Hunan, Peoples R China; Cent South Univ, Sch Automat, Changsha 410075, Hunan, Peoples R China; Xidian Univ, Coll Elect Engn, Xian 710126, Shanxi, Peoples R China

Unmanned aerial vehicles (UAVs) are advanced flight systems; however, their positioning systems produce distance-dependent errors during flight. This study seeks to solve the UAV path planning problem with positioning error correction (UPEC) with an end-to-end method. Traditional methods struggle to balance solution quality and computational cost, and often make limited use of scenario information. To overcome these issues, we propose a path planning model (PPM) based on deep reinforcement learning (RL) to solve the UPEC. The model has a complete structure that includes a mathematical model, feature engineering, a solution process, a neural policy network, scenario generation, a training process, and a test solution mechanism. Specifically, we first establish a Markov decision process (MDP) for UPEC and apply feature engineering with effective features to support decision-making. We then introduce a path planning neural network (PPNN) to represent the MDP policy. Using the dataset generated from multi-rule combination validation, we train the PPNN with the proposed RL algorithm with a storage pool. Furthermore, we propose a backtracking mechanism to guarantee solution feasibility during the construction process. Extensive experiments demonstrate that the proposed PPM outperforms existing state-of-the-art algorithms in solution quality and timeliness, and that the backtracking mechanism effectively improves the scenario completion rate. Further study of the model indicates the efficacy of our training algorithm and the generalisation of the PPNN. Additionally, our construction process is problem-tailored and better suited to UPEC than iterative search algorithms, because it effectively mitigates the impact of invalid nodes.

Keywords: Deep reinforcement learning; UAV; Path planning; Positioning error correction; reinforcement learning algorithm
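
The abstract pairs a step-by-step construction of the path with a backtracking mechanism that preserves feasibility. A generic sketch of that pattern follows: successors are tried in order of a score, and the last choice is undone on a dead end. It is not the authors' PPNN or their backtracking rule; `score` is a hypothetical stand-in for a learned policy, `feasible` is a hypothetical constraint check (for example, accumulated positioning error staying below a limit), and `construct_with_backtracking` is an illustrative name.

```python
from typing import Callable, Dict, List, Optional


def construct_with_backtracking(
    graph: Dict[str, List[str]],
    start: str,
    goal: str,
    score: Callable[[List[str], str], float],      # hypothetical stand-in for a learned policy
    feasible: Callable[[List[str], str], bool],    # hypothetical constraint check
) -> Optional[List[str]]:
    """Build a path node by node, preferring higher-scored successors and backtracking on dead ends."""
    path = [start]

    def extend() -> bool:
        current = path[-1]
        if current == goal:
            return True
        # Only unvisited successors that satisfy the constraints are candidates.
        candidates = [n for n in graph.get(current, []) if n not in path and feasible(path, n)]
        for nxt in sorted(candidates, key=lambda n: score(path, n), reverse=True):
            path.append(nxt)
            if extend():
                return True
            path.pop()  # backtrack: undo this choice and try the next candidate
        return False

    return path if extend() else None
```

With `graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": []}` and a score that prefers `B`, the greedy choice `A -> B -> D` dead-ends, so the sketch backtracks and returns `["A", "C", "G"]`; if no feasible path exists it returns `None` instead of an invalid partial route.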

Optimized adaptive event-triggered tracking control for multi-agent systems with full-state constraints

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2022, Vol. 32, No. 18, pp. 10101-10124

Authors: Yang, Xiaoyu; Pan, Yingnan; Sun, Jize; Tan, Lihua. Affiliations: Bohai Univ, Coll Control Sci & Engn, Jinzhou 121013, Liaoning, Peoples R China; Shenyang Aircraft Design & Res Inst, Shenyang, Liaoning, Peoples R China; Southwest Univ, Coll Elect & Informat Engn, Chongqing, Peoples R China

In this article, the event-triggered optimized adaptive tracking control problem is investigated for a class of multi-agent systems subject to full-state constraints. To handle the full-state constraints, a nonlinear mapping technique is applied, which can remove the feasibility conditions imposed on the virtual controllers. Based on the mean-value theorem, the nonaffine nonlinear terms generated by the transformed system are separated, which overcomes the obstacle to solving for the optimized solution. A neural-network-based reinforcement learning (RL) algorithm with an identifier-critic-actor architecture is introduced to obtain the optimal solution for systems with unknown dynamics. Notably, the RL algorithm in this article is simplified, which reduces the computational burden. To reduce the communication burden, an event-triggered mechanism with a time-varying threshold related to the optimized control signal is developed. Using the Lyapunov stability method, it is proved that the desired optimized tracking performance and the stability of the closed-loop systems can be guaranteed. Finally, a simulation example demonstrates the effectiveness of the proposed control strategy.

Keywords: event-triggered control; full-state constraints; multi-agent systems; reinforcement learning algorithm
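
An event-triggered mechanism with a signal-dependent threshold can be illustrated with the relative-threshold rule common in the event-triggered control literature: a new control value is transmitted only when it drifts from the last transmitted value by at least `delta * |u(t)| + m`. This is an assumed, generic rule, not the specific trigger derived in the paper; `delta`, `m`, and the name `event_triggered_hold` are hypothetical, and the sketch works on a sampled signal rather than in continuous time.

```python
from typing import List, Optional, Sequence, Tuple


def event_triggered_hold(u: Sequence[float], delta: float = 0.2,
                         m: float = 0.05) -> Tuple[List[float], List[int]]:
    """Zero-order hold of a sampled control signal, updated only on trigger events.

    Assumed relative-threshold rule: transmit u[t] when
    |u[t] - last_sent| >= delta * |u[t]| + m.
    Returns the held (actually applied) signal and the trigger indices.
    """
    held: List[float] = []
    triggers: List[int] = []
    last: Optional[float] = None
    for t, value in enumerate(u):
        # The threshold grows with |u[t]|, so it varies over time with the control signal.
        if last is None or abs(value - last) >= delta * abs(value) + m:
            last = value
            triggers.append(t)
        held.append(last)
    return held, triggers
```

For the sampled signal `[1.0, 1.05, 1.1, 1.5, 1.52, 0.0]`, only samples 0, 3 and 5 are transmitted and the applied signal is `[1.0, 1.0, 1.0, 1.5, 1.5, 0.0]`; the small drifts in between are absorbed by the threshold, which is where the communication savings come from. The constant `m > 0` keeps the threshold from collapsing to zero when the control signal is near zero.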

Adaptive integral sliding mode control fault tolerant control for a class of uncertain nonlinear systems

IET CONTROL THEORY AND APPLICATIONS, 2018, Vol. 12, No. 13, pp. 1864-1872

Authors: Li, Yuan-Xin; Yang, Guang-Hong. Affiliations: Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China; Liaoning Univ Technol, Coll Math, Jinzhou 121001, Peoples R China; Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Liaoning, Peoples R China

This study considers the problem of adaptive sliding mode control for a class of uncertain non-linear systems with actuator faults and external disturbances. First, a novel reinforcement learning algorithm is introduced to design the optimal controller of the nominal control system. Then, an integral-type sliding mode approach is introduced to handle the difficulties caused by actuator failures and disturbances. It is shown that the actuator faults and disturbances can be completely compensated by the proposed controller, and that all signals of the resulting closed-loop system are semi-globally bounded when suitable parameters are chosen. The authors demonstrate the algorithm on two simulation examples.

Keywords: uncertain systems; variable structure systems; adaptive control; closed loop systems; optimal control; nonlinear control systems; control system synthesis; linear systems; actuators; fault tolerant control; learning (artificial intelligence); actuator faults; external disturbances; reinforcement learning algorithm; optimal controller; nominal control systems; adaptive integral sliding mode control; uncertain nonlinear systems; closed-loop system; integral-type sliding mode; fault-tolerant control

Context aware Q-learning-based model for decision support in the negotiation of energy contracts

INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2019, Vol. 104, pp. 489-501

Authors: Rodriguez-Fernandez, J.; Pinto, T.; Silva, F.; Praca, I.; Vale, Z.; Corchado, J. M. Affiliations: Polytech Porto, ISEP IPP, GECAD Res Grp, Porto, Portugal; Univ Salamanca, BISITE Res Grp, Salamanca, Spain; Polytech Porto, IPP, Porto, Portugal; Osaka Inst Technol, Osaka, Japan

Automated negotiation plays a crucial role in decision support for bilateral energy transactions. An adequate analysis of the past actions of opposing negotiators can improve the decision-making process of market players, allowing them to choose the most appropriate parties to negotiate with in order to increase their outcomes. This paper proposes a new model to estimate the expected prices that can be achieved in bilateral contracts under a specific context, enabling adequate risk management in the negotiation process. The proposed approach is based on an adaptation of the Q-learning reinforcement learning algorithm that chooses the best scenario (a set of forecast contract prices) from a set of possible scenarios determined using several forecasting and estimation methods. The learning process assesses the probability of occurrence of each scenario by comparing each expected scenario with the real scenario. The final chosen scenario is the one with the highest expected utility value. In addition, the learning method can determine the best scenario for each context, since the behaviour of players can change according to the negotiation environment, and these conditions in turn influence the final contract price of negotiations. This approach allows the supported player to be prepared for the negotiation scenario that is most likely to be a reliable approximation of the actual negotiation environment.

Keywords: Automated negotiation; Bilateral contracts; Context awareness; Decision support; Electricity markets; reinforcement learning algorithm
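
The context-dependent scenario selection described above can be illustrated with a tabular value estimate keyed by (context, scenario), updated after each negotiation once the real price is known. This sketch assumes a one-step Q-learning update (discount factor zero), which reduces to an exponential moving average of the reward, and an assumed reward equal to the negative absolute forecast error. The class name `ContextScenarioLearner`, the learning rate `alpha`, and the reward definition are hypothetical and are not the paper's utility function or its probability assessment.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, Tuple


class ContextScenarioLearner:
    """Hypothetical context-aware scenario selector using a one-step Q-learning update."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # learning rate
        # Q-values keyed by (context, scenario); unseen pairs start at 0.0.
        self.q: DefaultDict[Tuple[Hashable, str], float] = defaultdict(float)

    def update(self, context: Hashable, forecasts: Dict[str, float], actual_price: float) -> None:
        """After a negotiation closes, reward each scenario by how close its forecast was."""
        for scenario, forecast in forecasts.items():
            reward = -abs(forecast - actual_price)  # assumed reward: negative absolute error
            key = (context, scenario)
            # Q <- Q + alpha * (reward - Q), i.e. Q-learning with discount factor 0.
            self.q[key] += self.alpha * (reward - self.q[key])

    def best(self, context: Hashable, scenarios: Iterable[str]) -> str:
        """Pick the scenario with the highest learned value in this context."""
        return max(scenarios, key=lambda s: self.q[(context, s)])
```

Because values are stored per context, the same two scenarios can rank differently: after a few updates where peak-hour prices land near 58 and off-peak prices near 41, `best("peak", ...)` returns the high-price scenario while `best("off-peak", ...)` returns the low-price one.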

Adaptive optimisation of timeout policy for dynamic power management based on semi-Markov control processes

IET CONTROL THEORY AND APPLICATIONS, 2010, Vol. 4, No. 10, pp. 1945-1958

Authors: Jiang, Q.; Xi, H.-S.; Yin, B.-Q. Affiliations: Hefei Univ Technol, Sch Elect Engn & Automat, Hefei 230009, Peoples R China; Univ Sci & Technol China, Dept Automat, Hefei 230027, Peoples R China

Timeout policy is an industry standard for dynamic power management (DPM), and is thus easy and safe to implement in many power-managed systems. Previously, the optimisation of timeout policies lacked an effective analytical model and relied on heuristics. This study presents an adaptive optimisation method for timeout DPM policies. First, a semi-Markov control process model is introduced to formulate the DPM problem of finding timeout policies that minimise power consumption under performance constraints. Under this framework, the equivalence of timeout and stochastic policies in the power-performance tradeoff is revealed, and the equivalence relation between these two types of DPM policy is derived. Then, a reinforcement learning algorithm that combines policy gradient estimation and stochastic approximation is proposed for optimising the timeout policy online. This algorithm does not depend on any prior knowledge of system parameters and can achieve a global optimum at lower computational cost. Simulation results confirm the analytical results and the effectiveness of the proposed algorithm.

Keywords: policy gradient estimation; Optimisation techniques; Power system management, operation and economics; adaptive control; Control of electric power systems; power-performance tradeoff; semi-Markov control process; power system control; gradient methods; performance constraints; adaptive optimisation; stochastic approximation; reinforcement learning algorithm; Self-adjusting control systems; Interpolation and function approximation (numerical analysis); dynamic power management; approximation theory; power system management; Markov processes; optimisation; learning (artificial intelligence); Knowledge engineering techniques; timeout policy
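
The combination of a policy gradient estimate with stochastic approximation can be illustrated on a single parameter: the mean `mu` of a Gaussian-randomised timeout `tau ~ N(mu, sigma^2)`. Each step draws an antithetic pair `tau = mu +/- sigma * eps`, averages the two score-function (likelihood-ratio) gradient estimates, which gives `(c+ - c-) * eps / (2 * sigma)`, and applies a Robbins-Monro step size. This is an assumed, simplified setting, not the paper's semi-Markov formulation: `cost_fn` is a hypothetical stand-in for a measured per-idle-period cost (for example, energy plus a weighted wake-up delay), and all parameter names are illustrative.

```python
import random
from typing import Callable


def optimise_timeout(
    cost_fn: Callable[[float], float],  # hypothetical measured cost of using a given timeout
    mu0: float = 1.0,
    sigma: float = 0.5,
    steps: int = 2000,
    a0: float = 1.0,
    offset: int = 10,
    seed: int = 0,
) -> float:
    """Tune the mean of a Gaussian-randomised timeout by stochastic gradient descent."""
    rng = random.Random(seed)
    mu = mu0
    for k in range(steps):
        eps = rng.gauss(0.0, 1.0)
        # Antithetic pair of sampled timeouts, clipped at zero (a timeout cannot be negative).
        c_plus = cost_fn(max(mu + sigma * eps, 0.0))
        c_minus = cost_fn(max(mu - sigma * eps, 0.0))
        # Average of the two likelihood-ratio estimates of d E[cost] / d mu.
        grad = (c_plus - c_minus) * eps / (2.0 * sigma)
        # Robbins-Monro step size: the sum of a_k diverges, the sum of a_k^2 converges.
        a_k = a0 / (k + 1 + offset)
        mu = max(mu - a_k * grad, 0.0)
    return mu
```

With the toy cost `lambda t: (t - 3.0) ** 2`, whose best timeout is 3, the returned `mu` settles very close to 3. Clipping at zero slightly biases the estimate when `mu` is near zero, and the `offset` term simply keeps the first steps from overshooting; both are illustrative design choices.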

Design of a Digital Exhibition Service System Under the Deep Belief Network Models

IEEE ACCESS, 2024, Vol. 12, pp. 108786-108796

Authors: Song, Qixin. Affiliations: Tourism Coll Changchun Univ, Sch Tourism & Culture, Changchun 130607, Peoples R China; Jilin Prov Res Ctr Cultural Tourism Educ & Enterp, Changchun 130607, Peoples R China; Northeast Asia Res Ctr Leisure Econ, Changchun 130607, Peoples R China

This work aims to improve the classification efficiency of a digital exhibition service system and to optimize booth layout and visitor route planning. It combines the Deep Belief Network (DBN) model with reinforcement learning (RL) and Random Forest (RF) algorithms to design and construct a digital exhibition service system. Using publicly available exhibition promotion and display channels, the work selects four common types of exhibitions, such as industry exhibitions. For each type, exhibition data from 10 different time periods and content formats are chosen, which are scattered and arranged into five exhibition datasets. The RF algorithm is introduced as an auxiliary classifier, features are extracted by the DBN model, and quantitative indicators are used to evaluate the robustness of the model and the accuracy and personalization of the recommended results. Meanwhile, the learned features and patterns are fed into the RL algorithm to verify the system's decision optimization effect. The results demonstrate the following. 1) Under perturbed data conditions, the system's average accuracy differs by only 0.4% from that under the original data conditions, the average F1 score differs by 0.003, and the average recall differs by only 0.1%, indicating that the system is robust to perturbed environments. 2) Cross-validation results show that the system maintains stable classification efficiency across different folds, with accuracy ranging from 82% to 89%. The average time consumption for each fold does not exceed 10 ms, indicating that the system can efficiently classify different types of exhibition data. 3) Variance analysis results show that the p-values corresponding to five indicators (recommendation accuracy, recommendation coverage rate, personalized recommendation effect score, recommendation click-through rate, and user satisfaction) are 0.036, 0.027, 0.037, 0.046, and 0.039, respectively. They are all less than 0.05,

Keywords: Biological system modeling; Accuracy; Data models; Classification algorithms; Industries; Radio frequency; Random forests; Digital systems; Decision making; Optimization methods; Deep belief network model; reinforcement learning algorithm; random forest algorithm; digital exhibition service system; decision optimization