Search Results - Inner Mongolia University Library

A dynamic flexible job shop scheduling method based on collaborative agent reinforcement learning

FLEXIBLE SERVICES AND MANUFACTURING JOURNAL, 2024, pp. 1-33

Authors: Shao, Changshun; Yu, Zhenglin; Ding, Hongchang; Cao, Guohua; Ding, Kaifang; Duan, Jingsong. Affiliations: Changchun Univ Sci & Technol Coll Mech & Elect Engn Changchun 130022 Jilin Peoples R China; Changchun Univ Sci & Technol Chongqing Res Inst Chongqing 401135 Peoples R China

This paper presents an innovative approach to solve the Dynamic Flexible Job Shop Scheduling Problem (DFJSP). Our method aims to enhance production efficiency by minimizing the average total tardiness. To achieve this goal, we first construct a model that accurately describes the environmental state using eight feature values ranging from 0 to 1. These feature values comprehensively consider factors such as job progress and machine utilization, thus providing a full reflection of the actual operational status of the workshop. Additionally, we design four composite scheduling rules that address the dual selection issues of machines and jobs in the DFJSP, enabling optimal decisions to be made within each scheduling cycle. We design a reward function based on changes in the state values, ensuring that the number of reward values increases as the training progresses, alleviating the problem of sparse rewards to some extent. This design facilitates faster learning and improves the convergence speed of the algorithm. In terms of agent design, we employ a Deep Q-Network and a proximal policy optimization algorithm. Both methods effectively handle complex decision spaces and exhibit good stability during training. Through a data-sharing approach, we further develop a collaborative agent model that enables efficient cooperation among multiple agents. Finally, we validate the effectiveness of our proposed model through a series of experiments. The results demonstrate that our model shows advantages, regardless of whether the dataset is small or large. When compared with other methods, our approach maintains high performance across different problem instances. These results fully validate the correctness and effectiveness of our design approach and provide a viable solution for scheduling problems in practical industrial settings.

Keywords: Dynamic flexible job shop scheduling problem; proximal policy optimization algorithm; Deep Q-network; Collaborative
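As a rough illustration of the objective named in the abstract above, the sketch below computes per-job tardiness as max(0, completion time minus due date) and averages it over jobs. The function name, the sample data, and the reading of "average total tardiness" as a mean over jobs are assumptions made for illustration, not details taken from the paper.

```python
def average_tardiness(completion_times, due_dates):
    """Mean of max(0, C_j - D_j) over all jobs (assumed definition)."""
    if len(completion_times) != len(due_dates):
        raise ValueError("one due date is required per job")
    if not completion_times:
        return 0.0
    # A job finished before its due date contributes zero, not a negative value.
    tardiness = [max(0.0, c - d) for c, d in zip(completion_times, due_dates)]
    return sum(tardiness) / len(tardiness)


# Hypothetical completion times and due dates for three jobs.
completions = [12.0, 20.0, 31.0]
dues = [15.0, 18.0, 25.0]
print(average_tardiness(completions, dues))
```

A scheduling agent would typically receive a reward that grows as this quantity shrinks; how the paper shapes that reward is not specified beyond what the abstract states.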

System-Level Predictive Maintenance optimization for No-Wait Production Machine-Robot Collaborative Environment under Economic Dependency and Hybrid Fault Mode

PROCESSES, 2024, vol. 12, no. 8, p. 1690

Authors: Hu, Bing; Chen, Zhaoxiang; Zhen, Mengzi; Chen, Zhen; Pan, Ershun. Affiliations: Shanghai Jiao Tong Univ Dept Ind Engn & Management State Key Lab Mech Syst & Vibrat Shanghai 200240 Peoples R China; Shanghai Baosight Software Co Ltd Shanghai 201203 Peoples R China

For manufacturing systems such as hot rolling, where there is no wait in the production process, breaks between adjacent production batches provide "opportunities" for predictive maintenance. With the extensive application of industrial robots, a production machine-robot collaboration mode should be considered in system-level predictive maintenance. The hybrid failure mode of machines and dependencies among machines further elevate the difficulty of developing predictive maintenance schedules. Therefore, a novel system-level predictive maintenance method for the no-wait production machine-robot collaborative maintenance problem (NWPMRCMP) is proposed. The machine-level predictive maintenance optimization model under hybrid failure mode, which consists of degradation and sudden failure, is constructed. Based on this, the system-level maintenance optimization model is developed, which takes into account the economic dependency among machines. The maintenance model with the objective of minimizing the total cost is transformed into a Markov decision process (MDP), and a tailored proximal policy optimization algorithm is developed to solve the resulting MDP. Finally, a case study of a manufacturing system consisting of multiple hot-rolling machines and labeling robots is constructed to demonstrate the effectiveness of the proposed method. The results show that the designed algorithm has good performance and stability. Moreover, the developed strategy maximizes the performance of the machine and thus reduces the total maintenance cost.

Keywords: predictive maintenance; production machine-robot collaborative; economic dependency; hybrid fault mode; proximal policy optimization algorithm

Research on Data-Driven Optimal Scheduling of Power System

ENERGIES, 2023, vol. 16, no. 6, p. 2926

Authors: Luo, Jianxun; Zhang, Wei; Wang, Hui; Wei, Wenmiao; He, Jinpeng. Affiliations: Qilu Univ Technol Shandong Acad Sci Sch Informat & Automat Jinan Peoples R China; Shandong Univ Dept Elect Engn Jinan 250061 Peoples R China; Huazhong Univ Sci & Technol Automat Acad Wuhan 430074 Peoples R China

The uncertainty of output makes it difficult to effectively solve the economic security dispatching problem of the power grid when a high proportion of renewable energy generating units are integrated into the power grid. Based on the proximal policy optimization (PPO) algorithm, a safe and economical grid scheduling method is designed. First, constraints on the safe and economical operation of renewable energy power systems are defined. Then, the quintuple of Markov decision process is defined under the framework of deep reinforcement learning, and the dispatching optimization problem is transformed into Markov decision process. To solve the problem of low sample data utilization in online reinforcement learning strategies, a PPO optimization algorithm based on the Kullback-Leibler (KL) divergence penalty factor and importance sampling technique is proposed, which transforms on-policy into off-policy and improves sample utilization. Finally, the simulation analysis of the example shows that in a power system with a high proportion of renewable energy generating units connected to the grid, the proposed scheduling strategy can meet the load demand under different load trends. In the dispatch cycle with different renewable energy generation rates, renewable energy can be absorbed to the maximum extent to ensure the safe and economic operation of the grid.

Keywords: grid dispatching optimization; proximal policy optimization algorithm; importance sampling; deep reinforcement learning
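The abstract above mentions a PPO variant built on a KL divergence penalty and importance sampling. A minimal sketch of the commonly used KL-penalized surrogate objective follows; it takes the mean of the importance ratio times the advantage and subtracts beta times the mean KL divergence between the old and new policies. The function names, the discrete action distributions, and the fixed `beta` are assumptions for illustration; the paper's exact formulation may differ.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_penalty_objective(old_probs, new_probs, actions, advantages, beta):
    """Mean importance-weighted advantage minus beta times the mean KL(old || new)."""
    n = len(actions)
    surrogate = 0.0
    kl_total = 0.0
    for p_old, p_new, a, adv in zip(old_probs, new_probs, actions, advantages):
        # Importance sampling ratio: lets samples drawn under the old policy
        # be reused to evaluate the new policy.
        ratio = p_new[a] / p_old[a]
        surrogate += ratio * adv
        kl_total += kl_divergence(p_old, p_new)
    return surrogate / n - beta * kl_total / n


# Hypothetical single sample: two actions, action 0 taken, advantage 1.0.
old = [[0.5, 0.5]]
new = [[0.8, 0.2]]
print(kl_penalty_objective(old, new, actions=[0], advantages=[1.0], beta=0.5))
```

A larger `beta` keeps the updated policy closer to the one that collected the data, which is the property that makes reusing those samples reasonable.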

Online Altitude Control and Scheduling policy for Minimizing AoI in UAV-Assisted IoT Wireless Networks

IEEE TRANSACTIONS ON MOBILE COMPUTING, 2022, vol. 21, no. 7, pp. 2493-2505

Authors: Samir, Moataz; Assi, Chadi; Sharafeddine, Sanaa; Ghrayeb, Ali. Affiliations: Concordia Univ Concordia Inst Informat Syst Engn CIISE Montreal PQ H3G 1M8 Canada; Lebanese Amer Univ Sch Arts & Sci SAS Beirut 11022801 Lebanon; Texas A&M Univ Qatar Elect & Comp Engn ECE Dept Doha 23874 Qatar

This article considers unmanned aerial vehicle (UAV) assisted Internet of Things (IoT) networks, where low-resource IoT devices periodically sample a stochastic process and need to upload more recent information to a Base Station (BS). Among the myriad of applications, there is a need for timely delivery of data (for example, status updates) before the data becomes outdated and loses its value. Since the transmission capabilities of IoT devices are limited, it may not always be feasible to reach the BS in a single hop. To address this challenge, UAVs with virtual queues are deployed as a middle layer between the IoT devices and the BS to relay recent information over unreliable channels. In the absence of knowledge of the channel conditions, the optimal online scheduling policy is investigated together with dynamic UAV altitude control that maintains a fresh status of information at the BS. The objective of this paper is to minimize the Expected Weighted Sum Age of Information (EWSA) for IoT devices. First, the problem is formulated as an optimization problem that is, however, generally hard to solve. Second, an online model-free Deep Reinforcement Learning (DRL) approach is proposed, where the deployed UAV obtains instantaneous channel state information (CSI) in real time along with any adjustment to its deployment altitude. Third, the online problem is formulated as a Markov Decision Process (MDP), and the proximal policy optimization (PPO) algorithm, a highly stable state-of-the-art DRL algorithm, is leveraged to solve it. Finally, extensive simulations are conducted to verify the findings, and comprehensive comparisons with baseline approaches are provided to demonstrate the effectiveness of the proposed design.

Keywords: optimization; Relays; Internet of Things; Trajectory; Real-time systems; Unmanned aerial vehicles; Dynamic scheduling; Mobile relays; age of information; scheduling policy; UAV altitude control; proximal policy optimization algorithm; unknown channel conditions
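To make the Age of Information objective in the abstract above more concrete, the sketch below tracks per-device ages in discrete time slots and averages their weighted sum. It assumes a simple textbook model in which a device's age resets to 1 when its update is delivered and otherwise grows by 1 per slot. The function names, weights, and delivery pattern are hypothetical, and the paper's system model with UAV relays and virtual queues is more detailed than this.

```python
def step_ages(ages, delivered):
    """Advance one slot: age resets to 1 on delivery, otherwise grows by 1 (assumed model)."""
    return [1 if d else a + 1 for a, d in zip(ages, delivered)]


def weighted_sum_age(ages, weights):
    """Weighted sum of the current ages at the base station."""
    return sum(w * a for w, a in zip(weights, ages))


def average_weighted_age(initial_ages, weights, delivery_schedule):
    """Time-average of the weighted sum age over a given delivery schedule."""
    ages = list(initial_ages)
    total = 0.0
    for delivered in delivery_schedule:
        ages = step_ages(ages, delivered)
        total += weighted_sum_age(ages, weights)
    return total / len(delivery_schedule)


# Hypothetical scenario: two devices, three slots, at most one delivery per slot.
schedule = [
    [True, False],   # slot 1: device 0 delivered
    [False, True],   # slot 2: device 1 delivered
    [False, False],  # slot 3: nothing delivered
]
print(average_weighted_age([1, 1], [0.7, 0.3], schedule))
```

A scheduling policy in this setting chooses the delivery pattern; a learning agent would be rewarded for keeping the time-averaged weighted age low.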

AUV Dynamic Obstacle Avoidance Method Based on Improved PPO algorithm

IEEE ACCESS, 2022, vol. 10, pp. 121340-121351

Authors: Zhu, Guohao; Shen, Zhou; Liu, Laiyuan; Zhao, Sicong; Ji, Fangzheng; Ju, Zixia; Sun, Jialong. Affiliations: Jiangsu Ocean Univ Sch Geomat & Marine Informat Lianyungang 222001 Peoples R China; Jiangsu Marine Resources Dev Res Inst Lianyungang 222005 Peoples R China; Jiangsu Ocean Univ Coinnovat Ctr Jiangsu Marine Bioind Technol Lianyungang 222001 Peoples R China; Jiangsu Ocean Univ Jiangsu Key Lab Marine Bioresources & Environm Jiangsu Key Lab Marine Biotechnol Lianyungang 222001 Peoples R China; Minist Nat Resources Marine Informat Technol Innovat Ctr Tianjin 300171 Peoples R China

Designing a reasonable obstacle avoidance method for AUV 3D path planning is difficult, and existing obstacle avoidance methods have certain drawbacks. For example, they are only applicable to 2D planar applications and cannot effectively handle dynamic obstacles. To address these problems, we design an obstacle collision prediction model (CPM). Based on the results of the simulation of obstacles' inertial motion, the safety of the AUV navigation is evaluated to improve the model's sensitivity to dynamic obstacles. Then, we enhance the learning ability of the sequence sample data by combining it with a long short-term memory (LSTM) network, thus improving the training efficiency and effect of the algorithm. The trained proximal policy optimization (PPO) network can output reasonable actions in order to control the AUV to avoid obstacles, forming an AUV 3D dynamic obstacle avoidance strategy based on the CPM-LSTM-PPO algorithm. The simulation results show that the proposed algorithm has good generalization in uncertain environments. Moreover, it achieves dynamic AUV obstacle avoidance in different three-dimensional unknown environments, providing theoretical and technical support for real path planning.

Keywords: AUV; dynamic obstacle avoidance; deep reinforcement learning; proximal policy optimization algorithm; collision prediction model

Research on 3D Observation Path Planning Method for Mobile Platforms Based on Near-End Strategy optimization

2024 International Conference on Guidance, Navigation and Control

Authors: Zhang, Jing Jing; Dong, Peng Da; Shi, Wen; Liu, Xin Yu; Yu, Cong Rui. Affiliations: Harbin Engn Univ Coll Intelligent Syst Sci & Engn Harbin 150000 Peoples R China; Naval Equipment Dept Peoples Liberat Army Chinese Equipment Project Management Ctr Project Management Ctr Beijing 100000 Peoples R China; China Ship Dev & Design Cente Underwater Part Hubei 430000 Peoples R China

ISBN: (print) 9789819622542; 9789819622528; 9789819622511

It has been challenging for mobile observation platforms to solve the path planning problem in a three-dimensional dynamic marine environment. On the one hand, traditional path planning algorithms are highly dependent on the environment, lack flexibility, and need to be re-modeled and re-planned when the environment changes. On the other hand, traditional algorithms suffer from the problems of difficult modeling, local optimality, and reduced observation efficiency when facing path planning in marine environments. To solve these problems, we introduce 3D ocean information under Princeton Ocean Model (POM) for path planning to improve the robustness of the model to 3D dynamic ocean environment. Then, we enhance the learning ability by combining Recurrent Neural Network (RNN) with proximal policy optimization (PPO) algorithm in order to improve the training efficiency and effectiveness of the algorithm. After training, the observation path of the mobile observation platform can be reasonably planned, forming the three-dimensional dynamic marine environment path planning for the mobile observation platform based on the POM-RNN-PPO algorithm. Simulation results show that the algorithm shows better observation path planning results than other algorithms in three-dimensional dynamic ocean environment, and has good generalization under different sea areas, which provides theoretical and technical support for real mobile observation platform path planning.

Keywords: Deep reinforcement learning; Marine environmental observation; Path planning; proximal policy optimization algorithm; Recurrent neural network

Dynamic flexible job shop scheduling algorithm based on deep reinforcement learning

35th Chinese Control and Decision Conference (CCDC)

Authors: Zhao, Tianrui; Wang, Yanhong; Tan, Yuanyuan; Zhang, Jun. Affiliations: Shenyang Univ Technol Coll Artificial Intelligence Shenyang 145558 Peoples R China; Shenyang Univ Technol Coll Artificial Intelligence Shenyang 12326 Peoples R China; Shenyang Univ Technol Coll Artificial Intelligence Shenyang 52429 Peoples R China

ISBN: (print) 9798350334722

The dynamic scheduling problem is a hot topic of current research. To solve the dynamic flexible job shop scheduling problem, an improved composite scheduling rule algorithm based on proximal policy optimization is proposed with the objective of minimizing the total delay time. The algorithm uses seven state features to represent the scheduling environment and designs six custom composite scheduling rules and a reward function. The algorithm continuously interacts with the environment to accumulate data and updates the neural network parameters with the Adam optimizer through offline learning. Simulation results show that the algorithm achieves better performance metrics by using a combination of single scheduling rules, and that it outperforms classical scheduling algorithms.

Keywords: proximal policy optimization algorithm; deep reinforcement learning; dispatching rules; flexible job shop scheduling

A Futures Quantitative Trading Strategy Based on a Deep Reinforcement Learning algorithm

IEEE 8th International Conference on Big Data Analytics (ICBDA)

Authors: Chen, Xuemei; Guo, Haoran. Affiliations: Wuhan Univ Sch Informat Management Wuhan Peoples R China; Shanghai Jiao Tong Univ Ningbo Inst Artificial Intelligence Ningbo Peoples R China

ISBN: (print) 9798350310764

Deep reinforcement learning (DRL) is a type of machine learning algorithm that has gained a lot of attention for its application in the financial field. Based on the proximal policy optimization (PPO) algorithm in deep reinforcement learning, this paper designs a trading strategy for the Chinese futures market and realizes an end-to-end decision-making process from futures data to trading actions. Using domestic rebar futures data, multiple sets of historical data were selected for backtesting and compared with traditional trading strategies. The results show that, of the 12 selected test periods, 83.3% are profitable, which is better than the 33.3% of mean reversion (MR) and the 25% of trend following (TF). This indicates that the proposed strategy adapts better than traditional methods whether the futures market rises or falls, and that it can reduce losses through trading even when the market price changes significantly, thus increasing the return on investment.

Keywords: machine learning; proximal policy optimization algorithm; deep reinforcement learning; financial trading
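The percentages reported in the abstract above can be read back as counts of profitable test periods. The short check below assumes that each percentage is a rounded share of the 12 test periods. The helper name and the dictionary labels are illustrative, and the resulting counts are inferred from the abstract rather than stated in it.

```python
TEST_PERIODS = 12

# Profitable share of test periods as reported in the abstract (percent).
reported = {
    "PPO strategy": 83.3,
    "mean reversion (MR)": 33.3,
    "trend following (TF)": 25.0,
}


def implied_count(percent, total):
    """Number of periods implied by a rounded percentage of `total`."""
    return round(percent / 100 * total)


for name, pct in reported.items():
    count = implied_count(pct, TEST_PERIODS)
    # Recompute the percentage from the implied count as a consistency check.
    recomputed = round(count / TEST_PERIODS * 100, 1)
    print(f"{name}: {count} of {TEST_PERIODS} periods ({recomputed}%)")
```

Under that assumption the figures correspond to 10, 4, and 3 profitable periods out of 12, respectively.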

Multi-device cooperative reactive power optimization control strategy for high percentage distributed photovoltaic distribution grid based on proximal strategy optimization algorithm

9th International Conference on Energy System, Electricity, and Power, ESEP 2024

Authors: Zhang, Liyuan; Ren, Guitian; Chen, Shangyue; Zhang, Xiaotong. Affiliations: Chengxi Power Supply Branch of State Grid Tianjin Electric Power Company Tianjin 300190 China

ISBN: (print) 9781510691728

A high proportion of strongly intermittent distributed PV is connected to the distribution network in a decentralized and disordered manner, which makes reactive power balance difficult. This paper proposes a multi-device collaborative reactive power optimization control method based on proximal policy optimization (PPO). It models the voltage reactive power optimization problem of distribution grids with a high ratio of distributed PV as a partially observable Markov decision process, establishes a method of generating and extracting training samples under the framework of deep reinforcement learning, and applies the PPO algorithm. The algorithm dynamically controls the update amplitude of the network through the clipped policy gradient method and improves sample utilization through importance sampling, and it converges well when dealing with continuous action spaces and high-dimensional state spaces. Finally, case-study simulations and analysis are performed based on the IEEE 33-bus system. The results show that the algorithm learns a policy network that can find the globally optimal solution based on the electrical information of the respective nodes. © 2025 SPIE.

Keywords: distributed photovoltaic; Distribution grid; proximal policy optimization algorithm; voltage reactive power control
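The abstract above describes limiting the update amplitude of the network and reusing samples through importance sampling. In the standard PPO formulation this is presumably the clipped surrogate objective, sketched below in plain Python. The function name, the default `epsilon` of 0.2, and the sample values are assumptions for illustration and are not taken from the paper.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Standard PPO clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = []
    for r, adv in zip(ratios, advantages):
        # r is the importance sampling ratio pi_new(a|s) / pi_old(a|s).
        clipped_r = max(1.0 - epsilon, min(1.0 + epsilon, r))
        # Taking the minimum removes any incentive to push r outside the
        # clipping range, which bounds the size of each policy update.
        terms.append(min(r * adv, clipped_r * adv))
    return sum(terms) / len(terms)


# Hypothetical ratios and advantage estimates for three samples.
ratios = [1.5, 0.5, 1.0]
advantages = [1.0, -1.0, 2.0]
print(clipped_surrogate(ratios, advantages))
```

With a positive advantage the ratio is capped at 1 + epsilon, and with a negative advantage it is floored at 1 - epsilon, so a single batch cannot move the policy arbitrarily far from the one that generated the data.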

Application of Deep Reinforcement Learning in Guandan Game

34th Chinese Control and Decision Conference (CCDC)

Authors: Pan, Jiahong; Zhang, Zhongtian; Shen, Hengheng; Zeng, Yi; Wu, Lei. Affiliations: Anhui Univ Sch Comp Sci & Technol Hefei 230601 Peoples R China

ISBN: (print) 9781665478960

In recent years, imperfect information games have become an important touchstone for testing the level of artificial intelligence. There are many imperfect information game scenarios in the real world, such as economic transactions, military games, and autonomous driving. Therefore, the study of imperfect information game problems has very important practical significance. Guandan is an imperfect information card game with four players divided into two teams. The large amount of hidden information in Guandan leads to a high-dimensional game state. Reinforcement learning algorithms are efficient at strategy search in computer games, but they cannot converge under the imperfect information and high-dimensional state space of Guandan. To address these problems, this paper introduces the proximal policy optimization (PPO) algorithm based on deep reinforcement learning to handle imperfect information and the high-dimensional state and action spaces. It enables the agent to perceive high-dimensional information and make decisions according to the acquired information. The experimental results show that the decision model based on the proximal policy optimization algorithm achieves a higher level of play than the policy gradient and A2C algorithms, which shows that the system has a self-learning ability to improve its level of play in Guandan.

Keywords: Imperfect Information Game; Guandan; Deep Reinforcement Learning; proximal policy optimization algorithm; Self-Learning
