The container pre-marshalling problem (CPMP) aims to minimise the number of reshuffling moves, ultimately achieving an optimised stacking arrangement in each bay based on container priorities during the non-loading phase. Given its sequential decision nature, we formulate the CPMP as a Markov decision process (MDP) model that captures the specific states and actions of the reshuffling process. To address the challenge that a relocated container may trigger a chain effect on subsequent reshuffling moves, this paper develops an improved policy-based Monte Carlo tree search (P-MCTS) to solve the CPMP, in which eight composite reshuffling rules and modified upper confidence bounds are employed in the selection phase, and a well-designed heuristic algorithm is used in the simulation phase. Meanwhile, considering the effectiveness of reinforcement learning methods for solving MDP models, an improved Q-learning algorithm is proposed as a comparison method. Numerical results show that the P-MCTS outperforms all compared methods both in scenarios where all containers have different priorities and in scenarios where containers can share the same priority.
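As a sketch of the selection phase mentioned above: the paper's modified upper confidence bounds and composite rules are not reproduced in the abstract, so the snippet below shows only the standard UCB1 rule that such modifications start from. The `children` representation is an assumption for illustration.

```python
import math

def ucb_select(children, total_visits, c=math.sqrt(2)):
    """Pick the index of the child maximizing the UCB1 score.

    children: list of (mean_reward, visit_count) pairs for each
    candidate reshuffling move; unvisited children are expanded first.
    """
    best, best_score = None, -math.inf
    for i, (mean, visits) in enumerate(children):
        if visits == 0:
            return i  # always try an unvisited move first
        # exploitation term + exploration bonus
        score = mean + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best
```

A P-MCTS variant would replace the exploration constant or bonus term with the paper's modified bound and bias the choice with its composite reshuffling rules.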
Manufacturing service (MS) collaboration promotes social collaboration among distributed enterprises, which profit through manufacturing resource sharing on platforms. The pricing strategy for MSs therefore affects the collaboration results and, further, enterprises' satisfaction with the platform. Hence, in this article, a personalized dynamic pricing based MS collaboration optimization method is proposed. First, because information on enterprise preferences is scarce, long-term and short-term preferences of enterprises are estimated based on scarcity, covering service features, service quantity, and available time. Then, a personalized pricing method is proposed to adapt to the dynamic collaboration process, and a consumer utility model that considers time decay and price changes is constructed. This model reflects the utility characteristics of consumers in the actual collaboration process, such as utility decreasing, at an increasing rate, as time elapses. Finally, a Q-learning algorithm based MS collaboration optimization verifies the effectiveness and superiority of the method.
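A minimal sketch of a time-decaying consumer utility of the kind described above; the exponential form, parameter names, and decay rate are illustrative assumptions, not the paper's actual model.

```python
import math

def consumer_utility(base_value, price, elapsed, decay=0.1):
    """Utility of acquiring a manufacturing service: the perceived
    value erodes with waiting time before the price is subtracted.

    base_value: value of the service if delivered immediately
    elapsed:    waiting time so far
    decay:      rate at which perceived value erodes (assumed)
    """
    return base_value * math.exp(-decay * elapsed) - price
```

With an exponential decay, utility falls faster early on and the drop compounds as time passes, which is the qualitative behaviour the abstract attributes to consumers.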
This paper concentrates on the optimal denial-of-service (DoS) attack power allocation strategy for remote state estimation in cyber-physical systems with two-hop networks. An intelligent relay, assumed capable of running some simple recursive algorithms, transmits the system process state across a vulnerable communication channel to the remote estimator, forming a cooperative system with the sensor. Meanwhile, a malicious attacker strategically disrupts the transmission on the channel to degrade system performance, subject to an energy budget constraint over an infinite time horizon. The symbol error rate is introduced to characterize the probability of error-free packet reception, while channel noise and interference from another communication channel are incorporated into the communication model. We cast the optimal attack power control problem in a Markov decision process (MDP) framework, exploiting the fact that there exists at least one deterministic stationary strategy. The optimal power control strategy is derived with a stochastic predictive control formulation, and the corresponding strategy is then implemented by drawing extensively on a Q-learning algorithm. In addition, two time-saving, computationally cheaper sub-optimal attack power strategies are provided to help the attacker evade detection. Finally, the theoretical results are illustrated by numerical examples.
To solve the problem of cross-regional customized bus (CB) route planning during the COVID-19 pandemic, we develop a CB route planning method based on an improved Q-learning algorithm. First, we design a sub-regional route planning approach that considers commuters' time windows at pick-up and drop-off stops. Second, to find the CB route with the optimal total social travel cost, we improve the traditional Q-learning algorithm, including the state-action pairs, the reward function, and the Q-table update rule. Then, a setup method for CB stops is designed and a path impedance function is constructed to obtain the optimal operating path between each pair of stops. Finally, we take three CB lines in Beijing as examples for numerical experiments. The theoretical and numerical results show that (i) compared with the current situation, although the actual operating cost of the optimized route increases slightly, this is offset by the reduction in passengers' travel cost, and the transmission risk of COVID-19 also drops significantly; (ii) the improved Q-learning algorithm effectively solves the problem of data transmission lag and noticeably reduces the total social travel cost.
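For reference, the Q-table update rule that such improvements build on is the standard tabular Q-learning step below; the stop names, learning rate, and discount factor are illustrative assumptions, not the paper's tuned values.

```python
from collections import defaultdict

def make_q_table():
    # Q[state][action] -> estimated return, defaulting to 0
    return defaultdict(lambda: defaultdict(float))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values(), default=0.0)
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]
```

An improved variant of the kind the abstract describes would redefine the state-action pairs (e.g. stop plus time window) and the reward (e.g. negative social travel cost) while keeping this update structure.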
With the establishment of "carbon peaking and carbon neutrality" goals in China, along with the development of new power systems and ongoing electricity market reforms, pumped-storage power stations (PSPSs) will play an increasingly significant role in power systems. Therefore, this study focuses on trading and bidding strategies for PSPSs in the electricity market. Firstly, a comprehensive framework for PSPSs participating in the electricity energy and frequency regulation (FR) ancillary service markets is proposed. Subsequently, a two-layer trading model is developed to achieve joint clearing in the energy and FR markets. The upper-layer model aims to maximize the revenue of the power station by optimizing the bidding strategies using a Q-learning algorithm. The lower-layer model minimizes the total electricity purchasing cost of the system. Finally, the proposed bi-level trading model is validated on an actual case with data obtained from a provincial power system in China. The results indicate that through this decision-making method, PSPSs can achieve higher economic revenue in the market, providing a reference for the planning and operation of PSPSs.
Femtocells consisting of small femto base stations have emerged as an efficient solution for improving the capacity and coverage of wireless cellular networks. However, due to limited wireless radio resources, resource allocation is a key issue in two-tier femtocell networks. Motivated by this challenge, in this paper we propose a resource allocation approach that satisfies quality-of-service requirements and maximizes social welfare. Users compete with each other for a serving base station that fulfills their quality-of-service requirements, and the serving base stations prefer to serve more users to earn more revenue. We model the competition among these rational decision makers as a Vickrey-Clarke-Groves (VCG) auction game, in which each user, as a buyer, submits a bid for resources, and each base station, as a seller, decides which users win the auction and how much the winning users should pay, then assigns the resources to the winning users. Unlike previous studies, we also take into account macro users' activity as cross-tier interference in the resource allocation process. We develop a Q-learning based algorithm in which each user gradually learns from its own past information and adjusts its bid value to reach the Nash equilibrium, the solution of the game, without any interaction with other users. We also investigate the existence and uniqueness of the Nash equilibrium. Simulation results verify the accuracy of the numerical results obtained from the proposed model.
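The simplest instance of the VCG mechanism referenced above is the single-item Vickrey (second-price) auction, sketched below; the paper's multi-resource setting is more general, and the dict-of-bids interface is an assumption for illustration.

```python
def vickrey_auction(bids):
    """Single-item Vickrey (second-price) auction, the simplest VCG
    instance: the highest bidder wins but pays the second-highest bid,
    which makes truthful bidding a dominant strategy.

    bids: dict user_id -> bid value. Returns (winner, payment).
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, payment
```

Charging the externality a winner imposes on others (here, the runner-up's bid) rather than the winner's own bid is what removes the incentive to shade bids, which in turn makes the Q-learning bid adjustment the abstract describes well behaved.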
The lifetime of a UAV-assisted wireless network is determined by the amount of energy consumed by the UAVs during flight, data collection, and transmission to the ground station. Routing protocols are commonly used for data transmission in a communication network. However, because of the mobility of UAVs, using a routing protocol with a single communication technology results in higher delay and more energy consumption in a UAV-assisted wireless network. To overcome this, we propose two reinforcement learning (RL) algorithms, Q-learning and deep Q-network (DQN), for energy-efficient data transmission over a hybrid BLE/LTE/Wi-Fi/LoRa UAV-assisted wireless network. We consider BLE, LTE, Wi-Fi, and LoRa for communication over a UAV-GS link. The RL algorithms take any random network as input and learn the best policy to output the network with the lowest energy consumption. The reward/penalty is chosen so that the network with the highest energy consumption is penalized and the one with the lowest is rewarded, thereby minimizing total network energy consumption. Based on this learning, the method creates a hybrid BLE/LTE/Wi-Fi/LoRa UAV-assisted wireless network by assigning the best communication technology to each UAV-GS link. Further, we compare the performance of the proposed RL algorithms with a rule-based algorithm and a random hybrid scheme. In addition, we propose a theoretical framework for constructing a hybrid network under both the free-space and free-space multipath path loss models. We demonstrate the performance of the proposed work against the conventional shortest-path routing algorithm in terms of network energy consumption and average network delay through extensive results. Finally, the effects of UAV velocity and the number of packets on the performance of the proposed framework are analyzed.
A reinforcement learning-based adaptive energy management (RLAEM) strategy is proposed for a hybrid electric tracked vehicle (HETV) in this paper. A control-oriented model of the HETV is first established, in which the battery state-of-charge (SOC) and the generator speed are the state variables, and the engine torque is the control variable. Subsequently, a transition probability matrix is learned from a specific driving schedule of the HETV. The proposed RLAEM decides the appropriate power split between the battery and the engine-generator set (EGS) to minimize fuel consumption over different driving schedules. With the RLAEM, not only is the driver's power requirement guaranteed, but the fuel economy is also improved. Finally, the RLAEM is compared with stochastic dynamic programming (SDP)-based energy management for different driving schedules. The simulation results demonstrate the adaptability, optimality, and learning ability of the RLAEM, as well as its capacity to reduce computation time.
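Learning a transition probability matrix from a recorded driving schedule, as described above, can be sketched as empirical frequency counting over discretized demand levels; the sequence encoding and state discretization below are assumptions for illustration.

```python
from collections import Counter, defaultdict

def learn_transition_matrix(demand_sequence):
    """Estimate a Markov transition probability matrix from a recorded
    driving schedule: count observed (state -> next_state) pairs and
    normalize each row. States are discretized power-demand levels."""
    counts = defaultdict(Counter)
    for s, s_next in zip(demand_sequence, demand_sequence[1:]):
        counts[s][s_next] += 1
    return {
        s: {s2: n / sum(row.values()) for s2, n in row.items()}
        for s, row in counts.items()
    }
```

The resulting row-stochastic matrix is exactly what an RL or SDP energy-management scheme consumes when it models the driver's power demand as a Markov chain.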
This paper proposes a novel framework for home energy management (HEM) based on reinforcement learning to achieve efficient home-based demand response (DR). The concerned hour-ahead energy consumption scheduling problem is duly formulated as a finite Markov decision process (FMDP) with discrete time steps. To tackle this problem, a data-driven method based on a neural network (NN) and the Q-learning algorithm is developed, which achieves superior performance on cost-effective schedules for the HEM system. Specifically, real data on electricity prices and solar photovoltaic (PV) generation are processed in a timely manner for uncertainty prediction by an extreme learning machine (ELM) in rolling time windows. The scheduling decisions for the household appliances and electric vehicles (EVs) can subsequently be obtained through the newly developed framework, whose objective is twofold: to minimize the electricity bill as well as the DR-induced dissatisfaction. Simulations are performed at the level of a residential house with multiple home appliances, an EV, and several PV panels. The test results demonstrate the effectiveness of the proposed data-driven HEM framework.
In this article, a novel machine learning based data-driven pricing method is proposed for sharing rooftop photovoltaic (PV) generation and energy storage (ES) in an electrically interconnected residential building cluster (RBC). In the studied problem, the energy sharing process is modeled as a leader-follower Stackelberg game in which the owner of the rooftop PV system is responsible for pricing self-generated PV energy and operating the ES devices. Meanwhile, local electricity consumers in the RBC choose their energy consumption given the internal electricity prices. To track the stochastic rooftop PV panel outputs, a long short-term memory network based rolling-horizon prediction function is developed to dynamically predict future trends of PV generation. Together with system information, the predicted information is fed into a Q-learning based decision-making process to find near-optimal pricing strategies. The simulation results verify the effectiveness of the proposed approach in solving energy sharing problems with partial or uncertain information.
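The leader-follower structure described above can be sketched as follows: the leader anticipates the followers' best response when choosing its price. The quadratic consumer utility, parameter values, and candidate-price enumeration are illustrative assumptions, not the paper's model or its Q-learning search.

```python
def follower_demand(price, a=10.0, b=1.0):
    """Follower best response: with quadratic utility
    u(q) = a*q - 0.5*b*q**2 - price*q, the utility-maximizing
    consumption is q* = max((a - price) / b, 0)."""
    return max((a - price) / b, 0.0)

def best_leader_price(cost, candidate_prices):
    """Leader (PV owner) evaluates candidate internal prices and keeps
    the one maximizing profit (price - cost) * demand, anticipating
    each follower's best response."""
    return max(candidate_prices,
               key=lambda p: (p - cost) * follower_demand(p))
```

A Q-learning based scheme like the one in the abstract replaces this exhaustive enumeration with value estimates learned from repeated interaction, which is what allows it to cope with partial or uncertain information.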