The container pre-marshalling problem (CPMP) aims to minimise the number of reshuffling moves, ultimately achieving an optimised stacking arrangement in each bay based on container priorities during the non-loading phase. Given its sequential decision nature, we formulate the CPMP as a Markov decision process (MDP) model that captures the specific states and actions of the reshuffling process. To address the challenge that a relocated container may trigger a chain effect on subsequent reshuffling moves, this paper develops an improved policy-based Monte Carlo tree search (P-MCTS) to solve the CPMP, in which eight composite reshuffling rules and modified upper confidence bounds are employed in the selection phase, and a well-designed heuristic algorithm is used in the simulation phase. Meanwhile, considering the effectiveness of reinforcement learning methods for solving MDP models, an improved Q-learning algorithm is proposed as a comparison method. Numerical results show that the P-MCTS outperforms all compared methods both in scenarios where all containers have different priorities and in scenarios where containers can share the same priority.
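As a sketch of the selection phase mentioned above: the paper's modified upper confidence bounds and composite rules are not reproduced in the abstract, so the snippet below shows only the standard UCB1 rule that such modifications start from. The `children` representation is an assumption for illustration.

```python
import math

def ucb_select(children, total_visits, c=math.sqrt(2)):
    """Pick the index of the child maximizing the UCB1 score.

    children: list of (mean_reward, visit_count) pairs for each
    candidate reshuffling move; unvisited children are expanded first.
    """
    best, best_score = None, -math.inf
    for i, (mean, visits) in enumerate(children):
        if visits == 0:
            return i  # always try an unvisited move first
        # exploitation term + exploration bonus
        score = mean + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best
```

A P-MCTS variant would replace the exploration constant or bonus term with the paper's modified bound and bias the choice with its composite reshuffling rules.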
Manufacturing service (MS) collaboration promotes social collaboration among distributed enterprises, which profit through manufacturing resource sharing on platforms. The pricing strategy for MSs therefore affects the collaboration results and, further, enterprises' satisfaction with the platform. Hence, in this article, a personalized dynamic pricing based MS collaboration optimization method is proposed. First, because information on enterprise preferences is scarce, long-term and short-term preferences of enterprises are estimated based on scarcity, covering service features, service quantity, and available time. Then, a personalized pricing method is proposed to adapt to the dynamic collaboration process, and a consumer utility model that considers time decay and price changes is constructed. This model reflects the utility characteristics of consumers in the actual collaboration process, such as utility decreasing, at an increasing rate, as time elapses. Finally, a Q-learning algorithm based MS collaboration optimization verifies the effectiveness and superiority of the method.
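A minimal sketch of a time-decaying consumer utility of the kind described above; the exponential form, parameter names, and decay rate are illustrative assumptions, not the paper's actual model.

```python
import math

def consumer_utility(base_value, price, elapsed, decay=0.1):
    """Utility of acquiring a manufacturing service: the perceived
    value erodes with waiting time before the price is subtracted.

    base_value: value of the service if delivered immediately
    elapsed:    waiting time so far
    decay:      rate at which perceived value erodes (assumed)
    """
    return base_value * math.exp(-decay * elapsed) - price
```

With an exponential decay, utility falls faster early on and the drop compounds as time passes, which is the qualitative behaviour the abstract attributes to consumers.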
This paper concentrates on the optimal denial-of-service (DoS) attack power allocation strategy for remote state estimation in cyber-physical systems with two-hop networks. An intelligent relay, assumed capable of running some simple recursive algorithms, transmits the system process state across a vulnerable communication channel to the remote estimator, forming a cooperative system with the sensor. Meanwhile, a malicious attacker strategically disrupts the transmission on the channel to degrade system performance, subject to an energy budget constraint over an infinite time horizon. The symbol error rate is introduced to characterize the probability of error-free packet reception, while channel noise and interference from another communication channel are incorporated into the communication model. We cast the optimal attack power control problem in a Markov decision process (MDP) framework, exploiting the fact that there exists at least one deterministic stationary strategy. The optimal power control strategy is derived with a stochastic predictive control formulation, and the corresponding strategy is then implemented by drawing extensively on a Q-learning algorithm. In addition, two time-saving, computationally cheaper sub-optimal attack power strategies are provided to help the attacker evade detection. Finally, the theoretical results are illustrated by numerical examples.
To solve the problem of cross-regional customized bus (CB) route planning during the COVID-19 pandemic, we develop a CB route planning method based on an improved Q-learning algorithm. First, we design a sub-regional route planning approach that considers commuters' time windows at pick-up and drop-off stops. Second, to find the CB route with the optimal total social travel cost, we improve the traditional Q-learning algorithm, including the state-action pairs, the reward function, and the Q-table update rule. Then, a setup method for CB stops is designed and a path impedance function is constructed to obtain the optimal operating path between each pair of stops. Finally, we take three CB lines in Beijing as examples for numerical experiments. The theoretical and numerical results show that (i) compared with the current situation, although the actual operating cost of the optimized route increases slightly, this is offset by the reduction in passengers' travel cost, and the transmission risk of COVID-19 also drops significantly; (ii) the improved Q-learning algorithm effectively solves the problem of data transmission lag and noticeably reduces the total social travel cost.
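For reference, the Q-table update rule that such improvements build on is the standard tabular Q-learning step below; the stop names, learning rate, and discount factor are illustrative assumptions, not the paper's tuned values.

```python
from collections import defaultdict

def make_q_table():
    # Q[state][action] -> estimated return, defaulting to 0
    return defaultdict(lambda: defaultdict(float))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values(), default=0.0)
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]
```

An improved variant of the kind the abstract describes would redefine the state-action pairs (e.g. stop plus time window) and the reward (e.g. negative social travel cost) while keeping this update structure.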
With the establishment of "carbon peaking and carbon neutrality" goals in China, along with the development of new power systems and ongoing electricity market reforms, pumped-storage power stations (PSPSs) will play an increasingly significant role in power systems. Therefore, this study focuses on trading and bidding strategies for PSPSs in the electricity market. Firstly, a comprehensive framework for PSPSs participating in the electricity energy and frequency regulation (FR) ancillary service markets is proposed. Subsequently, a two-layer trading model is developed to achieve joint clearing in the energy and FR markets. The upper-layer model aims to maximize the revenue of the power station by optimizing the bidding strategies using a Q-learning algorithm. The lower-layer model minimizes the total electricity purchasing cost of the system. Finally, the proposed bi-level trading model is validated on an actual case with data obtained from a provincial power system in China. The results indicate that through this decision-making method, PSPSs can achieve higher economic revenue in the market, providing a reference for the planning and operation of PSPSs.
Femtocells consisting of small femto base stations have emerged as an efficient solution for improving the capacity and coverage of wireless cellular networks. However, due to limited wireless radio resources, resource allocation is a key issue in two-tier femtocell networks. Motivated by this challenge, in this paper we propose a resource allocation approach that satisfies quality-of-service requirements and maximizes social welfare. Users compete with each other for a serving base station that fulfills their quality-of-service requirements, and the serving base stations prefer to serve more users to earn more revenue. We model the competition among these rational decision makers as a Vickrey-Clarke-Groves (VCG) auction game, in which each user, as a buyer, submits a bid for resources, and each base station, as a seller, decides which users win the auction and how much the winning users should pay, then assigns the resources to the winning users. Unlike previous studies, we also take into account macro users' activity as cross-tier interference in the resource allocation process. We develop a Q-learning based algorithm in which each user gradually learns from its own past information and adjusts its bid value to reach the Nash equilibrium, the solution of the game, without any interaction with other users. We also investigate the existence and uniqueness of the Nash equilibrium. Simulation results verify the accuracy of the numerical results obtained from the proposed model.
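The simplest instance of the VCG mechanism referenced above is the single-item Vickrey (second-price) auction, sketched below; the paper's multi-resource setting is more general, and the dict-of-bids interface is an assumption for illustration.

```python
def vickrey_auction(bids):
    """Single-item Vickrey (second-price) auction, the simplest VCG
    instance: the highest bidder wins but pays the second-highest bid,
    which makes truthful bidding a dominant strategy.

    bids: dict user_id -> bid value. Returns (winner, payment).
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, payment
```

Charging the externality a winner imposes on others (here, the runner-up's bid) rather than the winner's own bid is what removes the incentive to shade bids, which in turn makes the Q-learning bid adjustment the abstract describes well behaved.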
The lifetime of a UAV-assisted wireless network is determined by the amount of energy consumed by the UAVs during flight, data collection, and transmission to the ground station. Routing protocols are commonly used for data transmission in a communication network. However, because of the mobility of UAVs, using a routing protocol with a single communication technology results in higher delay and more energy consumption in a UAV-assisted wireless network. To overcome this, we propose two reinforcement learning (RL) algorithms, Q-learning and deep Q-network (DQN), for energy-efficient data transmission over a hybrid BLE/LTE/Wi-Fi/LoRa UAV-assisted wireless network. We consider BLE, LTE, Wi-Fi, and LoRa for communication over a UAV-GS link. The RL algorithms take any random network as input and learn the best policy to output the network with the lowest energy consumption. The reward/penalty is chosen so that the network with the highest energy consumption is penalized and the one with the lowest is rewarded, thereby minimizing total network energy consumption. Based on this learning, the method creates a hybrid BLE/LTE/Wi-Fi/LoRa UAV-assisted wireless network by assigning the best communication technology to each UAV-GS link. Further, we compare the performance of the proposed RL algorithms with a rule-based algorithm and a random hybrid scheme. In addition, we propose a theoretical framework for constructing a hybrid network under both the free-space and free-space multipath path loss models. We demonstrate the performance of the proposed work against the conventional shortest-path routing algorithm in terms of network energy consumption and average network delay through extensive results. Finally, the effects of UAV velocity and the number of packets on the performance of the proposed framework are analyzed.
A reinforcement learning-based adaptive energy management (RLAEM) strategy is proposed for a hybrid electric tracked vehicle (HETV) in this paper. A control-oriented model of the HETV is first established, in which the battery state-of-charge (SOC) and the generator speed are the state variables, and the engine torque is the control variable. Subsequently, a transition probability matrix is learned from a specific driving schedule of the HETV. The proposed RLAEM decides the appropriate power split between the battery and the engine-generator set (EGS) to minimize fuel consumption over different driving schedules. With the RLAEM, not only is the driver's power requirement guaranteed, but the fuel economy is also improved. Finally, the RLAEM is compared with stochastic dynamic programming (SDP)-based energy management for different driving schedules. The simulation results demonstrate the adaptability, optimality, and learning ability of the RLAEM, as well as its capacity to reduce computation time.
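Learning a transition probability matrix from a recorded driving schedule, as described above, can be sketched as empirical frequency counting over discretized demand levels; the sequence encoding and state discretization below are assumptions for illustration.

```python
from collections import Counter, defaultdict

def learn_transition_matrix(demand_sequence):
    """Estimate a Markov transition probability matrix from a recorded
    driving schedule: count observed (state -> next_state) pairs and
    normalize each row. States are discretized power-demand levels."""
    counts = defaultdict(Counter)
    for s, s_next in zip(demand_sequence, demand_sequence[1:]):
        counts[s][s_next] += 1
    return {
        s: {s2: n / sum(row.values()) for s2, n in row.items()}
        for s, row in counts.items()
    }
```

The resulting row-stochastic matrix is exactly what an RL or SDP energy-management scheme consumes when it models the driver's power demand as a Markov chain.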
This paper proposes a novel framework for home energy management (HEM) based on reinforcement learning to achieve efficient home-based demand response (DR). The concerned hour-ahead energy consumption scheduling problem is duly formulated as a finite Markov decision process (FMDP) with discrete time steps. To tackle this problem, a data-driven method based on a neural network (NN) and the Q-learning algorithm is developed, which achieves superior performance on cost-effective schedules for the HEM system. Specifically, real data on electricity prices and solar photovoltaic (PV) generation are processed in a timely manner for uncertainty prediction by an extreme learning machine (ELM) in rolling time windows. The scheduling decisions for the household appliances and electric vehicles (EVs) can subsequently be obtained through the newly developed framework, whose objective is twofold: to minimize the electricity bill as well as the DR-induced dissatisfaction. Simulations are performed at the level of a residential house with multiple home appliances, an EV, and several PV panels. The test results demonstrate the effectiveness of the proposed data-driven HEM framework.
In this article, a novel machine learning based data-driven pricing method is proposed for sharing rooftop photovoltaic (PV) generation and energy storage (ES) in an electrically interconnected residential building cluster (RBC). In the studied problem, the energy sharing process is modeled as a leader-follower Stackelberg game in which the owner of the rooftop PV system is responsible for pricing self-generated PV energy and operating the ES devices. Meanwhile, local electricity consumers in the RBC choose their energy consumption given the internal electricity prices. To track the stochastic rooftop PV panel outputs, a long short-term memory network based rolling-horizon prediction function is developed to dynamically predict future trends of PV generation. Together with system information, the predicted information is fed into a Q-learning based decision-making process to find near-optimal pricing strategies. The simulation results verify the effectiveness of the proposed approach in solving energy sharing problems with partial or uncertain information.
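The leader-follower structure described above can be sketched as follows: the leader anticipates the followers' best response when choosing its price. The quadratic consumer utility, parameter values, and candidate-price enumeration are illustrative assumptions, not the paper's model or its Q-learning search.

```python
def follower_demand(price, a=10.0, b=1.0):
    """Follower best response: with quadratic utility
    u(q) = a*q - 0.5*b*q**2 - price*q, the utility-maximizing
    consumption is q* = max((a - price) / b, 0)."""
    return max((a - price) / b, 0.0)

def best_leader_price(cost, candidate_prices):
    """Leader (PV owner) evaluates candidate internal prices and keeps
    the one maximizing profit (price - cost) * demand, anticipating
    each follower's best response."""
    return max(candidate_prices,
               key=lambda p: (p - cost) * follower_demand(p))
```

A Q-learning based scheme like the one in the abstract replaces this exhaustive enumeration with value estimates learned from repeated interaction, which is what allows it to cope with partial or uncertain information.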