In this paper, we study the capacitated lot-sizing problem (CLSP) in a single-product system with stochastic demand, aiming to minimize the total production cost, which includes production, inventory, machine setup, and stockout costs. We focus on the design and application of deep reinforcement learning (DRL) algorithms and propose a reward-shaping-based D3QN algorithm (RS-D3QN). First, we provide a detailed problem description and formulate it as a reinforcement learning model using a Markov decision process. Then, we adopt the aggregate modified base-stock (AMBS) heuristic as the teacher strategy and apply potential-function-based reward shaping for policy transfer, guiding the agent to mimic the teacher's behavior. Experimental results show that the RS-D3QN algorithm significantly outperforms the other algorithms, with an average total cost 1.9% lower than the D3QN algorithm, 17.9% lower than the EOQ method, and 6.15% lower than the AMBS method. These results demonstrate that the RS-D3QN algorithm successfully combines the global optimization power of DRL with heuristic knowledge, addressing the challenges of low learning efficiency and slow convergence in complex production environments.
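The potential-based reward shaping mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the shaping term follows the standard form F(s, s') = γΦ(s') − Φ(s), while the potential function (negative distance between the inventory level and a teacher-suggested base-stock level) and all names are assumptions made for the example.

```python
# Minimal sketch of potential-based reward shaping.
# ASSUMPTION: the potential is the negative gap between the current inventory
# and a base-stock level suggested by a teacher heuristic; the abstract does
# not specify the actual potential function.

def potential(inventory: float, base_stock_level: float) -> float:
    """Higher (closer to zero) when inventory is near the teacher's target."""
    return -abs(base_stock_level - inventory)


def shaped_reward(
    reward: float,
    inventory: float,
    next_inventory: float,
    base_stock_level: float,
    gamma: float = 0.99,
) -> float:
    """Add the shaping term F(s, s') = gamma * phi(s') - phi(s) to the reward."""
    shaping = gamma * potential(next_inventory, base_stock_level) - potential(
        inventory, base_stock_level
    )
    return reward + shaping
```

With γ = 1 the shaping terms telescope along a trajectory, so their sum depends only on the first and last states. This is the property that lets this form of shaping guide learning without changing which policy is optimal.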
To address data fluctuation in multi-process production, a Soft Update dueling double deep Q-learning network (SU-D3QN), which incorporates a soft update strategy, is proposed. Based on this network, a combined time series forecasting model, SU-D3QN-G, is proposed. First, a gated recurrent unit (GRU) is used to make predictions from production data. Second, the SU-D3QN algorithm learns a bias that is added to the GRU output, correcting the prediction at each time step in the direction that reduces the absolute error. Third, experiments were carried out on a company's dataset. Data for four indicators were selected, namely the outlet temperature of drying silk, the loose moisture return water, the outlet temperature of feeding leaves, and the inlet water of leaf silk warming and humidification. More than 1000 real production records were divided into training, validation, and test sets in a 6:2:2 ratio. The experimental results show that the SU-D3QN-G combined time series prediction model improves on GRU, LSTM, and ARIMA: MSE is reduced by 0.846% to 23.930%, 5.132% to 36.920%, and 10.606% to 70.714%, respectively; RMSE by 0.605% to 10.118%, 2.484% to 14.542%, and 5.314% to 30.659%; MAE by 3.078% to 15.678%, 7.94% to 15.974%, and 6.860% to 49.820%; and MAPE by 3.098% to 15.700%, 7.98% to 16.395%, and 7.143% to 50.000%.
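The two mechanisms described above, a soft target-network update and a learned bias that corrects the GRU forecast, can be sketched as below. This is a rough illustration under stated assumptions: the update uses the common Polyak form θ_target ← τθ_online + (1 − τ)θ_target, and the discrete bias set, the reward definition, and all function names are hypothetical, since the abstract does not give them.

```python
# Sketch of a soft (Polyak) target update and a bias-correction step.
# ASSUMPTIONS: weights are plain lists of floats; the agent picks one of a
# few discrete bias values; the reward is the reduction in absolute error.

def soft_update(target: list[float], online: list[float], tau: float = 0.01) -> list[float]:
    """Blend online weights into the target weights instead of copying them."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Hypothetical action set: each action adds a fixed bias to the GRU forecast.
BIAS_ACTIONS = (-0.5, 0.0, 0.5)


def corrected_prediction(gru_prediction: float, action_index: int) -> float:
    """Apply the bias chosen by the agent to the GRU forecast."""
    return gru_prediction + BIAS_ACTIONS[action_index]


def correction_reward(gru_prediction: float, action_index: int, actual: float) -> float:
    """Positive when the chosen bias moves the forecast closer to the actual value."""
    error_before = abs(actual - gru_prediction)
    error_after = abs(actual - corrected_prediction(gru_prediction, action_index))
    return error_before - error_after
```

A small τ makes the target network change slowly, which is the usual motivation for soft updates when the training signal fluctuates.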
Content delivery networks (CDNs) play a pivotal role in modern internet infrastructure by enabling efficient content delivery across diverse geographical regions. As an essential component of CDNs, the edge caching scheme directly influences the user experience by determining the caching and eviction of content on edge servers. With the emergence of 5G technology, traditional caching schemes have faced challenges in adapting to increasingly complex and dynamic network environments. Consequently, deep reinforcement learning (DRL) offers a promising solution for intelligent zero-touch network governance. However, the black-box nature of DRL models makes their decisions difficult to understand and trust. In this paper, we propose an explainable reinforcement learning (XRL)-based intelligent edge service caching approach, namely XRL-SHAP-Cache, which combines DRL with an explainable artificial intelligence (XAI) technique for cache management in CDNs. Instead of focusing solely on achieving performance gains, this study introduces a novel paradigm for providing interpretable caching strategies, thereby establishing a foundation for future transparent and trustworthy edge caching solutions. Specifically, a multi-level cache scheduling framework for CDNs was formulated theoretically, with the D3QN-based caching scheme serving as the targeted interpretable model. Subsequently, by integrating Deep-SHAP into our framework, the contribution of each state input feature to the agent's Q-value output was calculated, thereby providing valuable insights into the decision-making process. The proposed XRL-SHAP-Cache approach was evaluated through extensive experiments to demonstrate the behavior of the scheduling agent in the face of different environmental *** results demonstrate its strong explainability under various real-life scenarios while maintaining superior performance compared to traditional caching schemes in terms of cache hit ratio, quality of service (QoS), a
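To illustrate what a per-feature contribution to a Q-value means, the sketch below computes exact Shapley values for a toy function by enumerating feature coalitions. It is not Deep-SHAP, which approximates these values for deep networks. The baseline-replacement scheme for absent features, the toy Q-function, and all names are assumptions made for the example.

```python
# Exact Shapley values by coalition enumeration (feasible only for a few features).
# ASSUMPTION: a feature that is "absent" from a coalition takes its baseline value.
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    q_function: Callable[[Sequence[float]], float],
    state: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Return one attribution per state feature for the scalar q_function output."""
    n = len(state)
    attributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [
                    state[j] if (j in coalition or j == i) else baseline[j]
                    for j in range(n)
                ]
                without_i = [
                    state[j] if j in coalition else baseline[j] for j in range(n)
                ]
                attributions[i] += weight * (q_function(with_i) - q_function(without_i))
    return attributions


def toy_q(features: Sequence[float]) -> float:
    """Hypothetical linear Q-value over three state features."""
    return 2.0 * features[0] + 3.0 * features[1] - 1.0 * features[2]
```

For a linear function such as `toy_q`, each attribution equals the feature's coefficient times its deviation from the baseline, and the attributions sum to the difference between the Q-value at the state and at the baseline.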