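Compared with terrestrial wireless communication systems, the wide-area coverage of satellite communications makes secure information transmission an especially challenging research problem. To improve the physical-layer security of satellite communication systems operating in multicast mode, this paper studies two secure beamforming (BF) algorithms for different levels of channel state information (CSI). When the CSI of both the legitimate users and the eavesdroppers is perfectly known, a secure BF algorithm that combines semidefinite programming (SDP) with a penalty function is proposed. When the CSI of the legitimate users is perfectly known but the eavesdroppers' CSI contains errors, an iterative robust secure BF algorithm is proposed. Finally, computer simulations verify the correctness and effectiveness of the proposed BF algorithms and show that the proposed robust algorithm effectively reduces the impact of CSI errors on the system's secrecy performance.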
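The abstract does not give the system model, so the following is only a minimal sketch under stated assumptions: a single multicast stream sent by an N-antenna satellite with beamformer `w`, K legitimate users and E eavesdroppers with perfectly known channels (rows of `H` and `G`), and equal noise power at every receiver. It shows two quantities an SDP-plus-penalty design of this kind typically works with: the multicast secrecy rate (limited by the weakest user and the strongest eavesdropper), and a rank-one penalty `tr(W) - lambda_max(W)` on the relaxed matrix `W = w w^H`, which is zero exactly when `W` has rank at most one. The function names and the specific penalty form are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def multicast_secrecy_rate(w, H, G, noise_power=1.0):
    """Secrecy rate (bit/s/Hz) of one multicast stream sent with beamformer w.

    Assumed model (hypothetical): w is (N,), H is (K, N) with row k = h_k^H,
    G is (E, N) with row e = g_e^H, so received amplitudes are H @ w and G @ w.
    """
    snr_users = np.abs(H @ w) ** 2 / noise_power
    snr_eves = np.abs(G @ w) ** 2 / noise_power
    # All users decode the same message, so the weakest user sets the rate;
    # the strongest eavesdropper sets the leakage.
    rate = np.log2(1.0 + snr_users.min()) - np.log2(1.0 + snr_eves.max())
    return max(float(rate), 0.0)

def rank_one_penalty(W):
    """tr(W) - lambda_max(W) for a Hermitian PSD W; zero iff rank(W) <= 1."""
    eigvals = np.linalg.eigvalsh(W)  # real eigenvalues in ascending order
    return float(np.trace(W).real - eigvals[-1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, K, E = 4, 3, 2
    H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
    G = (rng.standard_normal((E, N)) + 1j * rng.standard_normal((E, N))) / np.sqrt(2)
    w = H.conj().sum(axis=0)
    w /= np.linalg.norm(w)  # unit-power beamformer, chosen only for illustration
    print(multicast_secrecy_rate(w, H, G))
    print(rank_one_penalty(np.outer(w, w.conj())))  # close to zero for a rank-one W
```

In a penalty-based scheme, a term like `rank_one_penalty(W)` would be added (scaled by a penalty weight) to the SDP objective so that the relaxed solution is pushed toward a rank-one matrix from which `w` can be recovered.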
To address two problems in the mean-field-based multi-agent reinforcement learning algorithm M3-UCRL, namely that its environment dynamics model predicts the next state inaccurately and that too few samples are available for policy learning, this paper exploits the data-generation capability of denoising diffusion probabilistic models (DDPM) and proposes a DDPM-based mean-field multi-agent reinforcement learning algorithm (DDPM-M3RL). The algorithm formulates the generation of the environment model as a denoising problem. Using DDPM improves the accuracy of the environment model's next-state predictions and supplies sufficient sample data for subsequent policy learning, which speeds up the convergence of the policy model. Experimental results show that the algorithm effectively improves the accuracy of the environment dynamics model's next-state predictions, and that the state-transition data generated by this model provides sufficient training samples for policy learning, effectively improving the performance and stability of the navigation policy.
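Treating the environment model as a denoising problem can be illustrated with the standard DDPM formulation. The sketch below assumes a linear noise schedule, the usual noise-prediction training loss, and ancestral sampling with variance `beta_t`; `eps_model(x_t, t, cond)` is a hypothetical trained noise predictor, and `cond` is assumed to carry the current state, the action and the mean-field term. None of these choices is taken from the paper; they only show how a next state could be generated by denoising Gaussian noise.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice) and its cumulative products."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def denoising_loss(eps_model, x0, cond, alpha_bars, rng):
    """One-sample noise-prediction loss on an observed next state x0."""
    t = int(rng.integers(len(alpha_bars)))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bars, noise)
    return float(np.mean((noise - eps_model(x_t, t, cond)) ** 2))

def sample_next_state(eps_model, cond, state_dim, betas, alphas, alpha_bars, rng):
    """Reverse process: denoise pure Gaussian noise into a next-state sample."""
    x = rng.standard_normal(state_dim)
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(x, t, cond)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(state_dim)
        else:
            x = mean  # no noise is added at the final step
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas, alphas, alpha_bars = linear_beta_schedule(100)
    dummy_eps = lambda x, t, cond: np.zeros_like(x)  # placeholder, not a trained model
    print(sample_next_state(dummy_eps, None, 3, betas, alphas, alpha_bars, rng))
```

Once trained, repeated calls to `sample_next_state` with different `cond` inputs would yield synthetic state transitions that can be added to the policy-learning buffer, which is how a generative environment model can enlarge the sample pool.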