ISBN (print): 9798350363029; 9798350363012
This paper studies the synchronization problem of two-player multiagent systems using reinforcement learning methods. A Nash-minmax strategy is formulated in which the interactions between the two players within the same agent are non-zero-sum games, while the interactions between players of different agents are zero-sum games. We propose an offline model-based reinforcement learning algorithm to identify Nash solutions for the players within each agent, as well as the worst-case control solutions for players in neighboring antagonistic agents. On this basis, a data-driven off-policy algorithm is provided to alleviate the offline algorithm's requirement for accurate system dynamics. In addition, the convergence of the proposed algorithms is analyzed. Finally, simulation results verify the effectiveness of the designed algorithms.
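The abstract does not spell out the offline model-based algorithm. As a hedged illustration of only the zero-sum ingredient, the sketch below runs value iteration on the discrete-time game algebraic Riccati equation for a single linear system x_{k+1} = A x_k + B u_k + D w_k. It uses weights Q, R and attenuation level gamma. The linear-quadratic setting, the function name `zero_sum_game_riccati`, and the example matrices are all assumptions. This is not the paper's Nash-minmax algorithm for multiagent synchronization.

```python
import numpy as np

def zero_sum_game_riccati(A, B, D, Q, R, gamma, tol=1e-10, max_iter=10_000):
    """Value iteration on the discrete-time game algebraic Riccati equation.

    Assumed dynamics x_{k+1} = A x_k + B u_k + D w_k with stage cost
    x'Qx + u'Ru - gamma^2 w'w: the control u minimizes, the opposing
    input w maximizes. Returns P and gains with u = -K_u x, w = -K_w x.
    """
    n, m = B.shape
    q = D.shape[1]
    G = np.hstack([B, D])  # stacked input matrix [B D]
    R_bar = np.block([[R, np.zeros((m, q))],
                      [np.zeros((q, m)), -gamma**2 * np.eye(q)]])

    def riccati_map(P):
        M = R_bar + G.T @ P @ G              # joint Hessian in (u, w)
        K = np.linalg.solve(M, G.T @ P @ A)  # stacked gain [K_u; K_w]
        P_next = Q + A.T @ P @ A - A.T @ P @ G @ K
        return 0.5 * (P_next + P_next.T), K  # symmetrize against round-off

    P = np.zeros((n, n))
    for _ in range(max_iter):
        P_next, _ = riccati_map(P)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    _, K = riccati_map(P)  # gains evaluated at the converged P
    return P, K[:m], K[m:]

# Hypothetical scalar example
A = np.array([[0.5]]); B = np.array([[1.0]]); D = np.array([[0.1]])
P, K_u, K_w = zero_sum_game_riccati(A, B, D, np.eye(1), np.eye(1), gamma=1.0)
```

A meaningful saddle point also requires gamma^2 I - D'PD to be positive definite, so checking that after convergence is a cheap sanity test.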
ISBN (print): 9798350321050
Reinforcement learning (RL) has made great progress in autonomous driving applications. However, using a single RL-based driving policy across multiple driving scenarios remains challenging. Different scenarios involve different observations and reward measurements, and autonomous driving also involves multi-source heterogeneous observations. To address these problems, we propose an RL framework based on an auxiliary task. First, we design a reward function that enables vehicles to learn safe and efficient strategies. Second, an auxiliary task is designed to learn the characteristics of different scenarios so that the ego agent can adopt different strategies for different scenarios. Finally, to handle driving in multiple scenarios, we propose a representation network built from a multilayer perceptron (MLP), a convolutional neural network (CNN), and a Transformer. It learns from multi-source heterogeneous observations consisting of the ego vehicle state, the bird's-eye view (BEV) state, and neighboring vehicle states. Experiments show that our method achieves a higher success rate than a popular RL algorithm.
The paper presents a novel approach for human-robot skill transfer. First, we propose a method that combines dynamic time warping (DTW) with the Gaussian mixture model (GMM) to reconstruct the demonstrated skill...
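The abstract is truncated before the DTW and GMM details, so the following is only a generic sketch of classic DTW. DTW is the alignment step commonly used to bring demonstrations of different lengths onto a common time axis before a GMM is fitted. The function name `dtw_distance`, the absolute-difference local cost, and the 1-D input are assumptions, and the GMM stage is not shown.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D sequences.

    Returns (distance, path), where path lists aligned index pairs (i, j).
    Local cost is |a_i - b_j| (an assumption for this sketch).
    """
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack the optimal warping path from the end to the start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = []
        if i > 1 and j > 1:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 1:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 1:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` aligns the repeated `1` with a single sample and gives distance 0. The returned path is what a skill-reconstruction pipeline would use to resample each demonstration onto a reference timeline.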
Data-driven and machine-learning-based methods are increasingly used in attempts to master the challenges of the world. But are they really the best approaches to managing complex dynamical systems? Our aim is to gain more insight into this question by studying various popular reinforcement learning methods for traffic signal control, specifically in disrupted scenarios characterized by significant, unpredictable variations. The results are expected to be relevant to subject areas ranging from traffic physics to transportation theory, from dynamics in networks to complex systems, from control theory to self-organization, and from adaptive heuristics to machine learning.
ISBN (print): 9798350321050
To address dynamic mission decision-making for UAV swarms in complex environments, a hybrid variable structure-based dynamic Bayesian network (HVSDBN) inference decision-making method is proposed. First, a UAV swarm mission decision-making model is established to accurately assess the swarm state and the threat state. To further improve decision accuracy, the threat assessment model and the swarm state assessment model are built using mixed continuous and discrete variables. Furthermore, a dynamic HVSDBN decision-making algorithm based on hybrid performance-capability parameters is proposed. It adjusts the structure of the decision model according to prior information and observation data to improve the adaptability of the solution strategy. Simulation results demonstrate that the HVSDBN method improves the variance of decision results by 25.03% compared with a traditional method, effectively improving the accuracy of UAV swarm mission decision-making in complex dynamic environments.
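The abstract does not detail HVSDBN's hybrid continuous/discrete variables or its structure adaptation. To illustrate only the basic inference step a dynamic Bayesian network performs as observations arrive, here is a forward-filtering (predict, then update) step for a single discrete hidden state such as a threat level. The function name `dbn_filter_step`, the three-level threat state, and every probability below are hypothetical.

```python
import numpy as np

def dbn_filter_step(belief, transition, likelihood):
    """One predict-update step of discrete forward filtering.

    belief:     P(S_{t-1} | evidence up to t-1), shape (K,)
    transition: transition[i, j] = P(S_t = j | S_{t-1} = i), shape (K, K)
    likelihood: P(observation_t | S_t = j), shape (K,)
    Returns P(S_t | evidence up to t).
    """
    predicted = belief @ transition     # time update (prediction)
    posterior = predicted * likelihood  # measurement update
    return posterior / posterior.sum()  # renormalize to a distribution

# Hypothetical threat levels: low, medium, high
transition = np.array([[0.80, 0.15, 0.05],
                       [0.10, 0.80, 0.10],
                       [0.05, 0.15, 0.80]])
belief = np.full(3, 1 / 3)
for likelihood in (np.array([0.1, 0.3, 0.6]), np.array([0.05, 0.25, 0.7])):
    belief = dbn_filter_step(belief, transition, likelihood)
```

A hybrid network like the one described would replace some of these discrete tables with continuous densities. The filtering structure stays the same.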
ISBN (print): 9798350321050
In this paper, a novel contrastive representation learning framework for time series data is proposed. The framework is designed to learn general representations of time series at various semantic levels and to transfer across different datasets. It incorporates two key components. First, a hierarchical contrasting method considers both the temporal and instance dimensions of the time series. It captures information at different levels through max pooling over corresponding timestamps, enabling the model to learn fine-grained, multi-scale timestamp-level representations for time series prediction tasks. Second, a compound consistency constraint combining transformation consistency and temporal-frequency consistency is used to learn a universal representation of the time series, ensuring its transferability across different datasets. Additionally, the framework considers both the temporal and frequency information of the time series. It uses an adaptive wavelet transform to obtain the frequency-domain representation while maintaining temporal alignment, which facilitates the temporal-frequency consistency contrast. Finally, the proposed framework is evaluated through extensive experiments on time series prediction tasks and compared with existing models on four public datasets. The results show that a linear regressor trained on the representations learned by the proposed model outperforms existing time series prediction models in prediction accuracy and transferability.
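The abstract states that hierarchical contrasting captures multi-scale information by max pooling along the time axis. The minimal NumPy sketch below illustrates that idea under several assumptions: dot-product similarity, no temperature, and only an InfoNCE-style instance-wise loss. The temporal-dimension contrast, the compound consistency constraint, and the adaptive wavelet transform are omitted. All function names are hypothetical.

```python
import numpy as np

def _logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def instance_contrastive_loss(z1, z2):
    """InfoNCE across instances at every timestamp.

    z1, z2: two augmented views, shape (B, T, D). At each timestamp the
    positive for z1[b] is z2[b]; the other 2B - 2 vectors are negatives.
    """
    B = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0).transpose(1, 0, 2)        # (T, 2B, D)
    sim = z @ z.transpose(0, 2, 1)                                 # (T, 2B, 2B)
    sim = sim + np.where(np.eye(2 * B, dtype=bool), -np.inf, 0.0)  # drop self pairs
    log_prob = sim - _logsumexp(sim, axis=-1)[..., None]
    idx = np.arange(2 * B)
    pos = (idx + B) % (2 * B)  # index of each row's positive in the other view
    return -log_prob[:, idx, pos].mean()

def max_pool_time(z):
    """Halve the time axis by max pooling over non-overlapping pairs."""
    T = z.shape[1] // 2 * 2
    return z[:, :T].reshape(z.shape[0], T // 2, 2, z.shape[2]).max(axis=2)

def hierarchical_contrastive_loss(z1, z2):
    """Average the contrastive loss over successively max-pooled time scales."""
    losses = []
    while True:
        losses.append(instance_contrastive_loss(z1, z2))
        if z1.shape[1] == 1:
            break
        z1, z2 = max_pool_time(z1), max_pool_time(z2)
    return float(np.mean(losses))
```

With T = 8 the loss is averaged over four scales (8, 4, 2, 1 timestamps), so both fine-grained and coarse structure contribute to training.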
This paper proposes an improved residual deep reinforcement learning method for dynamic obstacle avoidance and position servoing of a robot arm. The proposed method first simplifies the state space by constructing key points ...
This study investigates the problem of distributed adaptive formation control of connected vehicles with actuator saturation and time-varying spacing. Firstly, optimization performance metrics are defined based on the...
In practical motion control systems, saturation constraints are commonly encountered and seriously restrict tracking performance. This paper aims at devising an anti-windup scheme for motion control systems controll...
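The abstract is cut off before describing the proposed scheme, so nothing below reflects it. As background on the windup problem it targets, here is a textbook back-calculation anti-windup PI controller, a standard technique and not the paper's method. The class `PIAntiWindup`, the gains, the saturation limits, and the first-order test plant x' = -x + u are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PIAntiWindup:
    """Discrete PI controller with back-calculation anti-windup (hypothetical example)."""
    kp: float
    ki: float
    kb: float      # back-calculation gain; kb = 0 gives a plain PI that winds up
    u_min: float
    u_max: float
    integral: float = 0.0

    def step(self, error: float, dt: float) -> float:
        v = self.kp * error + self.integral      # unsaturated command
        u = min(max(v, self.u_min), self.u_max)  # actuator saturation
        # Integrate the error, plus a term that bleeds the integrator
        # back whenever the actuator saturates (u != v).
        self.integral += dt * (self.ki * error + self.kb * (u - v))
        return u


def peak_response(kb: float, steps: int = 1000, dt: float = 0.01) -> float:
    """Peak of the unit step response of the plant x' = -x + u (forward Euler)."""
    ctrl = PIAntiWindup(kp=2.0, ki=4.0, kb=kb, u_min=-1.2, u_max=1.2)
    x = peak = 0.0
    for _ in range(steps):
        u = ctrl.step(1.0 - x, dt)
        x += dt * (-x + u)
        peak = max(peak, x)
    return peak
```

Comparing `peak_response(0.0)` with `peak_response(5.0)` shows the effect. Without back-calculation, the integrator charges up while the actuator is pinned at its limit, and the output overshoots noticeably. With it, the overshoot is much smaller.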
ISBN (print): 9798350321050
Because the data collection process is cumbersome and costly, the collected degradation data of a product usually form small samples, which reduces the accuracy of reliability evaluation. It is therefore necessary to expand the degradation data to improve the accuracy of subsequent reliability assessment. To this end, a degradation generation and prediction method is proposed that combines the time-series generative adversarial network (TimeGAN) with a stochastic process. First, the input degradation data are expanded with a sliding window to improve subsequent training accuracy. Then, the construction of the TimeGAN generator is linked with the stochastic process to make the generated data more realistic. Finally, degradation predictions are obtained with a gated recurrent unit (GRU). Two datasets and different generation methods are used to evaluate the effectiveness of the proposed method. The results show that, among the compared methods, the proposed method yields both the smallest Kullback-Leibler (KL) divergence and the smallest prediction error. This demonstrates that the proposed method is effective for degradation generation and prediction and can be used for further reliability assessment of products in industrial systems.
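The abstract names two concrete steps: expanding the input degradation data with a sliding window, and comparing methods by KL divergence. A minimal sketch of both follows. It assumes a univariate degradation series, a configurable stride, and a histogram-based KL estimate; the abstract does not say how KL is computed. The function names are hypothetical, and the TimeGAN generator, stochastic-process coupling, and GRU predictor are not shown.

```python
import numpy as np

def sliding_windows(series, window, stride=1):
    """Expand one degradation series into overlapping training windows."""
    series = np.asarray(series, dtype=float)
    if window > len(series):
        raise ValueError("window longer than series")
    starts = range(0, len(series) - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

def histogram_kl(p_samples, q_samples, bins=20, eps=1e-10):
    """Estimate KL(P || Q) from two sample sets using shared histogram bins."""
    lo = min(np.min(p_samples), np.min(q_samples))
    hi = max(np.max(p_samples), np.max(q_samples))
    p, edges = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum() + eps  # eps avoids log(0) and division by zero
    q = q / q.sum() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

A series of length N yields N - window + 1 windows at stride 1. For example, `sliding_windows([1, 2, 3, 4, 5], 3)` gives three overlapping windows, which is how a small degradation sample can be turned into more training examples.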