Band selection is an important step in the efficient processing of hyperspectral images (HSIs), and can be seen as the combination of a powerful band search technique and an effective evaluation criterion. Existing deep-learning-based methods make the network parameters sparse to search the spectral bands using threshold-based functions or regularization terms. These methods may lead to an intractable optimization problem. Furthermore, they need to repeatedly train deep networks to evaluate candidate band subsets. In this article, we formalize hyperspectral band selection as a reinforcement learning (RL) problem. Band search is regarded as a sequential decision-making process, where each state in the search space is a feasible band subset. To evaluate each state, a semisupervised convolutional neural network (CNN), called EvaluateNet, is constructed by adding an intraclass compactness constraint over both the limited labeled and the sufficient unlabeled samples. A simple stochastic band sampling method is designed to train EvaluateNet, making efficient evaluation possible without any fine-tuning. In RL, new reward functions are defined by taking both the EvaluateNet score and a penalty for repeated selection into account. Finally, advantage actor-critic algorithms are designed to explore the state space and select the band subset according to the expected accumulated reward. The experimental results on HSI data sets demonstrate the effectiveness and efficiency of the proposed algorithms for hyperspectral band selection.
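The abstract does not include code, but the reward structure it describes can be sketched roughly as follows. This is a minimal illustration in which a generic scoring callable stands in for EvaluateNet; the function name `step_reward`, the penalty weight `beta`, and the toy `dummy_net` are hypothetical, not the authors' implementation.

```python
# Minimal sketch of a band-selection reward combining an evaluation score
# with a penalty for repeated selection, as the abstract describes.
import torch

def step_reward(evaluate_net, selected, new_band, beta=1.0):
    """Reward for appending `new_band` to the set of already selected bands."""
    if new_band in selected:
        return -beta  # penalize repeated selection of the same band
    subset = sorted(selected | {new_band})
    with torch.no_grad():  # the evaluator is fixed; no fine-tuning per subset
        score = evaluate_net(torch.tensor(subset, dtype=torch.float32))
    return float(score)

# Toy stand-in for EvaluateNet: favors larger subsets slightly.
dummy_net = lambda bands: bands.numel() * 0.1
print(step_reward(dummy_net, {3, 17}, 42, beta=0.5))
```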
This paper develops a decentralized reinforcement learning (RL) scheme for multi-intersection adaptive traffic signal control (TSC), called "CVLight", that leverages data collected from connected vehicles (CVs). The state and reward design facilitates coordination among agents and accounts for travel delays collected by CVs. A novel algorithm, Asymmetric Advantage Actor-Critic (Asym-A2C), is proposed, in which both CV and non-CV information is used to train the critic network, while only CV information is used to execute optimal signal timing. Comprehensive experiments show the superiority of CVLight over state-of-the-art algorithms on a 2-by-2 synthetic road network under various traffic demand patterns and penetration rates. The learned policy is then visualized to further demonstrate the advantage of Asym-A2C. A pre-training technique is applied to improve the scalability of CVLight; it significantly shortens the training time and yields a performance advantage on a 5-by-5 road network. A case study is performed on a 2-by-2 road network located in State College, Pennsylvania, USA, to further demonstrate the effectiveness of the proposed algorithm under real-world scenarios. Compared to other baseline models, the trained CVLight agent can efficiently control multiple intersections solely based on CV data and achieves the best performance, especially under low CV penetration rates.
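A minimal sketch of the asymmetric update described above is given below, assuming a discrete phase-selection action space; the network sizes, feature dimensions, and function signature are illustrative assumptions rather than the CVLight implementation.

```python
# Sketch of an asymmetric actor-critic step: the critic is trained on the
# full state (CV + non-CV data), while the actor only ever sees CV data.
import torch
import torch.nn as nn
from torch.distributions import Categorical

obs_dim, full_dim, n_phases = 8, 16, 4  # CV observation, full state, signal phases
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_phases))
critic = nn.Sequential(nn.Linear(full_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

def update(cv_obs, full_state, action, reward, next_full_state, gamma=0.99):
    value = critic(full_state)                       # privileged value estimate
    with torch.no_grad():
        target = reward + gamma * critic(next_full_state)
    advantage = (target - value).detach()
    log_prob = Categorical(logits=actor(cv_obs)).log_prob(action)
    loss = (-log_prob * advantage + (target - value).pow(2)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

update(torch.randn(obs_dim), torch.randn(full_dim), torch.tensor(2),
       1.0, torch.randn(full_dim))
```

At execution time only `actor` is queried, which mirrors the paper's claim that signal timing can be executed from CV information alone.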
Task assignment for vehicles plays an important role in urban transportation systems and is key to cost reduction and efficiency improvement. The development of information technology and the emergence of the "sharing economy" create a more convenient transportation mode, but also pose a greater challenge to the efficient operation of urban transportation systems. On the one hand, given the complex and dynamic environment of urban transportation, an efficient method for assigning transportation tasks to idle vehicles is desired. On the other hand, to meet users' expectations of an immediate vehicle response, the task assignment problem with dynamically arriving tasks remains to be resolved. In this study, we propose a dynamic task assignment method for vehicles in urban transportation systems based on multi-agent reinforcement learning (RL). The transportation task assignment problem is transformed into a stochastic game from the vehicles' perspective, and an extended actor-critic (AC) algorithm is then employed to obtain the optimal strategy. With the proposed method, vehicles can independently make decisions in real time, eliminating substantial communication cost. Compared with methods based on the first-come-first-served (FCFS) rule and the classic contract net algorithm (CNA), the results show that the proposed method achieves higher acceptance and profit rates over the service cycle.
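As a rough illustration of such decentralized decision-making, the sketch below scores candidate tasks independently per vehicle, assuming a fixed-size task menu with a validity mask; the feature layout and network are invented for illustration, not taken from the paper.

```python
# Sketch: each vehicle runs its own small actor to pick among currently
# available tasks, so no inter-vehicle communication is needed at run time.
import torch
import torch.nn as nn
from torch.distributions import Categorical

task_feat, max_tasks = 6, 10  # features per candidate task, menu size
actor = nn.Sequential(nn.Linear(task_feat, 32), nn.ReLU(), nn.Linear(32, 1))

def choose_task(task_features, available_mask):
    """Score each candidate task independently and sample one index."""
    logits = actor(task_features).squeeze(-1)           # shape (max_tasks,)
    logits = logits.masked_fill(~available_mask, -1e9)  # hide unavailable slots
    return Categorical(logits=logits).sample()

tasks = torch.randn(max_tasks, task_feat)
mask = torch.tensor([True] * 4 + [False] * 6)           # only 4 tasks open
print(choose_task(tasks, mask))
```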
Exploration is one of the key issues in deep reinforcement learning, especially in environments with sparse or deceptive rewards. Exploration based on intrinsic rewards can handle these environments. However, existing methods cannot take both global interaction dynamics and local environment changes into account simultaneously. In this paper, we propose a novel intrinsic reward for off-policy learning, which not only encourages the agent to take actions not yet fully learned from a global perspective, but also instructs the agent to trigger remarkable changes in the environment from a local perspective. We further propose a double-actors-double-critics framework to combine intrinsic rewards with extrinsic rewards, avoiding the inappropriate combination of the two used in previous methods. This framework can be applied to off-policy learning algorithms based on the actor-critic method. We provide a comprehensive evaluation of our approach on the MuJoCo benchmark environments. The results demonstrate that our method performs effective exploration in environments with dense, deceptive, and sparse rewards. In addition, we conduct extensive ablation and quantitative analyses of the intrinsic rewards. Furthermore, we verify the superiority and rationality of our double-actors-double-critics framework through comparative experiments.
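The double-critics part of the framework can be sketched as follows: instead of summing intrinsic and extrinsic rewards into one signal, each reward stream gets its own value estimate and only the resulting advantages are mixed. The mixing weight `eta`, the network shapes, and the one-step advantage form are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of separate critics for extrinsic and intrinsic rewards; each
# critic is regressed on its own return, and only the advantages are mixed.
import torch
import torch.nn as nn

state_dim = 4
critic_ext = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
critic_int = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def mixed_advantage(s, r_ext, r_int, s_next, gamma=0.99, eta=0.5):
    with torch.no_grad():
        adv_ext = r_ext + gamma * critic_ext(s_next) - critic_ext(s)
        adv_int = r_int + gamma * critic_int(s_next) - critic_int(s)
    return adv_ext + eta * adv_int  # drives the policy update

s, s_next = torch.randn(state_dim), torch.randn(state_dim)
print(mixed_advantage(s, r_ext=1.0, r_int=0.3, s_next=s_next))
```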
Reinforcement learning (RL) applications require a huge effort to become established in real-world environments, due to the injury and breakdown risks during interactions between the RL agent and the environment in the online training process. In addition, the RL platform tools required to reduce these real-world challenges (e.g., Python OpenAI's Gym, Unity ML-Agents, PyBullet, DART, MuJoCo, RaiSim, Isaac, and AirSim) suffer from drawbacks such as a limited number of examples and applications, and difficulties in implementing RL algorithms arising from the programming language. This paper presents an integrated RL framework, based on Python-Unity interaction, to demonstrate the ability to create a new RL platform tool built on stable user datagram protocol (UDP) communication between the RL agent algorithm (developed in the Python programming language as a server) and the simulation environment (created in the Unity simulation software as a client). This Python-Unity integration increases the advantages of the overall RL platform (i.e., flexibility, scalability, and robustness) and allows different environment specifications to be created. It also addresses the challenge of implementing and developing RL algorithms. The proposed framework is validated by applying two popular deep RL algorithms, Vanilla Policy Gradient (VPG) and advantage actor-critic (A2C), to an elevation control challenge for a quadcopter drone. The validation results demonstrate that the proposed framework is suitable for RL applications: both implemented algorithms achieve high stability, converging to the required performance through the semi-online training process.
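The Python side of such a UDP loop might look like the sketch below; the port, the JSON message layout, and the `policy` placeholder are illustrative assumptions, not the paper's actual protocol.

```python
# Sketch of the Python (server) half of a Python-Unity UDP exchange:
# receive the simulated drone state from Unity, reply with an action.
import json
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 5005))  # Unity (client) sends state packets here

def policy(state):
    # Placeholder for VPG/A2C action selection on the elevation task.
    return {"thrust": 0.5}

while True:
    data, addr = sock.recvfrom(4096)          # blocking receive from Unity
    state = json.loads(data.decode("utf-8"))  # e.g. {"altitude": ..., "velocity": ...}
    action = policy(state)
    sock.sendto(json.dumps(action).encode("utf-8"), addr)
```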
Searching for the optimal injection molding settings for a new product usually requires much time and money. This article proposes a new method that uses reinforcement learning with prior knowledge to optimize these settings. The method uses an actor-critic algorithm to optimize the filling phase and the holding phase. For five different injection molded products, the filling phase and holding phase were adjusted with this method. The learning algorithm optimized the settings for one product (pre-learning) and used the acquired knowledge (prior knowledge) to optimize the injection molding settings for a new product (post-learning). This research shows that the method is able to optimize the injection molding parameters in a reasonable time even when the prior knowledge is derived from a product with a different material, gate design, or even geometry. On average, fewer than 16 injection molding cycles were needed for the algorithm to optimize the filling phase and fewer than 10 cycles to optimize the holding phase. The presented method can greatly facilitate the development of self-adjusting injection molding machines.
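The pre-learning/post-learning transfer can be sketched as a plain checkpoint handoff, assuming PyTorch-based actor-critic networks; the file name, network shapes, and setting dimension are hypothetical stand-ins, not the article's implementation.

```python
# Sketch: save the actor-critic networks optimized on product A and reload
# them as the starting point (prior knowledge) for a new product B.
import torch
import torch.nn as nn

def make_nets(setting_dim=5):  # e.g. pressures, speeds, times per phase
    actor = nn.Sequential(nn.Linear(setting_dim, 32), nn.Tanh(),
                          nn.Linear(32, setting_dim))
    critic = nn.Sequential(nn.Linear(setting_dim, 32), nn.Tanh(),
                           nn.Linear(32, 1))
    return actor, critic

# Pre-learning on product A, then persist the acquired knowledge.
actor_a, critic_a = make_nets()
torch.save({"actor": actor_a.state_dict(), "critic": critic_a.state_dict()},
           "product_a.pt")

# Post-learning on product B starts from product A's parameters.
actor_b, critic_b = make_nets()
ckpt = torch.load("product_a.pt")
actor_b.load_state_dict(ckpt["actor"])
critic_b.load_state_dict(ckpt["critic"])
```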
ISBN (Print): 9781509021949
In the recent past, there has been an exponential increase in data-intensive services over communication networks. This trend will be sustained in future communication networks as well, especially in Wi-Fi networks. It can be attributed to the rapid growth of business and institutional entities and the need for cellular data offloading, for which localized Wi-Fi networks are preferred due to their higher offered data rates. In such networks, a major portion of energy consumption occurs at the access network entities, making energy-efficient operation of Wi-Fi access points (APs) extremely crucial. In this paper, an actor-critic (AC) reinforcement learning (RL) framework is designed to enable traffic-based ON/OFF switching of APs in a Wi-Fi network. Furthermore, previously estimated traffic statistics are exploited in future scenarios, which speeds up the learning process and provides additional improvement in energy saving. An important feature of the present study is the validation of the proposed framework on real data collected from an institute's Wi-Fi network. Simulation results for 20 APs of a Wi-Fi network show that the proposed framework can lead to around 75% savings in energy consumption compared to the case when AP switching is not considered.
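A minimal sketch of the actor side of such a switching policy is shown below, assuming per-AP traffic features and independent ON/OFF decisions; the feature count and network are invented for illustration and do not reflect the paper's state or action design.

```python
# Sketch: map per-AP traffic statistics to independent ON/OFF probabilities;
# switching off lightly loaded APs is what produces the energy savings.
import torch
import torch.nn as nn
from torch.distributions import Bernoulli

n_aps, feat = 20, 3  # 20 APs, a few traffic features per AP
actor = nn.Sequential(nn.Linear(feat, 16), nn.ReLU(), nn.Linear(16, 1))

def switch_decision(traffic):  # traffic: (n_aps, feat)
    p_on = torch.sigmoid(actor(traffic)).squeeze(-1)
    return Bernoulli(probs=p_on).sample()  # 1 = keep AP on, 0 = switch off

print(switch_decision(torch.rand(n_aps, feat)))
```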