To address the problem that traditional anti-jamming decision algorithms cannot meet the security needs of smart city development, this paper proposes a communication security anti-interference decision algorithm using deep learning in an intelligent industrial IoT environment. First, an interactive system model of cognitive users and jammers with intelligent perception capability is constructed. The interference intensity and channel gain are then analyzed jointly to design an optimization objective that maximizes network capacity. Next, by modeling the interaction between the cognitive environment and the decision engine as the environment-agent interaction of deep reinforcement learning, a Q-learning algorithm integrating reinforcement learning is used to explore the maximum action reward and feed it back to the cognitive decision engine, so as to intelligently obtain the effective interference parameters of the communication state. Finally, the proposed algorithm is demonstrated experimentally on the MATLAB simulation platform. The results show that when the number of links is 300, the network capacity of the proposed algorithm is about 960 bit·s⁻¹·Hz⁻¹ and the cumulative average reward reaches 0.59, outperforming the comparison algorithms and achieving highly reliable autonomous decision-making.
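The Q-learning component described above can be illustrated with a generic tabular sketch; this is not the paper's actual model. A single-state agent learns by trial and error to avoid a channel occupied by a jammer, with the reward standing in for achievable capacity. The channel count, reward values, and hyperparameters are illustrative assumptions.

```python
import random

def q_learning_channel_selection(num_channels=4, jammed_channel=0,
                                 episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Single-state tabular Q-learning that learns to avoid a jammed channel."""
    q = [0.0] * num_channels  # one Q-value per channel (action)
    for _ in range(episodes):
        if random.random() < epsilon:           # explore a random channel
            a = random.randrange(num_channels)
        else:                                   # exploit the current estimate
            a = max(range(num_channels), key=lambda c: q[c])
        reward = 0.0 if a == jammed_channel else 1.0  # crude capacity proxy
        # standard Q-update; with a single state the bootstrap term is max(q)
        q[a] += alpha * (reward + gamma * max(q) - q[a])
    return q
```

After training, the greedy channel choice settles on a channel away from the jammer, which is the behaviour a cognitive decision engine would exploit.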
In this study, the Quality of Service (QoS) needed to support service continuity in heterogeneous networks is achieved by a Distributed Multi-Agent Scheme (DMAS) based on cooperation concepts and an awareness algorithm. A set of problem-solving agents autonomously process local tasks and cooperatively interoperate via an in-cloud blackboard system to provide QoS and mobility information. A Q-learning awareness algorithm calculates the expected rewards of a handoff to all access networks. These rewards are then used by the problem-solving agents to determine which actions must be performed. Agents located in the integrated IMS-4G-Cloud networks handle service continuity by using a handoff mechanism. Through operations and cooperation among active agents, these phases select a policy for predictive and anticipated IP Multimedia Subsystem (IMS) handoff management. Compared with conventional IMS handoff management, the proposed DMAS scheme achieves shorter handoff delay and better QoS for real-time service applications. (C) 2014 Elsevier Ltd. All rights reserved.
The proportional integral and derivative (PID) controller is extensively applied in many applications. However, three parameters must be properly adjusted to ensure effective performance of the control system: the proportional gain, the integral gain, and the derivative gain. Therefore, the aim of this paper is to optimize and improve the stability, convergence, and performance of autotuning the PID parameters by using a deterministic Q-SLP algorithm. The proposed method combines the swarm learning process (SLP) algorithm with the Q-learning algorithm. The Q-learning algorithm is applied to optimize the weight updating of the SLP algorithm based on a new deterministic rule and closed-loop stabilization of the learning rate. The global optimality of the deterministic rule is proven based on the Bellman equation, and the stability of the learning process is proven with respect to the Lyapunov stability theorem. Additionally, to demonstrate the superiority of the performance and convergence in autotuning the PID parameters, simulation results of the proposed method are compared with those based on the central position control (CPC) system using the traditional SLP algorithm, the whale optimization algorithm (WOA), and improved particle swarm optimization (IPSO). The comparison shows that the proposed method can provide results superior to those of the other algorithms with respect to both performance indices and convergence.
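For readers unfamiliar with the controller being tuned, a minimal discrete PID implementation looks roughly like the following. This is the generic textbook form, not the paper's tuned controller; the gains, plant, and setpoint below are arbitrary illustrations.

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*d(e)/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt                    # accumulate I term
        derivative = (error - self.prev_error) / self.dt    # finite-difference D term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Example: drive a first-order plant x' = -x + u toward a setpoint of 1.0
def run_loop(kp=2.0, ki=1.0, kd=0.1, dt=0.01, steps=2000):
    pid, x = PID(kp, ki, kd, dt), 0.0
    for _ in range(steps):
        u = pid.step(1.0 - x)   # error = setpoint - state
        x += dt * (-x + u)      # forward-Euler plant update
    return x
```

Autotuning methods such as the one in the abstract search over (kp, ki, kd) to optimize a closed-loop performance index computed from runs like `run_loop`.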
With the increasing adoption and presence of Web services, service composition has become an effective way to construct software applications. Composite services need to satisfy both functional and non-functional requirements. Traditional methods usually assume that the quality of service (QoS) and the behaviors of services are deterministic, and they execute the composite service only after all the component services have been selected. This makes it difficult to guarantee the satisfaction of user constraints and the successful execution of the composite service. This paper models the constraint-satisfied service composition (CSSC) problem as a Markov decision process (MDP), namely CSSC-MDP, and designs a Q-learning algorithm to solve the model. CSSC-MDP takes the uncertainty of QoS and service behavior into account, and selects a component service only after the execution of the previous services. Thus, CSSC-MDP can select the globally optimal service based on the constraints that the following services need to satisfy. When a selected service fails, CSSC-MDP can provide the optimal alternative service in a timely manner. Simulation experiments show that the proposed method can successfully solve CSSC problems of different sizes. Compared with three representative methods, CSSC-MDP has obvious advantages, especially in terms of the success rate of service composition.
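The idea of treating composition as an MDP and learning which candidate service to invoke at each step can be sketched generically. The model below is an assumption for illustration only: each composition step has candidate services described by a (success probability, reward) pair, and a failed invocation ends the episode with zero reward.

```python
import random

def compose_services(candidates, episodes=4000, alpha=0.05, gamma=0.95, eps=0.1):
    """Q-learning over a chain of composition steps (illustrative sketch).

    candidates[s] is a list of (success_probability, reward) pairs for step s.
    Returns the greedy policy: the chosen candidate index for each step.
    """
    n_steps = len(candidates)
    q = [[0.0] * len(c) for c in candidates]
    for _ in range(episodes):
        for s in range(n_steps):
            acts = range(len(candidates[s]))
            if random.random() < eps:                     # explore
                a = random.randrange(len(candidates[s]))
            else:                                         # exploit
                a = max(acts, key=lambda i: q[s][i])
            p, r = candidates[s][a]
            if random.random() < p:                       # invocation succeeded
                future = max(q[s + 1]) if s + 1 < n_steps else 0.0
                q[s][a] += alpha * (r + gamma * future - q[s][a])
            else:                                         # failure ends the episode
                q[s][a] += alpha * (0.0 - q[s][a])
                break
    return [max(range(len(c)), key=lambda i: q[si][i])
            for si, c in enumerate(candidates)]
```

Because failures propagate through the value estimates, the learned policy prefers a reliable moderate-reward service over an unreliable high-reward one, mirroring the success-rate advantage claimed in the abstract.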
We present an effective hybrid metaheuristic integrating reinforcement learning with a tabu-search (RLTS) algorithm for solving the max-mean dispersion problem. The innovative element is a knowledge strategy, based on the Q-learning mechanism, that locates promising regions when the tabu search is stuck in a local optimum. Computational experiments on extensive benchmarks show that the RLTS performs much better than state-of-the-art algorithms in the literature. Of a total of 100 benchmark instances, on 60 instances with sizes ranging from 500 to 1,000, the proposed algorithm matched the current best lower bounds; on the remaining 40 instances, it matched or outperformed them. Furthermore, additional analysis was carried out to demonstrate the effectiveness of the combined RL technique. The analysis sheds light on the effectiveness of the proposed RLTS algorithm.
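A plain 1-flip tabu search for max-mean dispersion can be sketched as follows; note this omits the paper's Q-learning escape strategy, and the tenure, move rule, and tiny distance matrix are illustrative choices. The objective is the sum of pairwise distances within the chosen subset divided by the subset size.

```python
import random

def tabu_search_max_mean(dist, iters=500, tenure=7, seed=0):
    """1-flip tabu search for the max-mean dispersion problem (sketch)."""
    rng = random.Random(seed)
    n = len(dist)
    in_set = [rng.random() < 0.5 for _ in range(n)]  # random initial subset

    def value(sol):
        s = [i for i in range(n) if sol[i]]
        if not s:
            return 0.0
        total = sum(dist[i][j] for k, i in enumerate(s) for j in s[k + 1:])
        return total / len(s)

    best, best_val = in_set[:], value(in_set)
    tabu = {}  # element -> iteration until which flipping it is forbidden
    for it in range(iters):
        cand = None
        for i in range(n):                 # evaluate every 1-flip neighbour
            in_set[i] = not in_set[i]
            v = value(in_set)
            in_set[i] = not in_set[i]
            # skip tabu moves unless they beat the global best (aspiration)
            if tabu.get(i, -1) >= it and v <= best_val:
                continue
            if cand is None or v > cand[1]:
                cand = (i, v)
        if cand is None:
            continue                       # everything tabu this iteration
        i, v = cand
        in_set[i] = not in_set[i]
        tabu[i] = it + tenure
        if v > best_val:
            best, best_val = in_set[:], v
    return best, best_val
```

Recomputing the objective from scratch per move keeps the sketch short; real solvers use incremental gain updates to handle the 500-to-1,000-element instances mentioned above.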
In recent years, temporal-difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle, learning through experimentation, that causes these excessive demands on the learning agent. Additionally, one must consider that the agent is very rarely a tabula rasa: some rough knowledge about the characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are kept. Extensive experimental results show that the resulting variants may lead to good performance, provided a sensible balance between risky use of prior imprecise knowledge and cautious use of learning experience is adopted.
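One common and simple way to embed rough a priori knowledge into temporal-difference learning, shown here only as an illustration and not necessarily the paper's method, is to bias the initial Q-table so that early greedy choices follow the prior while the update rule itself is left untouched:

```python
def make_q_table(n_states, n_actions, prior=None):
    """Build an initial Q-table, optionally seeded with prior value estimates.

    `prior` maps (state, action) pairs to rough value guesses; pairs not
    listed start at zero.  The TD update stays unchanged, so an inaccurate
    prior is gradually overwritten by real experience.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for (s, a), v in (prior or {}).items():
        q[s][a] = v
    return q


def greedy_action(q, state):
    """Pick the action with the highest current estimate in `state`."""
    return max(range(len(q[state])), key=lambda a: q[state][a])
```

This captures exactly the balance the abstract describes: a confident prior steers exploration early (risky if wrong), while continued learning corrects it (cautious use of experience).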
In this study, the authors propose a learning-based approach to improve the security of a communication system in a dynamic environment, where a source transmits information to a legitimate receiver in the presence of an active eavesdropper. Additionally, they assume that the source has to harvest energy from the environment to support its communication. Due to the dynamics of the environment, both the harvested energy and the channel vary over time, requiring a dynamic transmission strategy that follows these changes. To improve the security performance, they first analyse how to select the optimal transmission parameters in hindsight, and then propose to combine the Q-learning algorithm with the expert-advice method to maximise the cumulative reward in the dynamic environment. They also introduce an improved learning-based approach, which accelerates the convergence of their method. The simulation results show that the proposed learning-based approach helps the legitimate nodes learn a beneficial transmission strategy that obtains a larger cumulative reward.
We propose a hierarchical behavior suggestion system and recovery mechanism for the smart home management platform, comprising a location layer, an action layer, and a home appliance layer. The smart home management system uses this hierarchical structure to take regional management actions and home appliance management actions. This study also provides a hierarchical human behavior suggestion algorithm (HHBSA), which suggests behavior patterns. HHBSA includes a location-learning suggestion algorithm (LISA) and an action-behavior suggestion algorithm (ABSA). LISA suggests the user's location with the concepts of Q-learning and fuzzy-state Q-learning (FSQL). ABSA provides advice on regional behaviors according to the suggested regional sequence updated by the users' locations. The home appliances involved in the behaviors can be switched on in advance once the behaviors have been suggested. A hierarchical recovery mechanism may be used to correct errors occurring when starting the home appliances. The home appliances can be restarted when errors occur if the action layer is set as a recovery point, which can be changed according to the usage sequence. A dynamic recovery point makes it possible to add behaviors to the system without limit while maintaining the efficiency of the recovery mechanism. (C) 2015 Elsevier B.V. All rights reserved.
In the machining of parts, tool paths for complex cavity milling often have different generation options, as opposed to simple machining features. The different tool path generation options influence the machining time and cost of the part. Decision makers prefer tool path solutions with shorter blanking (non-cutting) lengths, which make the machining process more efficient. Therefore, in order to reduce costs and increase efficiency, the tool path generation for the features to be machined must be carefully designed, especially for complex cavity milling features. However, solutions to the problem of optimal tool path design for complex cavity milling features have not been well developed in current research. In this paper, we present a systematic solution for complex cavity milling tool path generation based on reinforcement learning. First, a grid converter transforms the 3D geometry of the cavity milling feature into a matrix of planar grid points recognisable by the program, set according to the cutting parameters. The tool path generation process is then refined and modelled as a Markov decision process. Finally, a tool path generation solution combining the A* algorithm with the Q-learning algorithm is executed, in which the agent iterates through trial and error to construct an optimal tool path for a given cavity milling task. Three case experiments demonstrate the feasibility of the proposed approach. The superiority of the reinforcement learning-based approach in terms of solution speed and quality is further demonstrated by comparing it with the evolutionary computation techniques currently popular for solving tool path optimisation problems.
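The A* component of such a pipeline can be sketched generically; the 0/1 grid encoding, 4-connectivity, and Manhattan heuristic below are illustrative assumptions, not the paper's exact formulation.

```python
import heapq

def a_star(grid, start, goal):
    """A* on a 4-connected grid of 0 (free) / 1 (blocked) cells.

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_heap = [(h(start), 0, start, None)]   # (f, g, cell, parent)
    came_from, g_score = {}, {start: 0}
    while open_heap:
        _, g, cur, parent = heapq.heappop(open_heap)
        if cur in came_from:
            continue                           # already expanded with a better g
        came_from[cur] = parent
        if cur == goal:                        # walk parents back to the start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_score.get((nr, nc), float("inf")):
                    g_score[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc), cur))
    return None
```

In the hybrid scheme described above, a planner like this would supply candidate connecting moves over the grid-point matrix, while Q-learning orders the overall milling sequence.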
As the density of integrated circuits continues to increase, the possibility that real-time systems suffer from soft and hard errors rises significantly, resulting in degraded system availability. In this article, we investigate the dynamic modeling of the cross-layer soft error rate based on the Back Propagation (BP) neural network, and propose optimization strategies for system availability based on Cross Entropy (CE) and Q-learning algorithms. Specifically, the BP neural network is trained using cross-layer simulation data obtained from SPICE simulation, while the optimization of system availability is achieved by judiciously selecting an optimal supply voltage for processors under timing constraints. Simulation results show that the CE-based method can improve system availability by up to 32 percent compared to state-of-the-art methods, and the Q-learning-based algorithm can further enhance system availability by up to 20 percent compared to the proposed CE-based method.
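The cross-entropy method used here for the supply-voltage search can be illustrated generically; the quadratic cost function and voltage bounds below are placeholders, not the paper's availability model.

```python
import random

def cross_entropy_minimize(cost, lo, hi, iters=30, samples=50, elite_frac=0.2):
    """Cross-entropy method over one scalar parameter (e.g. a supply voltage).

    Repeatedly samples candidates from a Gaussian, keeps the elite fraction
    with the lowest cost, and refits the Gaussian to those elites.
    """
    mu, sigma = (lo + hi) / 2.0, (hi - lo) / 2.0
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iters):
        # sample candidates, clipped to the feasible voltage range
        xs = [min(hi, max(lo, random.gauss(mu, sigma))) for _ in range(samples)]
        xs.sort(key=cost)
        elites = xs[:n_elite]
        mu = sum(elites) / n_elite
        sigma = (sum((x - mu) ** 2 for x in elites) / n_elite) ** 0.5 + 1e-6
    return mu
```

In the availability setting, `cost` would be replaced by a penalty combining the BP-predicted soft error rate and the timing constraints at each candidate voltage.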