Multi-agent reinforcement learning (MARL) is a machine learning method that solves problems by using multiple learning agents in a data-driven manner. Because it can use multiple agents simultaneously, MARL has become an efficient solution to large-scale problems in a wide range of fields. However, as with single-agent reinforcement learning, MARL requires trial and error to acquire appropriate policies for each agent during learning. Guaranteeing performance and constraint satisfaction in MARL is therefore a critical issue for real-world applications. In this study, we propose an Information-sharing Constrained Policy Optimization (IsCPO) method for MARL that guarantees constraint satisfaction during learning. Specifically, IsCPO sequentially updates the policies of multiple agents in random order, passing to the next agent the surrogate costs and KL divergence used to evaluate the current and updated policies. In addition, if the shared information leaves no candidate policies to update to, IsCPO skips updating the policies of the remaining agents until the next iteration. As a result, IsCPO makes it possible to acquire individual suboptimal policies for the agents that satisfy constraints on global costs, which depend on the state of the environment and the actions of multiple agents. We also introduce a practical algorithm for IsCPO that simplifies implementation by adopting several mathematical approximations. Finally, we show the validity and effectiveness of the method through simulation results on a multiple cart-pole problem and a base station sleep control problem in a mobile network.
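The sequential, information-sharing update described in this abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the `Candidate` class, the finite candidate lists, the additive accumulation of cost and KL values, and the budget parameters are all assumptions made for illustration, whereas the actual method solves a constrained policy optimization problem for each agent.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one candidate policy update for an agent."""
    reward_gain: float  # estimated improvement of the surrogate objective
    cost_delta: float   # estimated change of the surrogate cost
    kl: float           # KL divergence between current and updated policy


def sequential_update(candidates_per_agent, cost_budget, kl_budget, rng=None):
    """Pick one update per agent, visiting agents in random order.

    The running surrogate cost and KL totals are shared with the next agent
    (assumed additive here). If an agent has no feasible candidate, it and
    all remaining agents are skipped until the next iteration.
    """
    rng = rng or random.Random()
    order = list(range(len(candidates_per_agent)))
    rng.shuffle(order)

    shared_cost, shared_kl = 0.0, 0.0
    chosen = {}
    for agent in order:
        feasible = [
            c for c in candidates_per_agent[agent]
            if shared_cost + c.cost_delta <= cost_budget
            and shared_kl + c.kl <= kl_budget
        ]
        if not feasible:
            break  # no candidate satisfies the shared constraints: skip the rest
        best = max(feasible, key=lambda c: c.reward_gain)
        chosen[agent] = best
        shared_cost += best.cost_delta
        shared_kl += best.kl
    return chosen
```

Agents missing from the returned dictionary keep their current policy for this iteration, which mirrors the skipping rule in the abstract.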
Purpose: The purpose of this paper is to conduct a comprehensive review of the noteworthy contributions made in the area of the feedforward neural network (FNN) to improve its generalization performance and convergence rate (learning speed); to identify new research directions that will help researchers design new, simple and efficient algorithms and help users implement optimally designed FNNs for solving complex problems; and to explore the wide applications of the reviewed FNN algorithms in solving real-world management, engineering and health sciences problems, demonstrating the advantages of these algorithms in enhancing decision making for practical operations. Design/methodology/approach: The FNN has gained much popularity during the last three decades, so the authors focused on algorithms proposed during that period. The selected databases were searched with popular keywords: "generalization performance," "learning rate," "overfitting" and "fixed and cascade architecture." Combinations of the keywords were also used to get more relevant results. Duplicate articles, non-English articles, and articles that matched the keywords but were out of scope were discarded. Findings: The authors studied a total of 80 articles and classified them into six categories according to the nature of the algorithms proposed for improving the generalization performance and convergence rate of FNNs. Reviewing and discussing all six categories would make the paper too long. Therefore, the authors further divided the six categories into two parts (i.e. Part I and Part II). The current paper, Part I, investigates two categories that focus on learning algorithms (i.e. gradient learning algorithms for network training and gradient-free learning algorithms). The remaining four categories, which mainly explore optimization techniques, are reviewed in Part II (i.e. optimization algorithms for learning rate, bias and varian
This article deals with the development of four modified radial basis function neural network (RBFNN) models. The corresponding learning algorithms for updating the internal parameters of the models are derived. Conventional inputs are used in the first and second modified RBFNN models (models 3 and 4), whereas exponential nonlinear inputs are used in the fifth and sixth RBFNN models (models 5 and 6) to provide additional nonlinearity for better solutions of nonlinear classification and direct and inverse modeling problems. To assess and compare the performance of the four proposed RBFNN models, one classification problem, one direct modeling problem, and one inverse modeling problem are solved through computer simulation-based experiments. For comparison, and to rank the performance of each of the four modified RBFNN models, two conventional and commonly used RBFNN models (models 1 and 2) are also simulated. To assess the performance of the different models during the training phase of Examples 1 and 2, the root mean-square error (RMSE), the mean absolute deviation (MAD), and the number of iterations required to achieve convergence are obtained. For the third example, only the first two performance measures are computed. During the testing or validation phase, the output responses of the different models of Example 2 are compared with the desired response. For Example 3, the bit-error rate (BER) plots are compared. The results show consistent ranks for all models across all three examples. In general, the ranks of models 1-6 are found to be 6, 4, 3, 2, 5, and 1, respectively. In essence, in terms of all performance measures, model M-6, with an exponential version of the inputs and weights on both layers, occupies the first position, whereas model M-4, with conventional inputs, occupies the second position.
In this work, the control of a 1.5 MW offshore wind turbine (WT) for maximum power point tracking (MPPT) when the wind speed is below rated is studied. The implemented controller is designed using the general Direct Speed Control (DSC) scheme, in which artificial neural networks (ANN) are incorporated to close the control loop. The neural controller acts in an unsupervised mode, updating its weights by means of a learning algorithm. The optimal configuration parameters of the controller are determined by genetic algorithms. With this intelligent control strategy, the generator speed is regulated by varying the electromagnetic torque while adapting to external phenomena in real time. The output power, through the power coefficient (Cp), then reaches the maximum wind power generation in that region. The offshore WT model is subjected to external loads due to wind and waves, which increase the system complexity and produce tower vibrations, negatively impacting the control efficiency. Despite that, it is shown that the proposed controller operates with satisfactory results in terms of power generation and even reduces vibration. Compared with the OpenFAST embedded torque control for the same WT, it provides better results.
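The role of the power coefficient in MPPT can be made concrete with the standard aerodynamic power relation P = 0.5 * rho * A * Cp * v^3 and the usual rotor speed reference derived from an optimal tip-speed ratio. The sketch below only illustrates these textbook relations. It is not the neural controller of the paper, and the function and parameter names are assumptions.

```python
import math


def aerodynamic_power(rho, rotor_radius, wind_speed, cp):
    """Power extracted by the rotor: P = 0.5 * rho * A * Cp * v^3 (watts)."""
    swept_area = math.pi * rotor_radius ** 2
    return 0.5 * rho * swept_area * cp * wind_speed ** 3


def rotor_speed_reference(tip_speed_ratio_opt, wind_speed, rotor_radius):
    """Rotor speed (rad/s) that keeps the tip-speed ratio at its optimum.

    Below rated wind speed, tracking this reference keeps Cp near its
    maximum, which is the goal of a speed-based MPPT loop.
    """
    return tip_speed_ratio_opt * wind_speed / rotor_radius
```

In a direct speed control scheme, a reference of this kind would be compared with the measured speed and the torque adjusted accordingly; how the paper's ANN does this is not specified in the abstract.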
A multi-agent reinforcement learning vibration controller is designed for active vibration suppression of a movable double piezoelectric flexible beam coupling system, and the motion trajectory is optimized to minimize vibration excitation during motion and residual vibration after motion. The finite element method is used to model the system dynamics, and the actual model parameters are then identified by combining wavelets with an intelligent optimization algorithm. The corrected piezoelectric driving model is used to train the counterfactual multi-agent reinforcement learning (COMARL) algorithm, yielding a nonlinear controller for vibration control with piezoelectric actuators. The motion trajectory of the double flexible beam coupling system is designed using the corrected motor-driven model, and the optimal vibration suppression trajectory is obtained with a tabu search algorithm. The simulation and experimental results show that the optimized trajectory greatly reduces the vibration excitation. The controller trained by the COMARL algorithm fully considers the influence of either beam in the system and infers the contribution of each piezoelectric actuator to the completion of the overall task through counterfactual reasoning. Its control effect is better than that of PD control, especially for small-amplitude vibration suppression. The effectiveness of the COMARL controller is further verified by simultaneous piezoelectric control during trajectory motion: vibrations during translational motion and at the end of motion are suppressed quickly.
Adaptive metacognitive scaffolding is developed to provide learning assistance on an as-needed basis, and thus advances the effectiveness of computer-based learning systems. Metacognitive scaffolding has been developed for some science subjects, but not for algorithm-learning. Learning algorithms differs from learning science in that it is more oriented to problem-solving; therefore, this study describes the modelling, development, and evaluation of adaptive metacognitive scaffolding dedicated to encouraging algorithm-learning. In addition, the authors present a new approach to learner modelling for finding students' metacognitive state. The adaptivity of the scaffolding is based on the learner model. To evaluate the effectiveness of the developed system, it is deployed in a real algorithm-learning classroom of 38 students. The class is randomly divided into two groups: experiment and control. Two parameters are measured for both groups, i.e. academic success and academic satisfaction. A non-parametric statistical test, the Mann-Whitney U-test (significance level 0.01), rejects the null hypothesis (U-value = 86.5 and U-critical = 101). This result verifies that the academic success of the experiment group is significantly higher than that of the control group. In addition, an academic satisfaction survey shows that adaptive scaffolding is valid in assisting students while learning with the system.
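The decision reported above follows the table-based form of the Mann-Whitney U-test, in which the null hypothesis is rejected when the smaller U statistic does not exceed the critical value. A minimal sketch is given below. The function names are hypothetical, the pairwise-counting computation is one standard way to obtain U, and only the values U = 86.5 and U-critical = 101 come from the abstract.

```python
def mann_whitney_u(x, y):
    """Smaller of the two Mann-Whitney U statistics (ties count as 0.5)."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x  # the two statistics sum to n_x * n_y
    return min(u_x, u_y)


def rejects_null(u_value, u_critical):
    """Table-based rule: reject the null hypothesis when U <= U_critical."""
    return u_value <= u_critical


# Values reported in the abstract: U = 86.5, U-critical = 101.
print(rejects_null(86.5, 101))
```

Because 86.5 is below 101, the rule rejects the null hypothesis, which matches the conclusion stated in the abstract.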
In this paper we propose a new image classification technique. Noting that most research focuses on feature extraction in the frequency domain, on location, and on reducing feature dimensions, this research focuses on the learning step in image classification. The main aim is to use heuristic methods to improve the estimator of the learning algorithm until the desired state is reached, so that categorization is performed automatically by the resulting model without user intervention. To this end, a new learning approach based on the Salp Swarm Algorithm (SSA) is proposed, implemented, and evaluated with the Decision Tree, K-Nearest Neighbors and Naive Bayes learning algorithms. The results demonstrate that using SSA improves the performance of the learning algorithms on all evaluated criteria in comparison with the traditional learning algorithms. For accuracy, sensitivity, classification error and F1, the best performance of the proposed model is obtained with the Decision Tree learning method, with values of 99.17%, 100%, 0.83% and 95.65% respectively. For specificity and precision, the best performance of the proposed model is obtained with the K-Nearest Neighbors learning method, with values of 100%.
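The criteria listed in this abstract have standard definitions in terms of a confusion matrix. The sketch below assumes a binary problem with counts of true and false positives and negatives. Whether the paper computes them in exactly this way (for example, with per-class averaging) is not stated, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,   # classification error complements accuracy
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }
```

Under these definitions accuracy and classification error always sum to 100%, which is consistent with the reported 99.17% and 0.83%.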
In the leather industry, identifying the species of leather is a significant step toward consistent global leather trade. Combining image processing and a learning algorithm with leather science can enhance the predictability of leather species. Hence, this paper aims to learn the pore-pattern variability between species from digital microscopic leather images. These images undergo pre-processing to generate leather images with highlighted pores that are less susceptible to noise. This work also proposes an Entropy-based Otsu's thresholding with Component-area-histogram Analysis (EOCA) to achieve adequate hair-pore segmentation, irrespective of species. Goodness and discrepancy measures validate the generosity of the proposed EOCA method. Morphological, geometrical, and statistical features estimate the pattern variability of each species. To ascertain the discriminatory behavior of these features, this work performs feature classification using KNN, NB, DT, SVM, and MLP classifiers. The classification accuracy indicates the efficiency of the pre-processing and the proposed EOCA method in estimating species-definite features. The performance comparison identifies MLP, with 98.75% accuracy, as an appropriate leather species learning model. Thus, the present work contributes to automatic leather species identification by learning and interpreting species-definite features. It also lays the groundwork for the design and development of a human-machine interactive platform to revolutionize the leather trade.
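For context on the segmentation step, the following is a minimal sketch of classical Otsu thresholding, which selects the gray level that maximizes the between-class variance of the histogram. It is the plain Otsu method, not the entropy-based EOCA variant proposed in the paper, and the function name and flat pixel-list input are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `pixels` is a flat iterable of integer gray levels in [0, levels).
    Pixels with value <= t form one class, the rest form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_low, sum_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue  # no pixels at or below t yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # all pixels are at or below t
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t
```

For a clearly bimodal image, the returned level falls between the two modes, so dark pore pixels and brighter surface pixels end up in different classes.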
Recently, different structures of artificial neural networks (ANNs) have been proposed for the modeling and simulation of many complex real-world phenomena. The current research is devoted to the numerical study of an ordinary linear fractional-order integro-differential equation of Volterra type. By substituting the unknown function with a suitable three-layered feed-forward neural architecture, this fractional initial value problem is converted approximately to a system of nonlinear minimization equations. Due to the complexity of the resulting problem, the back-propagation algorithm is employed, making small adjustments in the learning process. In other words, an iterative optimization algorithm based on the gradient descent method is constructed to approximate the solution of the original fractional problem. Moreover, some examples with computer simulations are provided to demonstrate the accuracy and ability of the iterative technique. The numerical results show the efficiency and capability of the ANN approach in comparison with traditional methods.
In this paper, we propose a method for generating adversarial examples in the text domain using GPT-2, a state-of-the-art language model. Our method employs an iterative algorithm to produce perturbations to input tex...