Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose a new offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization. The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then taking a state conditional upper expectile of this random variable to estimate the value of the best actions in that state. This leverages the generalization capacity of the function approximator to estimate the value of the best available action at a given state without ever directly querying a Q-function with this unseen action. Our algorithm alternates between fitting this upper expectile value function and backing it up into a Q-function, without any explicit policy. Then, we extract the policy via advantage-weighted behavioral cloning, which also avoids querying out-of-sample actions. We dub our method implicit Q-learning (IQL). IQL is easy to implement, computationally efficient, and only requires fitting an additional critic with an asymmetric L2 loss. IQL demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learnin
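The asymmetric L2 loss and the "upper expectile" mentioned in the abstract above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the value `tau=0.7`, and the fixed-point solver are assumptions chosen for illustration. The sketch only shows how weighting positive and negative residuals differently pushes the fitted value above the mean, toward the better samples.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 loss: weight tau on positive residuals, (1 - tau) on negative ones.

    In an IQL-style setup, diff would be Q(s, a) - V(s); tau=0.7 is an
    illustrative assumption, not a value taken from the abstract.
    """
    diff = np.asarray(diff, dtype=float)
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2

def fit_expectile(samples, tau=0.7, iters=100):
    """Return the scalar m minimizing sum(expectile_loss(samples - m, tau)).

    Uses the first-order condition m = sum(w * x) / sum(w), iterated because
    the weights w depend on m. Hypothetical helper for illustration only.
    """
    samples = np.asarray(samples, dtype=float)
    m = samples.mean()
    for _ in range(iters):
        w = np.where(samples > m, tau, 1.0 - tau)
        m = np.sum(w * samples) / np.sum(w)
    return float(m)
```

With `tau=0.5` the loss is symmetric and the minimizer is the ordinary mean; raising `tau` toward 1 moves the minimizer toward the largest sample, which is the sense in which an upper expectile approximates the value of the best actions seen in the data.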
Goal-oriented requirements engineering (GORE) for Systems of Systems (SoS) includes combining the local goals of individual operational systems to achieve higher-level goals. GORE offers a structured approach to managing com...
Advancing memory circuitry has heightened the performance criteria for memory in diverse chip applications. Decoders, serving as peripheral circuits to memory devices, have been a prominent area of academic interest. Th...
Home automation systems have recently received a lot of attention as a result of rapid and advanced technological developments that have made daily life more comfortable. Many things have been automated and made avail...
ISBN (digital): 9798350383508
ISBN (print): 9798350383515
Machine learning has been adopted for efficient cooperative spectrum sensing. However, it incurs an additional security risk due to attacks leveraging adversarial machine learning to create malicious spectrum sensing values to deceive the fusion center, called adversarial spectrum attacks. In this paper, we propose an efficient framework for detecting adversarial spectrum attacks. Our design leverages the concept of the distance to the decision boundary (DDB) observed at the fusion center and compares the training and testing DDB distributions to identify adversarial spectrum attacks. We create a computationally efficient way to compute the DDB for machine learning based spectrum sensing systems. Experimental results based on realistic spectrum data show that our method, under typical settings, achieves a high detection rate of up to 99% and maintains a low false alarm rate of less than 1%. In addition, our method to compute the DDB based on spectrum data achieves 54%–64% improvements in computational efficiency over existing distance calculation methods. The proposed DDB-based detection framework offers a practical and efficient solution for identifying malicious sensing values created by adversarial spectrum attacks.
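The idea of comparing training and testing DDB distributions can be sketched as follows. The abstract does not say which classifier or which distribution test the authors use, so everything here is an assumption: a linear decision boundary `w·x + b = 0` (for which the distance has a closed form) and a two-sample Kolmogorov-Smirnov statistic as the comparison. Both function names are hypothetical.

```python
import numpy as np

def ddb_linear(X, w, b):
    """Distance from each row of X to the hyperplane w.x + b = 0.

    Assumes a linear classifier; the paper's DDB computation for general
    machine learning models is not described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.abs(X @ w + b) / np.linalg.norm(w)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs.

    Swapped in here as one simple way to compare two DDB distributions;
    it is not claimed to be the test used in the paper.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A detector built on this sketch would compute `ddb_linear` on training data and on incoming sensing reports, then flag the batch when `ks_statistic` exceeds a threshold tuned on clean data; the threshold itself would be another design choice not given in the abstract.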
Thyroid nodules are among the prevalent, life-threatening disorders that have been on the rise in recent years. Ultrasound imaging is a frequently used diagnostic technique for locating and identifying thyroid nodules. However, evaluating all of the slide images is time-consuming and difficult for specialists. Automated, reliable, and objective methods are required for accurately evaluating ultrasound images. Recent developments in deep learning have completely changed several facets of image analysis and computer-aided diagnostic (CAD) techniques that deal with the issue of identifying thyroid nodules. We reviewed the literature on the potential, constraints, and present deep learning applications for thyroid cancer detection and discussed the study's goals. We provided an overview of the latest developments in deep learning techniques for thyroid cancer diagnosis and addressed some of the difficulties and practical issues that can restrict the development of deep learning and its incorporation into healthcare settings. Copyright © 2024 by ASME.
A novel method called cooperative communication allows devices with a single antenna to share their antennae and assist other nodes in signal relaying. Thus, enhanced reception reliability, lower power usage, and incr...
The Nesterov accelerated dynamical approach serves as an essential tool for addressing convex optimization problems with accelerated convergence *** previous studies in this field have primarily concentrated on unconstrained smooth convex optimization *** this paper, on the basis of the primal-dual dynamical approach, the Nesterov accelerated dynamical approach, the projection operator, and the directional gradient, we present two accelerated primal-dual projection neurodynamic approaches with time scaling to address convex optimization problems with smooth and nonsmooth objective functions subject to linear and set constraints, which consist of a second-order ODE (ordinary differential equation) or differential inclusion system for the primal variables and a first-order ODE for the dual *** satisfying specific conditions for time scaling, we demonstrate that the proposed approaches have a faster convergence *** only requires assuming convexity of the objective *** validate the effectiveness of our proposed two accelerated primal-dual projection neurodynamic approaches through numerical experiments.
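For background, the classical continuous-time model of Nesterov's accelerated gradient method for unconstrained smooth convex minimization of f is the second-order ODE below. It is shown only to make the term "Nesterov accelerated dynamical approach" concrete; it is not the primal-dual projection system proposed in the abstract, which additionally handles constraints, nonsmooth objectives, and time scaling.

```latex
% Classical Nesterov ODE (unconstrained, smooth convex f); background only.
\ddot{x}(t) + \frac{3}{t}\,\dot{x}(t) + \nabla f\bigl(x(t)\bigr) = 0,
\qquad
f\bigl(x(t)\bigr) - \min f = O\!\left(\frac{1}{t^{2}}\right).
```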
Authors: Benaissa, Rabie; Mansouri, Smail; Ouledali, Omar
University of Adrar, Faculty of Sciences and Technology, Department of Electrical Engineering, Laboratory for Sustainable Development and Computer Science, 01000, Algeria
University of Adrar, Faculty of Sciences and Technology, Department of Hydrocarbons and Renewable Energies, Laboratory for Energy Environment and Information Systems, 01000, Algeria
This article proposes a Neuro Fuzzy Controller (NFC)-Adaptive Backstepping Controller (ABC)-Space Vector Modulation (SVM) approach for a five-level NPC inverter-Double Stator Interior Permanent Magnet Synchro...