Search Results - Inner Mongolia University Library

27th Signal Processing and Communications Applications Conference (SIU)

Authors: Calisir, Sinan; Pehlivanoglu, Meltem Kurt (Kocaeli Univ Bilgisayar Muhendisligi Bolumu Kocaeli Turkey)

ISBN: (print) 9781728119045

This paper provides a comprehensive survey of the reinforcement learning algorithms in the literature. Model-free reinforcement learning algorithms in particular are described in detail, and the differences among them are discussed. Finally, some open problems in reinforcement learning are presented for future research.

Keywords: reinforcement learning algorithms deep reinforcement learning learning artificial intelligence


Research on big data anomaly mining method for power grid operation and maintenance based on reinforcement learning algorithm


9th International Forum on Electrical Engineering and Automation (IFEEA)

Author: Wen, Xing (CSG EHV Power Transmiss Co Guangzhou Guangdong Peoples R China)

ISBN: (print) 9781665464215

As power grids continue to expand, their safe operation has received much attention. Existing big data anomaly mining methods for grid operation and maintenance suffer from low accuracy in the face of network attacks. This paper designs a big data anomaly mining method for grid operation and maintenance based on a reinforcement learning algorithm. The reinforcement learning algorithm is used to obtain early warning data, develop an offline analysis function, calculate an increase matrix of key nodes, construct a big data anomaly identification model, and optimise the mining process. Experimental results show that the accuracy of the designed mining method is 11.92%, 14.57% and 13.02% higher than the average of the other three methods.

Keywords: reinforcement learning algorithms Grid operations and maintenance Big data anomaly mining Grid security Data objects


Adaptive Tabu Dropout for Regularization of Deep Neural Networks


29th International Conference on Neural Information Processing

Authors: Hasan, Md Tarek; Akter, Ari Fa; Shamael, Mohammad Nazmush; Hossain, Md Al Emran; Billah, H. M. Mutasim; Islam, Sumayra; Shatabda, Swakkhar (United Int Univ Dept Comp Sci & Engn Plot 2 Madani Ave Dhaka 1212 Badda Bangladesh)

ISBN: (print) 9783031301049; 9783031301056

Dropout is an effective strategy for the regularization of deep neural networks. Applying tabu to the units that have been dropped in the recent epoch and retaining them for training ensures diversification in dropout. In this paper, we improve the Tabu Dropout mechanism for training deep neural networks in two ways. Firstly, we propose to use tabu tenure, or the number of epochs a particular unit will not be dropped. Different tabu tenures provide diversification to boost the training of deep neural networks based on the search landscape. Secondly, we propose an adaptive tabu algorithm that automatically selects the tabu tenure based on the training performances through epochs. On several standard benchmark datasets, the experimental results show that the adaptive tabu dropout and tabu tenure dropout diversify and perform significantly better compared to the standard dropout and basic tabu dropout mechanisms.
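The tabu tenure idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `tabu_dropout_mask`, the per-epoch bookkeeping list `tabu_left`, and the fixed `tenure` value are assumptions made for illustration, and the adaptive selection of the tenure is not shown.

```python
import random


def tabu_dropout_mask(n_units, drop_prob, tabu_left, tenure, rng):
    """Return (mask, new_tabu_left) for one epoch.

    mask[i] == 0 means unit i is dropped this epoch.
    tabu_left[i] is the number of upcoming epochs in which unit i
    must be kept because it was dropped recently (hypothetical bookkeeping).
    """
    mask, new_tabu = [], []
    for i in range(n_units):
        if tabu_left[i] > 0:
            # Unit is tabu: keep it and count down its remaining tenure.
            mask.append(1)
            new_tabu.append(tabu_left[i] - 1)
        elif rng.random() < drop_prob:
            # Drop the unit now and protect it for the next `tenure` epochs.
            mask.append(0)
            new_tabu.append(tenure)
        else:
            # Unit is neither tabu nor sampled for dropping.
            mask.append(1)
            new_tabu.append(0)
    return mask, new_tabu


rng = random.Random(0)
tabu = [0] * 8
for epoch in range(3):
    mask, tabu = tabu_dropout_mask(8, 0.5, tabu, tenure=2, rng=rng)
    print(epoch, mask)
```

With `tenure=2`, a unit dropped in one epoch is guaranteed to be kept in the following two epochs, which is one way to read "the number of epochs a particular unit will not be dropped".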

Keywords: Online learning & Bandits Deep Neural Network algorithms reinforcement learning algorithms Heuristic Search Local Search


Essays on Return Insurance and Antitrust Issues


Author: Vo, Phuong Minh (University of California Irvine)

Degree: Ph.D., Doctor of Philosophy

Chapter 1 introduces a continuous-time monopoly model that considers a return policy allowing consumers to return purchased products within a specified period. The model shows that an easy return policy, allowing for no-questions-asked returns, reduces consumer surplus compared to a stricter policy that only refunds under specific conditions. This decrease in consumer surplus happens when consumers are not highly price-sensitive. The study also explores how product quality affects the return policy and finds that lower quality products result in longer return periods. Additionally, lower quality products can lead to higher market prices if the value of the defective item is high enough. In Chapter 2, a duopoly model is developed to study competition between an online seller and a local store seller in the presence of a return policy. The model finds that offering returns after sales increases the sellers’ market power. The local store targets consumers with lower utility by offering a shorter return period and a lower price. The study also shows that an increase in the information cost at the local store increases market price dispersion and gives the online store more market share. Chapter 3 uses economic theory and experiments with AI-based pricing algorithms to analyze the impact of consumer search friction on collusion, market prices, and consumer welfare. The chapter develops an oligopoly model where consumers search sequentially for the best product with advertised prices. The study finds that collusion is easier to sustain with lower search costs. However, increasing search costs can reduce the collusive price, but this does not increase consumer surplus if the collusion sustains. The experiments show that simple reinforcement learning algorithms (Q-learning) can adopt a trigger-price strategy to keep prices above the competitive level in a frictional market.

Keywords: Insurance Antitrust issues Return policy Consumers Market prices AI-based pricing algorithms reinforcement learning algorithms


Application of Robotic Arm Path Planning Based on TQC Algorithm


6th IEEE International Conference on Automation, Electronics and Electrical Engineering, AUTEEE 2023

Author: Gu, Jiahui (Shanghai Polytechnic University School of Computer and Information Engineering Shanghai China)

ISBN: (print) 9798350305623

To address the slow training of the Panda robotic arm on a grasping and placing task in a third-party environment for the Gym simulation framework, this paper proposes training with the TQC (Truncated Quantile Critics) algorithm. Compared with the DDPG and SAC algorithms, the training speed is significantly improved. The algorithm performs well even with a rudimentary reward function: in this experiment the reward is 0 for grasping an object and negative otherwise. Such a reward makes all unsuccessful actions equally negatively rewarded, so a policy update carries no information about which action is better, the policy does not improve, and the rewards for success become harder to discover. Solving the Panda-Gym robotic arm grasping task with TQC first requires understanding the TQC algorithm and the fundamentals of the Panda-Gym simulation environment. The TQC, DDPG, and SAC algorithms are then trained on the same task in the same environment, and the superiority of the TQC algorithm is demonstrated by comparing the data curves. © 2023 IEEE.
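The rudimentary reward described in this abstract might look like the sketch below. The function name `sparse_reward`, the use of a distance threshold as the success test, the `tolerance` value, and the `-1.0` penalty are all assumptions for illustration; the abstract only states that the reward is 0 on success and negative otherwise.

```python
import math


def sparse_reward(achieved_position, goal_position, tolerance=0.05):
    """Return 0.0 on success and -1.0 otherwise (hypothetical values).

    Success is assumed here to mean that the object is within
    `tolerance` of the goal position.
    """
    distance = math.dist(achieved_position, goal_position)
    return 0.0 if distance <= tolerance else -1.0


# Every unsuccessful outcome gets the same reward, however close it was.
print(sparse_reward((0.0, 0.0, 0.0), (0.5, 0.0, 0.0)))
print(sparse_reward((0.0, 0.0, 0.0), (0.1, 0.0, 0.0)))
print(sparse_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.01)))
```

The first two calls return the same negative value even though the second attempt ends much closer to the goal, which is why such a reward gives the learner no signal about which unsuccessful action was better.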

Keywords: gym panda robotic arm reinforcement learning algorithms simulation TQC


Development and Design of an Intelligent Financial Asset Management System Based on Big Data Analysis and Kubernetes


Procedia Computer Science, 2024, Vol. 243, pp. 482-489

Author: Yicheng Peng (Mergers & Acquisitions Practice West Monroe Partners New York 10019 NY USA)

The rise of deep learning in the financial field has led to the integration of artificial intelligence and investment, providing users with intelligent investment decisions. However, as the volume of financial market data continues to expand, traditional data processing methods can no longer meet the needs for efficiency and accuracy. This article focuses on deep reinforcement learning algorithms and examines key issues such as stock price prediction, investment portfolios, and algorithmic trading. Comparing and analyzing the experimental results serves both to evaluate the performance of the model and to explore the practical effect of the algorithm's output. In addition, drawing on Kubernetes container orchestration and microservice technology, a high-concurrency, high-performance distributed financial data analysis system was constructed. This system not only meets users' needs for real-time data analysis and deep learning, but also provides them with more reasonable investment suggestions. The contribution of this article lies in introducing deep reinforcement learning to solve nonlinear data problems in the financial field, proposing intelligent asset management methods, and designing a feasible intelligent financial asset management system, providing new ideas and practical experience for the further development of financial data analysis platforms.

Keywords: Deep learning Financial data analysis reinforcement learning algorithms Kubernetes


On the Optimization of User Association and Resource Allocation in HetNets With mm-Wave Base Stations


IEEE SYSTEMS JOURNAL, 2020, Vol. 14, No. 3, pp. 3957-3967

Authors: Chaieb, Cirine; Mlika, Zoubeir; Abdelkefi, Fatma; Ajib, Wessam (Univ Quebec Montreal Dept Comp Sci Montreal PQ H3C 3P8 Canada Higher Sch Commun Tunis Dept Appl Math Signals & Commun El Ghazala 2083 Ariana Tunisia)

This article investigates the problem of joint user association and resource allocation, defined by the number of allocated time-slots, in hybrid heterogeneous networks with the coexistence of sub-6-GHz base stations and millimeter wave (mm-Wave) base stations. To do so, we formulate a joint optimization problem to improve the efficiency of resource utilization by maximizing the number of associated users and minimizing the number of allocated time-slots. The optimization problem is formulated as a binary integer linear program and is proved to be NP-hard. Accordingly, we propose two efficient heuristic algorithms to solve it. The first one is centralized and relies on complete information, whereas the second one is distributed and is based on a reinforcement learning approach. The proposed distributed learning algorithm aims to find the best association for each user based on its past experience, automatically and independently from others. Simulation results show that the performances of both proposed algorithms are close-to-optimal with an important reduction in computational complexity.

Keywords: Resource management Interference Signal to noise ratio Optimization Heuristic algorithms reinforcement learning Base stations Hybrid HetNets millimeter wave communications reinforcement learning algorithms user association


RETRACTED: Research on breakthrough and innovation of UAV mission planning method based on cloud computing-based reinforcement learning algorithm (Retracted Article)


JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, Vol. 37, No. 3, pp. 3285-3292

Authors: Liu, Rong; Liang, Jin; Alkhambashi, Majid (Nanjing Univ Aeronaut & Astronaut UAV Res Inst Middle & Small Size UAV Adv Tech Key Lab Minist Ind & Informat Technol Nanjing Jiangsu Peoples R China FACRI Sci & Technol Aircraft Control Lab Xian Shanxi Peoples R China Al Zahra Coll Women Dept Informat Technol Muscat Oman)

UAV systems have evolved toward intelligence and autonomy, and mission planning is an important part of autonomous drone control. This article studies route planning and task assignment in drone mission planning. For drone path planning in a three-dimensional static threat environment, two improved ant colony algorithms are proposed; prior knowledge is encoded as multiple heuristic information for the ants to guide their path search, and the global convergence of the algorithm is verified. A fuzzy inference system is used to dynamically adjust the parameters of the RRT algorithm according to real-time information about the task environment and the growth status of the RRT random tree. The experimental results show that the two improved algorithms obtain better planning results than the single artificial potential field method and the ant colony algorithm, effectively shortening the route planning time, improving the planning accuracy, and obtaining the optimal flight path.

Keywords: cloud computing reinforcement learning algorithms UAVs mission planning


Energy-Efficient Power Control for Multiple-Relay Cooperative Networks Using Q-learning


IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2015, Vol. 14, No. 3, pp. 1567-1580

Authors: Shams, Farshad; Bacci, Giacomo; Luise, Marco (Inst Markets Technol IMT Inst Adv Studies Dept Comp Sci & Engn I-55100 Lucca Italy Univ Pisa Dipartimento Ingn Informaz I-56121 Pisa Italy Univ Pisa Dipartimento Ingn Informaz I-56122 Pisa Italy)

In this paper, we investigate the power control problem in a cooperative network with multiple wireless transmitters, multiple amplify-and-forward relays, and one destination. The relay communication can be either full duplex or half-duplex, and all source nodes interfere with each other at every intermediate relay node, and all active nodes (transmitters and relay nodes) interfere with each other at the base station. A game-theory-based power control algorithm is devised to allocate the powers among all active nodes. The source nodes aim at maximizing their energy efficiency (in bits per Joule per Hertz), whereas the relays aim at maximizing the network sum rate. We show that the proposed game admits multiple pure/mixed-strategy Nash equilibrium points. A Q-learning-based algorithm is then formulated to let the active players converge to the best Nash equilibrium point that combines good performance in terms of both energy efficiency and overall data rate. Numerical results show that the full-duplex scheme outperforms half-duplex configuration, Nash bargaining solution, the max-min fairness, and the max-rate optimization schemes in terms of energy efficiency, and outperforms the half-duplex mode, Nash bargaining system, and the max-min fairness scheme in terms of network sum rate.
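For readers unfamiliar with Q-learning, the standard tabular update that a Q-learning-based algorithm builds on can be sketched as follows. This is the textbook single-agent rule, not the paper's multi-player power control scheme; the state labels, the power-level action names, and the `alpha` and `gamma` values are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning step and return the updated value.

    q maps (state, action) pairs to value estimates.
    alpha is the learning rate and gamma the discount factor.
    """
    # Greedy value of the next state over all available actions.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target and in-place update.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: a node choosing between two transmit power levels.
power_levels = ["low", "high"]
q_table = defaultdict(float)
value = q_update(q_table, "s0", "high", reward=1.0, next_state="s1",
                 actions=power_levels, alpha=0.5, gamma=0.9)
print(value)
```

Starting from an all-zero table, the first update moves the estimate halfway toward the observed reward because `alpha` is 0.5 and the next-state values are still zero.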

Keywords: Energy efficiency reinforcement learning algorithms relay-assisted communications full-duplex communications mixed-strategy Nash equilibria power control


Option and Constraint Generation using Work Domain Analysis


IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Authors: Tokadli, Gueliz; Feigh, Karen M. (Georgia Inst Technol Sch Aerosp Engn Atlanta GA 30332 USA)

ISBN: (print) 9781479938407

In this paper we investigate the use of Work Domain Analysis (WDA), a technique from the field of cognitive engineering, to inform the creation of options and constraints for reinforcement learning (RL) algorithms. The micro-world of Pac-Man, a classic arcade game, is used as a tractable and representative work domain. WDA was conducted on individuals familiar with Pac-Man and an Abstraction Hierarchy (AH), a means-ends representation of their understanding of the game, was created for each individual. The abstraction hierarchies for best performing and worst performing individuals were then combined to illustrate the differences between the different groups. Several differences between the two groups were found, and included the use of defense as well as offensive strategies by high performers versus only defense by poor performers, context sensitivity and additional goals and more sophisticated constraints by high performers. The differences were translated into an options and constraint paradigm suitable for incorporation into RL algorithms.

Keywords: cognition learning (artificial intelligence) Pac-Man microworld RL algorithms WDA abstraction hierarchy cognitive engineering constraint generation context sensitivity means-ends representation option generation reinforcement learning algorithms work domain analysis Abstracts Algorithm design and analysis Games Interviews Machine learning algorithms Terminology
