Search Results - Inner Mongolia University Library

Learning to select branching rules in the DPLL procedure for satisfiability

Electronic Notes in Discrete Mathematics, 2001, Vol. 9, pp. 344-359

Authors: Lagoudakis, Michail G.; Littman, Michael L. (Department of Computer Science, Duke University, Durham, NC 27708, United States; Shannon Laboratory, AT&T Labs Research, Florham Park, NJ 07932, United States)

The DPLL procedure is the most popular complete satisfiability (SAT) solver. While its worst-case complexity is exponential, its actual running time is greatly affected by the order in which branch variables are chosen during the search. Several branching rules have been proposed, but none is the best in all cases. This work investigates the use of automated methods for choosing the most appropriate branching rule at each node in the search tree. We consider a reinforcement-learning approach in which a value function, which predicts the performance of each branching rule in each case, is learned through trial runs on a typical problem set of the target class of SAT problems. Our results indicate that, given sufficient training on a given class, the resulting strategy performs as well as (and, in some cases, better than) the best branching rule for that class.

Keywords: Algorithm Selection, Branching Rules, Reinforcement Learning, Satisfiability, value function approximation
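
A minimal sketch of per-node branching-rule selection, assuming a toy Python DPLL over CNF clauses written as lists of signed integers. The two branching rules, the node features, and `RULE_WEIGHTS` are hypothetical: the weights stand in for a value function that, in the paper, is learned from trial runs, a step this sketch does not perform.

```python
from collections import Counter

# A clause is a frozenset of nonzero ints; -k means "variable k is false".


def simplify(clauses, lit):
    """Make `lit` true: drop satisfied clauses and delete the opposite literal."""
    return [c - {-lit} for c in clauses if lit not in c]


def rule_most_frequent(clauses):
    """Branch on the literal that occurs most often."""
    return Counter(lit for c in clauses for lit in c).most_common(1)[0][0]


def rule_shortest_clause(clauses):
    """Branch on a literal taken from a shortest clause."""
    return min(min(clauses, key=len), key=abs)


RULES = {"most_frequent": rule_most_frequent, "shortest_clause": rule_shortest_clause}

# Hypothetical linear cost model per rule, standing in for a learned value function:
# predicted_cost = w0 + w1 * n_clauses + w2 * mean_clause_length (lower is better).
RULE_WEIGHTS = {
    "most_frequent": (0.0, 1.0, 0.5),
    "shortest_clause": (2.0, 0.8, 1.0),
}


def features(clauses):
    n = len(clauses)
    return (1.0, float(n), sum(len(c) for c in clauses) / n)


def choose_rule(clauses):
    """Select the branching rule with the smallest predicted cost at this node."""
    f = features(clauses)
    return min(RULES, key=lambda name: sum(w * x for w, x in zip(RULE_WEIGHTS[name], f)))


def dpll(clauses, trace=None):
    """Return True if the CNF formula is satisfiable; record chosen rules in `trace`."""
    clauses = [frozenset(c) for c in clauses]
    while True:  # unit propagation
        if not clauses:
            return True
        if any(not c for c in clauses):
            return False
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        (lit,) = unit
        clauses = simplify(clauses, lit)
    name = choose_rule(clauses)
    if trace is not None:
        trace.append(name)
    lit = RULES[name](clauses)
    return dpll(simplify(clauses, lit), trace) or dpll(simplify(clauses, -lit), trace)


print(dpll([[1, 2], [-1, 2], [-2, 3]]))  # prints True
```

The selector is consulted at every branching node, so different rules can be used in different parts of the same search tree; replacing the fixed weights with ones fitted to measured search costs is where learning would enter.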

EXPERT-BASED REWARD SHAPING AND EXPLORATION SCHEME FOR BOOSTING POLICY LEARNING OF DIALOGUE MANAGEMENT

IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Authors: Ferreira, Emmanuel; Lefevre, Fabrice (Univ Avignon, LIA, F-84911 Avignon 9, France)

ISBN (print): 9781479927562

This paper investigates the conditions under which expert knowledge can be used to accelerate the policy optimization of a learning agent. Recent work on reinforcement learning for dialogue management has produced sophisticated value-estimation methods that deal jointly with the exploration/exploitation dilemma, sample efficiency, and non-stationary environments. In this paper, a reward shaping method and an exploration scheme, both based on intuitive hand-coded expert advice, are combined with an efficient temporal-difference-based learning procedure. The key objective is to boost the initial training stage, when the system is not yet reliable enough to interact with real users (e.g., clients). Our claims are illustrated by simulation-based experiments carried out with a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS).

Keywords: dialogue management, reinforcement learning, reward shaping, value function approximation
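
The abstract does not spell out the shaping or exploration rules, so this is a minimal sketch under stated assumptions: potential-based shaping (adding gamma * Phi(s') - Phi(s) to each reward, with Phi encoding expert advice, a form known to leave the optimal policy unchanged), epsilon-greedy exploration whose random moves follow a hand-coded expert suggestion part of the time, and tabular Q-learning standing in for the paper's temporal-difference learner. The six-state chain, `expert_potential`, and `expert_action` are hypothetical, not a dialogue manager.

```python
import random

N_STATES = 6                 # hypothetical chain: start at state 0, goal at state 5
ACTIONS = (-1, +1)           # move left / move right
GAMMA, ALPHA = 0.95, 0.5


def step(s, a):
    """Deterministic chain; reward 1 only on reaching the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def expert_potential(s):
    """Hypothetical hand-coded advice: states nearer the goal are better."""
    return s / (N_STATES - 1)


def expert_action(s):
    """Hypothetical hand-coded advice: move toward the goal."""
    return +1


def train(episodes=200, eps=0.3, p_expert=0.5, shaping=True, seed=0):
    """Tabular Q-learning with optional shaping and expert-guided exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            if rng.random() < eps:
                # Exploration: follow the expert's suggestion some of the time, else act randomly.
                a = expert_action(s) if rng.random() < p_expert else rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q[(s, b)])
            s2, r, done = step(s, a)
            if shaping:
                # Potential-based shaping F = gamma * phi(s') - phi(s); phi(terminal) is taken as 0.
                phi_next = 0.0 if done else expert_potential(s2)
                r += GAMMA * phi_next - expert_potential(s)
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s, t = s2, t + 1
    return q


q = train()
policy = [max(ACTIONS, key=lambda b: q[(s, b)]) for s in range(N_STATES - 1)]
```

Calling `train(shaping=False, p_expert=0.0)` gives the unassisted baseline against which the effect of the expert advice on early learning could be compared.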

Temporal-Difference Learning: An Online Support Vector Regression Approach

12th International Conference on Informatics in Control Automation and Robotics (ICINCO)

Authors: Teixeira, Hugo Tanzarella; Bottura, Celso Pascoli (State Univ Campinas, UNICAMP, Sch Elect & Comp Engn, FEEC, DSIF, LCSI, Av Albert Einstein 400LE31, BR-13081970 Campinas, SP, Brazil)

ISBN (print): 9789897581496

This paper proposes a new algorithm for Temporal-Difference (TD) learning using online support vector regression. It benefits from the good generalization properties of support vector regression (SVR), and it can also learn incrementally and automatically track environments with time-varying characteristics. Using online SVR, good estimates of the value function in TD learning can be obtained for both linear and nonlinear prediction problems. Experimental results demonstrate the effectiveness of the proposed method in comparison with other methods.

Keywords: Machine Learning, Reinforcement Learning, Temporal Difference Learning, value function approximation, Online Support Vector Machine
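
The online SVR regressor itself is not reproduced here. As a simpler stand-in, this is a minimal sketch of the TD(0) prediction loop that such a regressor plugs into: semi-gradient TD(0) with a linear model on a hypothetical five-state random walk, where an episode ends off either edge and exiting on the right pays 1. Replacing `features` and the weight update with an incremental SVR is where the paper's method would differ.

```python
import numpy as np

N = 5          # non-terminal states 0..4 of a hypothetical random walk; exiting right pays 1
GAMMA = 1.0


def features(s):
    """One-hot features; a regressor such as online SVR would replace this linear model."""
    x = np.zeros(N)
    x[s] = 1.0
    return x


def td0(episodes=5000, alpha=0.01, seed=0):
    """Semi-gradient TD(0) prediction of state values under a uniformly random policy."""
    rng = np.random.default_rng(seed)
    w = np.zeros(N)
    for _ in range(episodes):
        s = N // 2
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            terminal = s2 < 0 or s2 >= N
            r = 1.0 if s2 >= N else 0.0
            v_next = 0.0 if terminal else w @ features(s2)
            x = features(s)
            # TD(0): w <- w + alpha * (r + gamma * v(s') - v(s)) * grad_w v(s)
            w += alpha * (r + GAMMA * v_next - w @ x) * x
            if terminal:
                break
            s = s2
    return w


print(np.round(td0(), 2))  # true values are 1/6, 2/6, ..., 5/6
```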

The Operation Optimization Model of Pumped-Hydro Power Storage Station Based on Approximate Dynamic Programming

International Conference on Power System Technology (PowerCon)

Authors: Liang, Zhencheng; Li, Yu; Wei, Hua (Guangxi Univ, Sch Elect Engn, Nanning 530004, Peoples R China; Guangxi Key Lab Power Syst Optimizat & Energy Tec, Nanning, Peoples R China)

ISBN (print): 9781479950324

Based on the hypothesis that a pumped-hydro power storage (PHPS) station is available for multi-day optimization and adjustment, this paper proposes a long-term operation optimization model of a PHPS station based on approximate dynamic programming (ADP). In this multistage decision model, value function approximation (VFA) of the reservoir's stored energy is used across stages to preserve the overall optimization characteristics; within each stage, generated energy and generating periods, as well as pumping electricity consumption and pumping periods, are used as decision variables for daily operation optimization. The approximately optimal solution is obtained by iterating between the decision variables and the value function, which avoids the "curse of dimensionality" of conventional multistage decision models. Experimental results show that the ADP-based model can accurately describe the long-term operation modes of a PHPS station, and that its solution method is better suited to this kind of large-scale optimization decision problem than dynamic programming (DP) and conventional mathematical programming methods.

Keywords: pumped-hydro power storage station, long-term operation optimization, value function approximation, approximate dynamic programming
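
One common way to write the iteration described in the abstract, in notation introduced here for illustration only (the abstract does not give the paper's own formulation): $E_t$ is the energy stored in the upper reservoir at the start of day $t$; $x_t$ collects the day's decisions (generated energy and generating periods, pumping energy and pumping periods) with feasible set $\mathcal{X}_t(E_t)$; $C_t$ is that day's operating revenue; $W^{\mathrm{pump}}_t$ and $W^{\mathrm{gen}}_t$ are the day's pumping and generated energy; $\eta_p$ and $\eta_g$ are assumed pumping and generating efficiencies; $\bar V_{t+1}$ is the approximate value of stored energy carried into the next day; and $\alpha_n$ is a step size at iteration $n$.

```latex
\begin{aligned}
\hat v_t^{\,n} &= \max_{x_t \in \mathcal{X}_t(E_t)} \Big[\, C_t(x_t) + \bar V_{t+1}^{\,n-1}(E_{t+1}) \,\Big],\\
E_{t+1} &= E_t + \eta_p\, W^{\mathrm{pump}}_t(x_t) - \frac{W^{\mathrm{gen}}_t(x_t)}{\eta_g},\\
\bar V_t^{\,n}(E_t) &= (1-\alpha_n)\,\bar V_t^{\,n-1}(E_t) + \alpha_n\, \hat v_t^{\,n}.
\end{aligned}
```

Each iteration $n$ solves only single-day problems while stepping forward from $t = 1$ to $T$; the coupling between days is carried by $\bar V$, so the scheme never enumerates every multi-day storage trajectory.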

R-learning and Gaussian Process Regression Algorithm for Cloud Job Access Control

3rd IEEE International Conference on Cyber Security and Cloud Computing (IEEE CSCloud) / 2nd IEEE International Conference of Scalable and Smart Cloud (IEEE SSC)

Authors: Peng, Zhiping; Cui, Delong; Ma, Yuanjia; Xiong, Jianbin; Xu, Bo; Lin, Weiwei (Guangdong Univ Petrochem Technol, Coll Comp & Elect Informat, Maoming, Peoples R China; South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China)

ISBN (print): 9781509009466

Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Reinforcement learning has recently received broad attention, but when it is applied to problems with large-scale discrete or continuous state spaces, the results are likely to be unsatisfactory, and it may even fail to find optimal policies. To address this problem, we establish a new generative model of the value function and use Gaussian Process Regression to approximate the values of state-action pairs that were never or seldom visited. We evaluate the performance of the proposed algorithm on an access-control queuing task in a cloud computing environment. The computational results demonstrate that the scheme can balance exploration and exploitation in the learning process and accelerate convergence to a certain extent.

Keywords: reinforcement learning, Gaussian process regression, value function approximation, cloud computing, R-learning
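
A minimal sketch, assuming a small access-control queue in the spirit of the classic R-learning example: a fixed pool of servers, one waiting customer per step whose priority is paid as reward if accepted, and busy servers that free up at random. Tabular R-learning learns relative action values plus an average-reward estimate `rho`; a small Gaussian-process posterior mean then replaces the Q-values of rarely visited (state, action) cells. The task sizes, the RBF kernel and its hyperparameters, and the `min_visits` threshold are assumptions, and this is not the paper's generative value model.

```python
import numpy as np

N_SERVERS = 4                 # hypothetical, small access-control queue
PRIORITIES = (1, 2, 4, 8)     # payoff when a customer of that priority is accepted
P_FREE = 0.1                  # chance that each busy server frees up per step


def r_learning(steps=20000, alpha=0.1, beta=0.01, eps=0.1, seed=0):
    """Tabular R-learning (average-reward, undiscounted) on the queue."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_SERVERS + 1, len(PRIORITIES), 2))   # Q[free servers, priority, reject/accept]
    visits = np.zeros(q.shape, dtype=int)
    rho = 0.0                                           # estimate of the average reward per step
    free, p = N_SERVERS, int(rng.integers(len(PRIORITIES)))
    for _ in range(steps):
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q[free, p]))
        if free == 0:
            a = 0                                       # no free server: must reject
        r = float(PRIORITIES[p]) if a == 1 else 0.0
        busy = N_SERVERS - (free - a)
        nxt_free = free - a + int(rng.binomial(busy, P_FREE))
        nxt_p = int(rng.integers(len(PRIORITIES)))
        q[free, p, a] += alpha * (r - rho + q[nxt_free, nxt_p].max() - q[free, p, a])
        visits[free, p, a] += 1
        if q[free, p, a] == q[free, p].max():           # classic rule: update rho only if a is greedy
            rho += beta * (r - rho + q[nxt_free, nxt_p].max() - q[free, p].max())
        free, p = nxt_free, nxt_p
    return q, visits, rho


def gp_posterior_mean(x_train, y_train, x_query, length=1.0, noise=1e-2):
    """Posterior mean of a GP with an RBF kernel and a constant (empirical-mean) prior mean."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / length**2)
    mu = y_train.mean()
    k_train = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    weights = np.linalg.solve(k_train, y_train - mu)
    return mu + kernel(x_query, x_train) @ weights


def fill_rarely_visited(q, visits, min_visits=5):
    """Replace Q-values of rarely visited (state, action) cells with GP predictions."""
    coords = np.array(list(np.ndindex(q.shape)), dtype=float)
    flat = q.reshape(-1).copy()
    seen = visits.reshape(-1) >= min_visits
    if seen.any() and (~seen).any():
        flat[~seen] = gp_posterior_mean(coords[seen], flat[seen], coords[~seen])
    return flat.reshape(q.shape)


q, visits, rho = r_learning()
q_filled = fill_rarely_visited(q, visits)
```

Using the raw (free servers, priority, action) indices as GP inputs is the simplest choice; a real system would more likely regress on scaled state features.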

Energy management of PV-storage systems: ADP approach with temporal difference learning

19th Power Systems Computation Conference (PSCC)

Authors: Keerthisinghe, Chanaka; Verbic, Gregor; Chapman, Archie C. (Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW, Australia)

ISBN (print): 9788894105124

In the future, residential energy users can seize the full potential of demand response schemes by using an automated home energy management system (HEMS) to schedule their distributed energy resources. To generate high-quality schedules, a HEMS needs to consider the stochastic nature of PV generation and energy consumption, as well as their inter-daily variations over several days. However, extending the decision horizon of previously proposed optimisation techniques is computationally difficult; moreover, these approaches are computationally feasible only with a limited number of storage devices and a low-resolution decision horizon. Given these shortcomings, this paper presents an approximate dynamic programming (ADP) approach with temporal difference learning for implementing a computationally efficient HEMS. In ADP, policies are obtained from value function approximations by stepping forward in time, whereas DP obtains value functions by backward induction. We use empirical data collected during the Smart Grid Smart City project in NSW, Australia, to estimate the parameters of a Markov chain model of PV output and electrical demand, which are then used in all simulations. To evaluate the quality of the solutions generated by ADP, we compare the ADP method to stochastic mixed-integer linear programming (MILP) and dynamic programming (DP). Our results show that ADP computes a solution much faster than both DP and stochastic MILP, while providing better-quality solutions than stochastic MILP and only a slight reduction in quality compared to the DP solution. Moreover, unlike the computationally intensive DP, the ADP approach is able to consider a decision horizon beyond one day while also considering multiple storage devices, which results in a HEMS that can capture additional financial benefits.

Keywords: demand response, home energy management, distributed energy resources, approximate dynamic programming, dynamic programming, stochastic mixed-integer linear programming, value function approximation, temporal difference learning
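
To make the forward-versus-backward contrast concrete, this is a minimal sketch on a hypothetical single-battery problem with a discrete state of charge, known time-of-use prices, and independent random net demand (instead of the paper's Markov chain fitted to Smart Grid Smart City data). `backward_dp` computes exact expected costs by backward induction; `forward_adp` steps forward through sampled days and updates a lookup-table value function approximation with a TD(0)-style smoothing step. All numbers are assumptions.

```python
import random

PRICES = (1.0, 1.0, 4.0, 4.0, 1.0, 4.0)   # hypothetical time-of-use prices, one per period
DEMAND = ((0, 0.3), (1, 0.4), (2, 0.3))    # hypothetical (net demand, probability) pairs
S_MAX = 4                                  # hypothetical battery capacity in energy units
T = len(PRICES)


def cost(t, d, a):
    """Grid import cost; a = -1 discharges, 0 idles, +1 charges one unit (no export revenue)."""
    return PRICES[t] * max(d + a, 0)


def feasible(s):
    return [a for a in (-1, 0, 1) if 0 <= s + a <= S_MAX]


def backward_dp():
    """Exact V[t][s]: expected cost-to-go from period t with charge s, demand seen before acting."""
    V = [[0.0] * (S_MAX + 1) for _ in range(T + 1)]
    for t in reversed(range(T)):
        for s in range(S_MAX + 1):
            V[t][s] = sum(prob * min(cost(t, d, a) + V[t + 1][s + a] for a in feasible(s))
                          for d, prob in DEMAND)
    return V


def forward_adp(iterations=30000, alpha=0.02, s0=2, seed=0):
    """Forward passes on sampled demand, smoothing a lookup-table VFA toward sampled values."""
    rng = random.Random(seed)
    demands, probs = zip(*DEMAND)
    Vbar = [[0.0] * (S_MAX + 1) for _ in range(T + 1)]   # 0 is optimistic since costs are >= 0
    for _ in range(iterations):
        s = s0
        for t in range(T):
            d = rng.choices(demands, weights=probs)[0]
            # Greedy decision against the current approximation of the future.
            a = min(feasible(s), key=lambda b: cost(t, d, b) + Vbar[t + 1][s + b])
            v_hat = cost(t, d, a) + Vbar[t + 1][s + a]
            Vbar[t][s] += alpha * (v_hat - Vbar[t][s])   # TD(0)-style smoothing step
            s += a                                        # step forward in time
    return Vbar


V, Vbar = backward_dp(), forward_adp()
print(round(V[0][2], 2), round(Vbar[0][2], 2))   # exact vs. approximate cost from charge 2
```

Starting the approximation at zero makes unexplored charge levels look cheap, so the greedy forward passes are drawn to visit them; with enough passes the estimate at the starting state should approach the backward-induction value, but unlike DP it remains an estimate.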

Tracking in Reinforcement Learning

16th International Conference on Neural Information Processing (ICONIP 2009)

Authors: Geist, Matthieu; Pietquin, Olivier; Fricout, Gabriel (IMS Res Grp, Metz, France; Arcelor Mittal Res, MC Cluster, Metz, France; INRIA Nancy Grand Est, CORIDA project team, Nancy, France)

ISBN (print): 9783642106767

Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is, of course, a desirable feature of a good RL algorithm. Yet even if the learning agent's environment can be considered stationary, generalized policy iteration frameworks, because they interleave learning and control, produce non-stationarity in the evaluated policy and therefore in its value function. Tracking the optimal solution, rather than trying to converge to it, is therefore preferable. In this paper, we propose to handle this tracking issue with a Kalman-based temporal difference framework. Its complexity and convergence are analyzed. Finally, an empirical investigation of its ability to handle non-stationarity is provided.

Keywords: Reinforcement learning, value function approximation, tracking, Kalman filtering
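
For a linear value function V(s) = phi(s) . theta, the Kalman-based TD idea can be sketched as an ordinary Kalman filter over theta: a random-walk evolution model theta_t = theta_{t-1} + v_t keeps the estimate free to move (tracking rather than converging), and each transition supplies the observation r = (phi(s) - gamma * phi(s')) . theta + noise. This is a simplified linear special case; the noise variances, features, and toy problem are assumptions, and the paper's framework also covers nonlinear parameterizations, which this sketch does not.

```python
import numpy as np


class LinearKalmanTD:
    """Kalman filter over the weights theta of V(s) = phi(s) . theta (linear case only)."""

    def __init__(self, n_features, gamma=0.9, prior_var=100.0, process_var=1.0, obs_var=1.0):
        self.gamma = gamma
        self.theta = np.zeros(n_features)
        self.P = prior_var * np.eye(n_features)    # covariance of theta
        self.Q = process_var * np.eye(n_features)  # random-walk noise: keeps theta free to move
        self.R = obs_var                           # reward (observation) noise variance

    def update(self, phi, reward, phi_next):
        """One transition (s, r, s'): predict, then correct theta with the observed reward."""
        self.P = self.P + self.Q                   # prediction: theta_t = theta_{t-1} + v_t
        h = phi - self.gamma * phi_next            # observation: r = h . theta + noise
        innovation = reward - h @ self.theta       # equals the TD error r + gamma V(s') - V(s)
        s = h @ self.P @ h + self.R                # innovation variance
        k = self.P @ h / s                         # Kalman gain
        self.theta = self.theta + k * innovation
        self.P = self.P - np.outer(k, h @ self.P)
        return self.theta


# Usage on a hypothetical one-state loop whose reward changes halfway through.
ktd = LinearKalmanTD(n_features=1)
phi = np.array([1.0])
for _ in range(300):
    ktd.update(phi, 1.0, phi)
before = float(ktd.theta[0])   # approaches 1 / (1 - 0.9) = 10
for _ in range(300):
    ktd.update(phi, 2.0, phi)
after = float(ktd.theta[0])    # tracks the new value 2 / (1 - 0.9) = 20
```

With `process_var=0` the covariance shrinks toward zero and the gain vanishes, so the filter would stop adapting after the reward change; the nonzero evolution noise is what buys the tracking behaviour.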

Advances in Tactical & Operational Planning for Less-than-Truckload Carriers

Author: Baubaid, Ahmad Ali (Georgia Institute of Technology)

Degree: Ph.D.

This thesis explores tactical and operational planning problems in the context of the Less-than-Truckload (LTL) industry. LTL carriers transport shipments that occupy a small fraction of trailer capacity, and, thus, rely on the consolidation of freight from multiple shippers to achieve economies of scale. The first part of this thesis focuses on tactical planning operations of LTL carriers. In particular, in Chapter 2, we study the service network design problem confronted by LTL carriers ahead of an operating season. This problem includes determining: (1) the number of services (trailers) to operate between each pair of terminals, and (2) a load plan which specifies the sequence of transfer terminals that freight with a given origin and destination will visit. Traditionally, for every terminal and every ultimate destination, a load plan specifies a unique next terminal. We introduce the p-alt model, which generalizes traditional load plans by allowing decision-makers to specify a desired number of next terminal options for terminal-destination pairs using a vector p. We compare a number of exact and heuristic approaches for solving a two-stage stochastic variant of the p-alt model. Using this model, we show that by explicitly considering demand uncertainty and by merely allowing up to two next terminal options for terminal-destination pairs in the load plans, carriers can generate substantial cost savings, comparable to those yielded by adopting load plans that allow any next terminal to be a routing option for terminal-destination pairs. Moreover, by using these more flexible load plans, carriers can generate cost savings on the order of 10% over traditional load plan designs obtained by deterministic models. The second part of the thesis shifts to an operational setting relating to how freight is routed through the carrier's service network. As the daily freight quantities handled by a carrier are uncertain, freight routes are dynamica

Keywords: Freight transportation, Less-than-truckload, Load planning, Approximate Dynamic Programming, value function approximation

Compensation guarantees in crowdsourced delivery: Impact on platform and driver welfare

OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2024, Vol. 122

Authors: Alnaggar, Aliaa; Gzara, Fatma; Bookbinder, James H. (Toronto Metropolitan Univ, Dept Mech & Ind Engn, Toronto, ON M5B 2K3, Canada; Univ Waterloo, Dept Management Sci, Waterloo, ON N2L 3G1, Canada)

Crowdsourced delivery and other sharing economy platforms attract freelance workers by offering them flexibility in scheduling their own work hours. Those platforms, however, have been criticized for the lack of protection they offer workers. Since workers are treated as independent contractors, they do not receive minimum wage and other protection measures under labor law. In this paper, we examine the integration of driver compensation guarantees into a platform's dynamic matching decisions. We study the problem of designing dynamic matching policies that guarantee a particular level of compensation for active workers over a time period, while maintaining work-hour flexibility. We model three types of policies, which are either wage-based or utilization-based. We propose an MDP model to capture the dynamic and stochastic nature of the problem, and then design a value function approximation algorithm to efficiently solve the large-scale MDP model. Extensive computational testing is conducted to assess the performance of the proposed solution methodology and the compensation guarantees, using synthetic and real-world datasets. Our findings suggest that the utilization policy results in the highest earnings for drivers, though at the expense of longer empty miles from drivers' origins to the pickup locations of matched orders. On the other hand, the effective wage policy leads to a shorter average distance to pickup but slightly lower earnings for drivers. Both policies result in only a slight decrease in platform profit compared to the base case, and exhibit lower dispersion in the distribution of driver earnings while active. In contrast, the nominal wage policy shows a trend comparable to the base-case policy in terms of average driver earnings, suggesting minimal benefits for drivers.

Keywords: Crowdsourced delivery, Compensation, Sharing economy, Markov decision process, value function approximation

Optimal intra-day operations of behind-the-meter battery storage for primary frequency regulation provision: A hybrid lookahead method

ENERGY, 2022, Vol. 247, p. 123482

Authors: Wen, Kerui; Li, Weidong; Yu, Samson Shenglong; Li, Ping; Shi, Peng (Dalian Univ Technol, Fac Elect Informat & Elect Engn, Dalian 116024, Peoples R China; Deakin Univ, Sch Engn, 75 Pigdgon Rd, Waurn Ponds, Vic 3216, Australia; State Grid Liaoning Elect Power Supply Co Ltd, Elect Power Res Inst, Shenyang 110006, Peoples R China; Univ Adelaide, Sch Elect & Elect Engn, North Terrace, Adelaide, SA 5000, Australia)

Battery energy storage systems (BESSs) are being widely installed behind the meter to reduce electricity bills. By providing grid ancillary services, behind-the-meter BESSs can increase their potential revenue streams. This study targets simultaneous electricity bill reduction and primary frequency regulation (PFR) provision. As the application spectrum expands, intra-day operations become increasingly complicated. In this paper, a hybrid lookahead method with a value function approximation strategy is proposed for intra-day operations, in which the concept of "offline calculation and online application" is devised and implemented. The approximate value function is trained offline to represent the expected long-term benefit. A two-stage robust approximate dynamic programming (ADP) model is formulated for one-day operation and optimized to adjust the power baseline over a forward rolling horizon. Furthermore, multi-dimensional indicators are introduced to evaluate the proposed strategy. Simulations and benchmarking comparisons are performed for a 0.5 MW/1.0 MWh BESS to verify the superior performance of the proposed strategy. The results show that the approximate value function can be obtained offline with 99.07% convergence precision. Moreover, the proposed strategy can ensure the economic benefit and PFR provision within a short online computing time. The resulting intra-day economic benefit can reach 95.55% of the theoretical optimum, and the online optimization takes only 4.65 s for a prediction horizon of 5 min, which ensures the feasibility of real-time predictive optimization. (C) 2022 Elsevier Ltd. All rights reserved.

Keywords: Optimal intra-day operations, Behind-the-meter battery storage, Primary frequency regulation, Electricity bill reduction, value function approximation, Offline calculation
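
A minimal sketch of the "offline calculation and online application" split, assuming a linear, offline-fitted value of stored energy (`W_TERMINAL`), a brute-force lookahead over a short price forecast, and pure price arbitrage. The paper's two-stage robust ADP model, PFR obligations, and power-baseline adjustment are not reproduced; all names and numbers here are hypothetical.

```python
import itertools

STEPS = (-0.25, 0.0, 0.25)   # hypothetical change in state of charge per interval
W_TERMINAL = 3.0             # hypothetical slope of an offline-fitted value of stored energy


def v_terminal(soc):
    """Offline part: approximate long-term value of the energy left in the battery."""
    return W_TERMINAL * soc


def lookahead_action(soc, price_forecast):
    """Online part: enumerate action sequences over the forecast window, return the best first step."""
    best_value, best_first = float("-inf"), 0.0
    for seq in itertools.product(STEPS, repeat=len(price_forecast)):
        s, value = soc, 0.0
        for price, delta in zip(price_forecast, seq):
            s += delta
            if s < 0.0 or s > 1.0:
                break                      # state of charge out of bounds: infeasible sequence
            value -= price * delta         # pay to charge, earn to discharge
        else:
            value += v_terminal(s)         # the VFA stands in for everything beyond the window
            if value > best_value:
                best_value, best_first = value, seq[0]
    return best_first


def rolling_horizon(soc, prices, horizon=3):
    """Re-plan every interval (the window shrinks near the end) and apply only the first step."""
    schedule = []
    for t in range(len(prices)):
        delta = lookahead_action(soc, prices[t:t + horizon])
        soc += delta
        schedule.append(delta)
    return schedule, soc


print(rolling_horizon(0.5, [1.0, 1.0, 5.0, 5.0], horizon=2))
# prints ([0.25, 0.25, -0.25, -0.25], 0.5)
```

The expensive part (fitting `v_terminal`) happens once offline, while each online call only searches a few intervals ahead, which is the property that keeps per-step computing time short.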
