Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a suitably chosen reward function. However, designing such a reward function often requires a lot of task-specific prior knowledge. The designer needs to consider different objectives that not only influence the learned behavior but also the learning progress. To alleviate these issues, preference-based reinforcement learning (PbRL) algorithms have been proposed that can learn directly from an expert's preferences instead of a hand-designed numeric reward. PbRL has gained traction in recent years due to its ability to resolve the reward shaping problem, its ability to learn from non-numeric rewards, and the possibility of reducing the dependence on expert knowledge. We provide a unified framework for PbRL that describes the task formally and points out the different design principles that affect both the evaluation task for the human and the computational complexity. The design principles include the type of feedback that is assumed, the representation that is learned to capture the preferences, the optimization problem that has to be solved, and how the exploration/exploitation problem is tackled. Furthermore, we point out shortcomings of current algorithms, propose open research questions, and briefly survey practical tasks that have been solved using PbRL.
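As an illustration of the feedback and representation choices the framework covers, the sketch below fits a linear reward model to simulated pairwise trajectory preferences under a Bradley-Terry likelihood. The feature vectors, the simulated expert, and the learning rate are hypothetical choices made for this sketch, not prescribed by the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each trajectory is summarized by a feature vector, and a
# hidden "true" linear reward stands in for the expert's internal preferences.
w_true = np.array([1.0, -2.0, 0.5])
feats = rng.normal(size=(200, 3))                  # 200 trajectory summaries

# Simulated expert feedback: preferences over random trajectory pairs.
pairs = rng.integers(0, 200, size=(500, 2))
prefs = (feats[pairs[:, 0]] @ w_true >
         feats[pairs[:, 1]] @ w_true).astype(float)

# Fit a linear reward model under the Bradley-Terry preference likelihood:
#   P(tau_i preferred over tau_j) = sigmoid(R(tau_i) - R(tau_j)).
diff = feats[pairs[:, 0]] - feats[pairs[:, 1]]     # feature differences
w = np.zeros(3)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-diff @ w))            # predicted preference prob
    w += 0.1 * diff.T @ (prefs - p) / len(pairs)   # gradient ascent on log-lik

# The learned reward should rank trajectory pairs like the simulated expert.
agreement = np.mean((diff @ w > 0) == (prefs > 0.5))
print(f"pairwise agreement: {agreement:.2f}")
```

Only preference comparisons enter the update, never a numeric reward, which is the point of the PbRL setting.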
Stochastic search algorithms are black-box optimizers of an objective function. They have recently gained a lot of attention in operations research, machine learning, and policy search of robot motor skills due to their ease of use and their generality. However, with slightly different tasks or objective functions, many stochastic search algorithms require complete re-learning to adapt the solution to the new objective function or the new context. We therefore consider the contextual stochastic search paradigm, in which we want to find good parameter vectors for multiple related tasks, where each task is described by a continuous context vector. Hence, the objective function might change slightly for each parameter vector evaluation. Contextual algorithms have been investigated in the field of policy search. However, contextual policy search algorithms typically suffer from premature convergence and compare unfavourably with state-of-the-art stochastic search methods. In this paper, we investigate a contextual stochastic search algorithm known as Contextual Relative Entropy Policy Search (CREPS), an information-theoretic algorithm that can learn for multiple tasks simultaneously. We extend this algorithm with a covariance matrix adaptation technique that alleviates the premature convergence problem. We call the new algorithm Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation (CREPS-CMA). We show that CREPS-CMA outperforms the original CREPS by orders of magnitude, and we illustrate its performance on several contextual tasks, including a complex simulated robot kick task.
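The contextual search loop can be sketched in a few lines. The version below is a simplification, not the CREPS-CMA update itself: a fixed soft-max temperature stands in for the KL-constrained dual optimization, and the covariance adaptation is a plain weighted sample covariance. The toy task, gains, and iteration counts are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy contextual task: the optimal 2-D parameter vector depends linearly on a
# scalar context s via an unknown map A_true.
A_true = np.array([[2.0, -1.0], [0.5, 3.0]])

def reward(theta, s):
    phi = np.stack([s, np.ones_like(s)], axis=-1)
    return -np.sum((theta - phi @ A_true.T) ** 2, axis=-1)

# Search distribution: context-linear mean M @ [s, 1] with full covariance.
M = np.zeros((2, 2))
cov = np.eye(2)

for _ in range(80):
    s = rng.uniform(-1, 1, size=100)                  # sample contexts
    phi = np.stack([s, np.ones_like(s)], axis=1)
    theta = phi @ M.T + rng.multivariate_normal(np.zeros(2), cov, size=100)
    R = reward(theta, s)
    # Soft-max weighting stands in for the KL-bounded CREPS update; the
    # temperature is fixed by hand here instead of solved for via the dual.
    w = np.exp((R - R.max()) / 10.0)
    w /= w.sum()
    # Weighted linear regression for the context-dependent mean ...
    W = np.diag(w)
    M = np.linalg.solve(phi.T @ W @ phi + 1e-6 * np.eye(2),
                        phi.T @ W @ theta).T
    # ... and a weighted sample covariance, the covariance-adaptation part
    # that counteracts premature shrinking of the exploration noise.
    d = theta - phi @ M.T
    cov = d.T @ (d * w[:, None]) + 1e-4 * np.eye(2)

print("learned context-to-parameter map:\n", M)
```

Because the mean is a function of the context, a single learned distribution generalizes across the family of related tasks instead of being re-learned per task.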
Summary
Background: Across low-income and middle-income countries (LMICs), one in ten deaths in children younger than 5 years is attributable to diarrhoea. The substantial between-country variation in both diarrhoea incidence and mortality is attributable to interventions that protect children, prevent infection, and treat disease. Identifying subnational regions with the highest burden and mapping associated risk factors can aid in reducing preventable childhood deaths.
Methods: We used Bayesian model-based geostatistics and a geolocated dataset comprising 15 072 746 children younger than 5 years from 466 surveys in 94 LMICs, in combination with findings of the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2017, to estimate posterior distributions of diarrhoea prevalence, incidence, and mortality from 2000 to 2017. From these data, we estimated the burden of diarrhoea at varying subnational levels (termed units) by spatially aggregating draws, and we investigated the drivers of subnational patterns by creating aggregated risk factor maps.
Findings: The greatest declines in diarrhoeal mortality were seen in south and southeast Asia and South America, where 54·0% (95% uncertainty interval [UI] 38·1-65·8), 17·4% (7·7-28·4), and 59·5% (34·2-86·9) of units, respectively, recorded decreases in deaths from diarrhoea greater than 10%. Although children in much of Africa remain at high risk of death due to diarrhoea, the regions with the most deaths were outside Africa, with the highest-mortality units located in Pakistan. Indonesia showed the greatest within-country geographical inequality; some regions had mortality rates nearly four times the average country rate. Reductions in mortality were correlated with improvements in water, sanitation, and hygiene (WASH) or reductions in child growth failure (CGF). Similarly, most high-risk areas had poor WASH, high CGF, or low coverage of oral rehydration therapy.
Interpretation: By co-analysing geospatial trends in d...
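The draw-level spatial aggregation described in the Methods can be illustrated with a toy example. The cell counts, populations, and beta-distributed posterior draws below are invented; the real analysis operates on fine geospatial grids rather than six cells.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical miniature of the aggregation step: 6 grid cells, each with
# 1000 posterior draws of diarrhoea prevalence and a population count.
draws = rng.beta(2, 30, size=(6, 1000))        # cell-level posterior draws
pop = np.array([500, 1200, 800, 300, 2000, 900], dtype=float)
unit_of_cell = np.array([0, 0, 1, 1, 1, 2])    # admin unit of each cell

# Aggregation is population-weighted and done draw by draw, so the unit-level
# uncertainty interval reflects the full posterior, not just cell means.
for u in np.unique(unit_of_cell):
    m = unit_of_cell == u
    unit_draws = (pop[m, None] * draws[m]).sum(axis=0) / pop[m].sum()
    lo, hi = np.percentile(unit_draws, [2.5, 97.5])
    print(f"unit {u}: prevalence {unit_draws.mean():.3f} "
          f"(95% UI {lo:.3f}-{hi:.3f})")
```

Aggregating draws rather than point estimates is what allows the study to report uncertainty intervals at every subnational level.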
For controlling high-dimensional robots, most stochastic optimal control algorithms use approximations of the system dynamics and of the cost function (e.g., using linearizations and Taylor expansions). These approximations are typically only locally correct, which might cause instabilities in the greedy policy updates, lead to oscillations, or make the algorithms diverge. To overcome these drawbacks, we add a regularization term to the cost function that punishes large policy update steps in the trajectory optimization procedure. We applied this concept to the Approximate Inference Control (AICO) method, where the resulting algorithm guarantees convergence for uninformative initial solutions without complex hand-tuning of learning rates. We evaluated our new algorithm on two simulated robotic platforms. A robot arm with five joints was used for reaching multiple targets while keeping the roll angle constant. On the humanoid robot Nao, we show how complex skills such as reaching and balancing can be inferred from desired center-of-gravity or end-effector coordinates.
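A one-dimensional caricature shows the effect of the regularization term. This is not AICO itself: the cost function, the damping weight, and the step counts are hypothetical, but the mechanism, penalizing the squared distance to the previous iterate so the local quadratic model is only trusted near where it was built, is the same.

```python
import numpy as np

# Hypothetical 1-D stand-in: the local quadratic model of the cost
# f(x) = sqrt(1 + x^2) is only locally correct, so undamped Newton-style
# greedy updates diverge from x0 = 1.5.
f = lambda x: np.sqrt(1.0 + x * x)
df = lambda x: x / np.sqrt(1.0 + x * x)
d2f = lambda x: (1.0 + x * x) ** -1.5

def optimize(x0, lam, steps):
    x = x0
    for _ in range(steps):
        # Regularized update: minimize the quadratic model plus a penalty
        # lam * (x - x_old)^2 that punishes large update steps.
        x = x - df(x) / (d2f(x) + 2.0 * lam)
    return x

greedy = optimize(1.5, lam=0.0, steps=5)    # plain greedy update: blows up
damped = optimize(1.5, lam=1.0, steps=30)   # regularized: settles at minimum
print(f"greedy |x| = {abs(greedy):.2e}, damped |x| = {abs(damped):.2e}")
```

The penalty effectively replaces a hand-tuned learning rate: the step length shrinks automatically wherever the local model is unreliable.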
ISBN: (print) 9781424445875
In most activities of daily living, related tasks are encountered over and over again. This regularity allows humans and robots to reuse existing solutions for known recurring tasks. We expect that reusing a set of standard solutions to solve similar tasks will facilitate the design and on-line adaptation of the control systems of robots operating in human environments. In this paper, we derive a set of standard solutions for reaching behavior from human motion data. We also derive stereotypical reaching trajectories for variations of the task, in which obstacles are present. These stereotypical trajectories are then compactly represented with Dynamic Movement Primitives. On the humanoid robot Sarcos CB, this approach leads to reproducible, predictable, and human-like reaching motions.
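A minimal sketch of the representation step: encode a demonstrated one-dimensional reaching profile as a Dynamic Movement Primitive and roll it out again. The gains, basis count, and minimum-jerk demonstration are placeholder choices; the paper's trajectories come from human motion data over multiple degrees of freedom.

```python
import numpy as np

# Standard discrete DMP: ydd = az*(bz*(g - y) - yd) + f(x), with an
# exponential canonical phase x and an RBF forcing term f. Gains and the
# number of basis functions are ad hoc choices for this sketch.
az, bz, ax = 25.0, 25.0 / 4.0, 3.0
T, dt = 1.0, 0.001
t = np.arange(0, T, dt)
x = np.exp(-ax * t / T)                        # phase decays from 1 toward 0

# Placeholder demonstration: a minimum-jerk reach from 0 to the goal g.
g = 1.0
s = t / T
y_demo = g * (10 * s**3 - 15 * s**4 + 6 * s**5)
yd_demo = np.gradient(y_demo, dt)
ydd_demo = np.gradient(yd_demo, dt)

# Forcing term implied by the demonstration, and per-basis weights fitted
# to it with locally weighted regression.
f_target = ydd_demo - az * (bz * (g - y_demo) - yd_demo)
c = np.exp(-ax * np.linspace(0, 1, 20))        # basis centres along the phase
h = 1.0 / np.diff(c, append=c[-1] * 0.5) ** 2  # widths from centre spacing
psi = np.exp(-h * (x[:, None] - c) ** 2)       # (time, basis) activations
w = (psi * (x * f_target)[:, None]).sum(axis=0) / (
    (psi * (x ** 2)[:, None]).sum(axis=0) + 1e-10)

# Roll the primitive out and compare with the demonstration.
y, yd, ys = 0.0, 0.0, []
for k in range(len(t)):
    f = (psi[k] @ w) / psi[k].sum() * x[k]
    ydd = az * (bz * (g - y) - yd) + f
    yd += ydd * dt
    y += yd * dt
    ys.append(y)
err = np.max(np.abs(np.array(ys) - y_demo))
print(f"max reproduction error: {err:.3f}")
```

The weight vector is the compact representation: the same primitive reproduces the stereotypical shape while the goal g and duration T can be changed at rollout time.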