Refine Search Results

Document Type

  • 6 journal articles
  • 5 conference papers

Collection Scope

  • 11 electronic documents
  • 0 print holdings

Subject Classification

  • 11 Engineering
    • 8 Control Science and Engineering
    • 6 Computer Science and Technology...
    • 2 Electrical Engineering
  • 2 Science
    • 2 Mathematics
    • 1 Systems Science
  • 2 Management
    • 2 Management Science and Engineering (...
  • 1 Law
    • 1 Sociology

Topics

  • 11 preference-based...
  • 4 reward learning
  • 4 active learning
  • 3 human-robot inte...
  • 3 inverse reinforc...
  • 1 multi-objective ...
  • 1 pomdp
  • 1 software library
  • 1 markov games
  • 1 function approxi...
  • 1 agnostic
  • 1 optimal algorith...
  • 1 linear realizabi...
  • 1 learning from de...
  • 1 hierarchical rei...
  • 1 human-in-the-loo...
  • 1 control barrier ...
  • 1 policy regret
  • 1 regret analysis
  • 1 safety-critical ...

Institutions

  • 2 stanford univ de...
  • 1 imt sch adv stud...
  • 1 stanford univ de...
  • 1 univ michigan an...
  • 1 stanford univ de...
  • 1 politecn milan d...
  • 1 stanford univ de...
  • 1 univ cambridge d...
  • 1 guglielmo da sal...
  • 1 univ southern ca...
  • 1 stanford univ de...
  • 1 univ southern ca...
  • 1 stanford univ de...
  • 1 microsoft res ny...
  • 1 stanford univ co...
  • 1 uc santa barbara...
  • 1 natl inst inform...
  • 1 univ calif berke...
  • 1 guglielmo da sal...
  • 1 stanford univ el...

Authors

  • 5 biyik erdem
  • 5 sadigh dorsa
  • 1 anari nima
  • 1 molnar tamas g.
  • 1 losey dylan p.
  • 1 bemporad alberto
  • 1 tucker maegan
  • 1 ichise ryutaro
  • 1 kramer oliver
  • 1 lazar daniel a.
  • 1 landolfi nichola...
  • 1 aggarwal manish
  • 1 piroddi luigi
  • 1 halasz geza
  • 1 kim beomjoon
  • 1 lee dongryung
  • 1 yue yisong
  • 1 kochenderfer myk...
  • 1 krishnamurthy ak...
  • 1 ahn jiyong
Language

  • 11 English

Search criteria: Subject = "Preference-Based Learning"
11 records; results 1-10 are shown below.

Safety-Aware Preference-Based Learning for Safety-Critical Control
4th Annual Conference on Learning for Dynamics and Control (L4DC)
Authors: Cosner, Ryan K.; Tucker, Maegan; Taylor, Andrew J.; Li, Kejun; Molnar, Tamas G.; Ubellacker, Wyatt; Alan, Anil; Orosz, Gabor; Yue, Yisong; Ames, Aaron D. Affiliations: CALTECH, Pasadena, CA 91125, USA; Univ Michigan, Ann Arbor, MI 48109, USA; Argo AI, Pittsburgh, PA, USA
Bringing dynamic robots into the wild requires a tenuous balance between performance and safety. Yet controllers designed to provide robust safety guarantees often result in conservative behavior, and tuning these con...
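The entry above pairs preference-based learning with safety-critical control. As a purely illustrative companion, here is a minimal sketch of tuning a control barrier function (CBF) safety filter from pairwise preferences; the toy dynamics, candidate gains, and simulated user are all assumptions, not the paper's setup.

```python
# Minimal sketch (not the paper's method): preference-based tuning of a
# control barrier function (CBF) safety filter. Assumed toy setup: a 1-D
# single integrator x' = u with safe set {x >= 0} and barrier h(x) = x,
# so the CBF-QP filter reduces to u = max(u_des, -alpha * x).
import numpy as np

rng = np.random.default_rng(0)

def rollout(alpha, x0=1.0, u_des=-0.8, dt=0.05, steps=100):
    """Simulate the CBF-filtered controller and return the state trajectory."""
    xs = [x0]
    for _ in range(steps):
        u = max(u_des, -alpha * xs[-1])  # closed-form CBF filter for this toy
        xs.append(xs[-1] + dt * u)
    return np.array(xs)

# Candidate class-K gains: larger alpha is less conservative near the boundary.
alphas = np.array([0.5, 1.0, 2.0, 4.0, 8.0])

def user_score(traj):
    # Hypothetical stand-in for a human: likes progress toward x = 0 but
    # dislikes trajectories that graze the safety boundary.
    return -traj[-1] - 5.0 * (traj.min() < 0.05)

# Collect pairwise preferences between rollouts of two candidate gains.
prefs = []
for _ in range(30):
    a, b = rng.choice(len(alphas), size=2, replace=False)
    if user_score(rollout(alphas[a])) > user_score(rollout(alphas[b])):
        prefs.append((a, b))   # a preferred over b
    else:
        prefs.append((b, a))

# Bradley-Terry fit: one latent utility per candidate gain, by gradient ascent.
util = np.zeros(len(alphas))
for _ in range(200):
    grad = np.zeros_like(util)
    for won, lost in prefs:
        p_win = 1.0 / (1.0 + np.exp(util[lost] - util[won]))
        grad[won] += 1.0 - p_win
        grad[lost] -= 1.0 - p_win
    util += 0.1 * grad

print("preferred gain alpha =", alphas[np.argmax(util)])
```

On hardware, the rollout would be a real trial and the scoring function a human answering pairwise queries; the Bradley-Terry fit stays the same.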

APReL: A Library for Active Preference-Based Reward Learning Algorithms
17th Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI)
Authors: Biyik, Erdem; Talati, Aditi; Sadigh, Dorsa. Affiliations: Stanford Univ, Elect Engn, Stanford, CA 94305, USA; Stanford Univ, Comp Sci, Stanford, CA, USA; Stanford Univ, Comp Sci & Elect Engn, Stanford, CA, USA
Reward learning is a fundamental problem in human-robot interaction: robots should operate in alignment with what their human user wants. Many preference-based learning algorithms and active querying techniques h...
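The sketch below deliberately avoids APReL's API and instead shows, from scratch, the loop such libraries implement, under assumed simplifications: a linear reward on trajectory features, a Bradley-Terry (softmax) human model, and posterior disagreement as a cheap information-gain proxy for active query selection.

```python
# Illustrative active preference-based reward learning loop (NOT APReL's API).
# Assumptions: linear reward r(xi) = w . phi(xi), Bradley-Terry human model,
# and a sample-based posterior over w maintained by importance resampling.
import numpy as np

rng = np.random.default_rng(1)
DIM = 3

features = rng.normal(size=(50, DIM))        # phi(xi) for a trajectory pool
w_true = np.array([1.0, -0.5, 0.3])          # hidden from the learner

def simulated_human(phi_a, phi_b):
    """Noisy preference: True if trajectory a is preferred over b."""
    p = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ w_true))
    return rng.random() < p

def posterior_samples(prefs, n=2000):
    """Cheap stand-in for MCMC: reweight prior samples by the BT likelihood."""
    ws = rng.normal(size=(n, DIM))
    logp = np.zeros(n)
    for phi_a, phi_b, a_won in prefs:
        d = (phi_a - phi_b) @ ws.T
        logp += -np.logaddexp(0.0, -d if a_won else d)   # log sigmoid
    p = np.exp(logp - logp.max())
    return ws[rng.choice(n, size=n, p=p / p.sum())]

prefs = []
for step in range(10):
    ws = posterior_samples(prefs)
    # Active selection: the pair the posterior disagrees on most (answer
    # probability closest to 1/2) is a simple information-gain proxy.
    best, best_gap = (0, 1), 1.0
    for _ in range(200):
        i, j = rng.choice(len(features), size=2, replace=False)
        p_ij = np.mean((features[i] - features[j]) @ ws.T > 0)
        if abs(p_ij - 0.5) < best_gap:
            best, best_gap = (i, j), abs(p_ij - 0.5)
    i, j = best
    prefs.append((features[i], features[j],
                  simulated_human(features[i], features[j])))

w_hat = posterior_samples(prefs).mean(axis=0)
print("estimated reward direction:", w_hat / np.linalg.norm(w_hat))
```

Per its paper, APReL packages these pieces (query types, user models, belief distributions, acquisition functions) behind one common interface.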

Active preference-based Gaussian process regression for reward learning and optimization
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2024, Vol. 43, No. 5, pp. 665-684
Authors: Biyik, Erdem; Huynh, Nicolas; Kochenderfer, Mykel J.; Sadigh, Dorsa. Affiliations: Stanford Univ, Dept Elect Engn, Stanford, CA, USA; Univ Calif Berkeley, Ctr Human Compatible Artificial Intelligence, Berkeley, CA, USA; Univ Southern Calif, Thomas Lord Dept Comp Sci, Los Angeles, CA, USA; Ecole Polytech, Dept Appl Math, Palaiseau, France; Univ Cambridge, Dept Comp Sci & Technol, Cambridge, England; Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA, USA; Stanford Univ, Dept Comp Sci, Stanford, CA, USA; Univ Southern Calif, 3737 Watt Way, Powell Hall PHE, Room 214, Los Angeles, CA 90089, USA
Designing reward functions is a difficult task in AI and robotics. The complex task of directly specifying all the desirable behaviors a robot needs to optimize often proves challenging for humans. A popular solution ...
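To illustrate the non-parametric variant the title describes, the sketch below places a Gaussian process prior over reward values at a fixed candidate set and conditions on pairwise preferences. Importance sampling stands in for a proper approximate posterior, and the kernel, temperature, and simulated human are assumptions, not the paper's choices.

```python
# Illustrative preference-based reward learning with a Gaussian process prior
# (a sketch in the spirit of the title, not the paper's algorithm). Assumed:
# RBF kernel over a fixed candidate set, softmax preference likelihood, and
# importance sampling in place of a proper approximate GP posterior.
import numpy as np

rng = np.random.default_rng(2)

X = np.linspace(0.0, 1.0, 20)[:, None]       # candidate "trajectory features"
K = np.exp(-0.5 * ((X - X.T) / 0.2) ** 2)    # RBF kernel matrix
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X)))

true_reward = np.sin(3.0 * X[:, 0])          # hidden ground truth

def human_pref(i, j, temp=0.2):
    """Noisy comparison of candidates i and j; True if i preferred."""
    p = 1.0 / (1.0 + np.exp(-(true_reward[i] - true_reward[j]) / temp))
    return rng.random() < p

prefs = [(i, j, human_pref(i, j)) for i, j in rng.choice(len(X), size=(40, 2))]

# Draw reward functions from the GP prior, then reweight by the preference
# likelihood (importance sampling).
f = (L @ rng.normal(size=(len(X), 4000))).T  # (4000, 20) prior function draws
logw = np.zeros(len(f))
for i, j, i_won in prefs:
    d = (f[:, i] - f[:, j]) / 0.2
    logw += -np.logaddexp(0.0, -d if i_won else d)
w = np.exp(logw - logw.max())
w /= w.sum()

post_mean = w @ f                            # posterior mean reward
print("argmax of posterior mean reward at x =", X[np.argmax(post_mean), 0])
```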

Preference learning for guiding the tree search in continuous POMDPs
7th Conference on Robot Learning (CoRL)
Authors: Ahn, Jiyong; Son, Sanghyeon; Lee, Dongryung; Han, Jisu; Son, Dongwon; Kim, Beomjoon. Affiliation: Korea Adv Inst Sci & Technol, Grad Sch AI, Daejeon, South Korea
A robot operating in a partially observable environment must perform sensing actions to achieve a goal, such as clearing the objects in front of a shelf to better localize a target object at the back, and estimate its...

Incentivizing Efficient Equilibria in Traffic Networks With Mixed Autonomy
IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2021, Vol. 8, No. 4, pp. 1717-1729
Authors: Biyik, Erdem; Lazar, Daniel A.; Pedarsani, Ramtin; Sadigh, Dorsa. Affiliations: Stanford Univ, Dept Elect Engn, Stanford, CA 94305, USA; UC Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106, USA; Stanford Univ, Dept Comp Sci, Stanford, CA 94305, USA
Traffic congestion has large economic and social costs. The introduction of autonomous vehicles can potentially reduce this congestion by increasing road capacity via vehicle platooning and by creating an avenue for i...

Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2022, Vol. 41, No. 1, pp. 45-67
Authors: Biyik, Erdem; Losey, Dylan P.; Palan, Malayandi; Landolfi, Nicholas C.; Shevchuk, Gleb; Sadigh, Dorsa. Affiliations: Stanford Univ, Dept Elect Engn, 353 Jane Stanford Way, Stanford, CA 94305, USA; Stanford Univ, Dept Comp Sci, Stanford, CA 94305, USA
Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. I...
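The core idea of the abstract, a single reward posterior that consumes both demonstrations and preferences, can be shown by multiplying the two likelihoods. The sketch below is a loose illustration under assumed models (linear reward, Boltzmann demonstration choice over a discrete trajectory set, Bradley-Terry preferences), not the paper's exact formulation.

```python
# Illustrative joint posterior over reward parameters from BOTH demonstrations
# and preferences (a loose sketch of the idea, not the paper's formulation).
# Assumed: linear reward w . phi(xi), Boltzmann demonstration model over a
# discrete trajectory set, Bradley-Terry preference model.
import numpy as np

rng = np.random.default_rng(3)
DIM, N_TRAJ = 3, 40

features = rng.normal(size=(N_TRAJ, DIM))    # phi(xi) per candidate trajectory
w_true = np.array([0.8, -0.6, 0.0])

# One "demonstration": the (noisily) best trajectory in the set.
demo = int(np.argmax(features @ w_true + 0.1 * rng.normal(size=N_TRAJ)))

# A few pairwise preferences from the same simulated human.
prefs = []
for _ in range(15):
    i, j = rng.choice(N_TRAJ, size=2, replace=False)
    p = 1.0 / (1.0 + np.exp(-(features[i] - features[j]) @ w_true))
    prefs.append((i, j, rng.random() < p))

# Importance-weight prior samples of w by the product of both likelihoods.
ws = rng.normal(size=(5000, DIM))
scores = ws @ features.T                     # (5000, N_TRAJ) trajectory scores
m = scores.max(axis=1)                       # Boltzmann demo term: log softmax
logw = scores[:, demo] - (m + np.log(np.exp(scores - m[:, None]).sum(axis=1)))
for i, j, i_won in prefs:                    # Bradley-Terry preference terms
    d = scores[:, i] - scores[:, j]
    logw += -np.logaddexp(0.0, -d if i_won else d)
pw = np.exp(logw - logw.max())
pw /= pw.sum()

w_hat = pw @ ws
print("estimated w (normalized):", w_hat / np.linalg.norm(w_hat))
```

Because both terms enter the same log posterior, either source of feedback alone still yields a valid (if less certain) estimate.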

Hierarchical learning from human preferences and curiosity
APPLIED INTELLIGENCE, 2022, Vol. 52, No. 7, pp. 7459-7479
Authors: Bougie, Nicolas; Ichise, Ryutaro. Affiliations: Grad Univ Adv Studies (Sokendai), Tokyo, Japan; Natl Inst Informat, Tokyo, Japan
Recent success in scaling deep reinforcement learning (DRL) algorithms to complex problems has been driven by well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are natural...

Batch Active Learning of Reward Functions from Human Preferences
ACM TRANSACTIONS ON HUMAN-ROBOT INTERACTION, 2024, Vol. 13, No. 2, pp. 1-27
Authors: Biyik, Erdem; Anari, Nima; Sadigh, Dorsa. Affiliations: Univ Southern Calif, Thomas Lord Dept Comp Sci, 3737 Watt Way, Powell Hall PHE, Los Angeles, CA 90089, USA; Stanford Univ, Dept Comp Sci, Gates Comp Sci 168A, 353 Jane Stanford Way, Stanford, CA 94305, USA
Data generation and labeling are often expensive in robot learning. Preference-based learning is a concept that enables reliable labeling by querying users with preference questions. Active querying methods are common...
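Batch active learning replaces one query per posterior update with a batch of B queries, which must be informative yet mutually diverse. The greedy heuristic below (a disagreement score minus a cosine-similarity penalty) is an illustrative stand-in, not necessarily any of the paper's batch methods.

```python
# Illustrative batch active selection of preference queries: pick B pairs that
# the posterior disagrees on, while penalizing near-duplicate queries. This is
# an assumed greedy heuristic, not the paper's algorithms.
import numpy as np

rng = np.random.default_rng(4)
DIM = 3
features = rng.normal(size=(60, DIM))         # phi(xi) per candidate trajectory
ws = rng.normal(size=(300, DIM))              # posterior samples over reward w

# Candidate queries = random pairs; represent each by its feature difference.
pairs = [tuple(rng.choice(len(features), size=2, replace=False))
         for _ in range(400)]
diffs = np.array([features[i] - features[j] for i, j in pairs])

# Informativeness proxy: posterior disagreement (answer probability near 1/2).
p = (diffs @ ws.T > 0).mean(axis=1)
info = 1.0 - 2.0 * np.abs(p - 0.5)

# Greedy batch: highest info first, then penalize similarity to chosen queries.
B, batch = 5, []
for _ in range(B):
    penalty = np.zeros(len(pairs))
    for k in batch:
        cos = np.abs(diffs @ diffs[k]) / (
            np.linalg.norm(diffs, axis=1) * np.linalg.norm(diffs[k]) + 1e-9)
        penalty = np.maximum(penalty, cos)    # similarity to nearest pick
    score = info - penalty
    score[batch] = -np.inf                    # never pick the same query twice
    batch.append(int(np.argmax(score)))

print("selected query pairs:", [pairs[k] for k in batch])
```

All B queries can then be posed to the human in parallel before the next posterior update, amortizing the expensive inference step.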

Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
33rd International Conference on Algorithmic Learning Theory (ALT)
Authors: Saha, Aadirupa; Krishnamurthy, Akshay. Affiliation: Microsoft Res, New York, NY 10012, USA
We study the K-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesti...
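The dueling-bandit feedback model is easy to simulate: each round the learner sees a context, proposes two arms, and observes only which one won. The sketch below assumes linear realizability of utilities and uses plain online logistic regression; it illustrates the protocol, not the paper's optimal algorithms.

```python
# Illustrative contextual dueling bandit protocol (assumed linear-utility
# realizability; the learning rule is simple online logistic regression,
# standing in for the paper's optimal algorithms).
import numpy as np

rng = np.random.default_rng(5)
K, DIM, T = 5, 4, 2000
theta_true = rng.normal(size=DIM)             # hidden utility parameters
theta_hat = np.zeros(DIM)

def arm_features(context):
    # Hypothetical featurization: one feature vector per (context, arm) pair.
    return np.array([np.tanh(context * (a + 1)) for a in range(K)])

regret = 0.0
for t in range(T):
    ctx = rng.normal(size=DIM)
    phi = arm_features(ctx)                   # (K, DIM)
    a = int(np.argmax(phi @ theta_hat))       # exploit current estimate
    b = int(rng.integers(K))                  # explore with a random opponent
    # Preference feedback only: P(a beats b) = sigmoid(u_a - u_b).
    d_true = (phi[a] - phi[b]) @ theta_true
    a_won = rng.random() < 1.0 / (1.0 + np.exp(-d_true))
    # Online logistic-regression update on the duel outcome.
    d = (phi[a] - phi[b]) @ theta_hat
    grad = ((1.0 if a_won else 0.0) - 1.0 / (1.0 + np.exp(-d))) * (phi[a] - phi[b])
    theta_hat += 0.05 * grad
    # Dueling regret: best utility minus the average utility of the pair.
    regret += (phi @ theta_true).max() - 0.5 * ((phi[a] + phi[b]) @ theta_true)

print("average dueling regret:", regret / T)
```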

Preferences-Based Choice Prediction in Evolutionary Multi-objective Optimization
20th European Conference on the Applications of Evolutionary Computation (EvoApplications)
Authors: Aggarwal, Manish; Heinermann, Justin; Oehmcke, Stefan; Kramer, Oliver. Affiliations: Indian Inst Management Ahmedabad, Dept Informat Syst, Ahmadabad, Gujarat, India; Carl von Ossietzky Univ Oldenburg, Dept Comp Sci, Computat Intelligence Grp, Oldenburg, Germany
Evolutionary multi-objective algorithms (EMOAs) such as NSGA-II approximate the Pareto front, after which a decision-maker (DM) is confronted with the primary task of selecting the best solution amongst all the ...