Refine Search Results

Document Type

  • 4 conference papers
  • 2 journal articles

Holdings

  • 6 electronic documents
  • 0 print holdings

Subject Classification

  • 6 Engineering
    • 6 Computer Science and Technology...
    • 2 Information and Communication Engineering
    • 1 Electrical Engineering
  • 2 Education
    • 2 Education

Topics

  • 6 llm-as-a-judge
  • 4 large language m...
  • 2 open source
  • 2 automatic feedba...
  • 2 programming feed...
  • 2 generative ai
  • 2 automatic evalua...
  • 1 gpt-4
  • 1 automated softwa...
  • 1 large language m...
  • 1 human-centered c...
  • 1 biases
  • 1 llms
  • 1 reinforcement le...
  • 1 educational chat...
  • 1 prompt injection...
  • 1 zephyr
  • 1 direct preferenc...
  • 1 higher-ed
  • 1 code llama

Institutions

  • 2 univ auckland au...
  • 2 univ jyvaskyla j...
  • 2 aalto univ espoo
  • 1 duke univ durham...
  • 1 singapore manage...
  • 1 arizona state un...
  • 1 arizona state un...
  • 1 univ notre dame ...
  • 1 cambridge ma uni...
  • 1 lehigh univ beth...
  • 1 arizona state un...
  • 1 diro université ...
  • 1 university of no...
  • 1 ibm research yor...
  • 1 carnegie mellon ...
  • 1 huazhong univ sc...

Authors

  • 2 dainese nicola
  • 2 denny paul
  • 2 koutcheme charle...
  • 2 leinonen juho
  • 2 hellas arto
  • 2 sarsa sami
  • 1 xin zhou
  • 1 sun lichao
  • 1 brachman michell...
  • 1 huang yue
  • 1 liu wenxing
  • 1 zhou pan
  • 1 ashraf syed
  • 1 liu yinuo
  • 1 aton kamanda
  • 1 houari sahraoui
  • 1 wang zichu
  • 1 ahmed ishrat
  • 1 li toby jia-jun
  • 1 ashktorab zahra

Language

  • 5 English
  • 1 Other

Search query: "Subject = LLM-as-a-judge"
6 records, showing 1-6

MetricMate: An Interactive Tool for Generating Evaluation Criteria for LLM-as-a-Judge Workflow
Joint of the ACM Workshops at the International Conference on Intelligent User Interfaces 2025, IUI-WS 2025
Authors: Gebreegziabher, Simret Araya; Chiang, Charles; Wang, Zichu; Ashktorab, Zahra; Brachman, Michelle; Geyer, Werner; Li, Toby Jia-Jun; Gómez-Zará, Diego. Affiliations: University of Notre Dame Notre Dame IN United States; Carnegie Mellon University Pittsburgh PA United States; IBM Research Yorktown Heights NY United States; Cambridge MA United States
Large Language Models (LLMs) are increasingly employed to evaluate complex, large datasets in automated ways. By combining LLMs' rationale capabilities with user-defined criteria, LLM-as-a-judge systems can automa...

Optimization-based Prompt Injection Attack to LLM-as-a-Judge
31st Conference on Computer and Communications Security (2024)
Authors: Shi, Jiawen; Yuan, Zenghui; Liu, Yinuo; Huang, Yue; Zhou, Pan; Sun, Lichao; Gong, Neil Zhenqiang. Affiliations: Huazhong Univ Sci & Technol Wuhan Peoples R China; Univ Notre Dame South Bend IN USA; Lehigh Univ Bethlehem PA 18015 USA; Duke Univ Durham NC 27708 USA
LLM-as-a-judge uses a large language model (LLM) to select the best response from a set of candidates for a given question. LLM-as-a-judge has many applications such as LLM-powered search, reinforcement learning with ...

CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
ACM Transactions on Software Engineering and Methodology
Authors: Martin Weyssow; Aton Kamanda; Xin Zhou; Houari Sahraoui. Affiliations: DIRO Université de Montréal Canada; Singapore Management University Singapore
Evaluating the alignment of large language models (LLMs) with user-defined coding preferences is a challenging endeavor that requires a deep assessment of LLMs’ outputs. Existing methods and benchmarks rely primarily...

Multifaceted Assessment of Responsible Use and Bias in Language Models for Education
COMPUTERS, 2025, Vol. 14, No. 3, p. 100
Authors: Ahmed, Ishrat; Liu, Wenxing; Roscoe, Rod D.; Reilley, Elizabeth; McNamara, Danielle S. Affiliations: Arizona State Univ Learning Engn Inst Tempe AZ 85281 USA; Arizona State Univ Enterprise Technol AI Accelerat Tempe AZ 85281 USA; Arizona State Univ Human Syst Engn Tempe AZ 85281 USA
Large language models (LLMs) are increasingly being utilized to develop tools and services in various domains, including education. However, due to the nature of the training data, these models are susceptible to inhe...

Evaluating Language Models for Generating and Judging Programming Feedback
56th Technical Symposium on Computer Science Education (2025)
Authors: Koutcheme, Charles; Dainese, Nicola; Sarsa, Sami; Hellas, Arto; Leinonen, Juho; Ashraf, Syed; Denny, Paul. Affiliations: Aalto Univ Espoo Finland; Univ Jyvaskyla Jyvaskyla Finland; Univ Auckland Auckland New Zealand
The emergence of large language models (LLMs) has transformed research and practice across a wide range of domains. Within the computing education research (CER) domain, LLMs have garnered significant attention, parti...

Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge
29th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE)
Authors: Koutcheme, Charles; Dainese, Nicola; Sarsa, Sami; Hellas, Arto; Leinonen, Juho; Denny, Paul. Affiliations: Aalto Univ Espoo Finland; Univ Jyvaskyla Jyvaskyla Finland; Univ Auckland Auckland New Zealand
Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of...