ISBN: (Print) 9798400711770
The sudden emergence of large language models (LLMs) such as ChatGPT has had a disruptive impact throughout the computing education community. LLMs have been shown to excel at producing correct code for CS1 and CS2 problems, and can even act as friendly assistants to students learning how to code. Recent work shows that LLMs demonstrate unequivocally superior results in explaining and resolving compiler error messages, which have for decades been one of the most frustrating parts of learning how to code. However, LLM-generated error message explanations have only been assessed by expert programmers in artificial conditions. This work sought to understand how novice programmers resolve programming error messages (PEMs) in a more realistic scenario. We ran a within-subjects study with n = 106 participants in which students were tasked to fix six buggy C programs. For each program, participants were randomly assigned to fix the problem using either a stock compiler error message, an expert-handwritten error message, or an error message explanation generated by GPT-4. Despite promising evidence on synthetic benchmarks, we found that GPT-4-generated error messages outperformed conventional compiler error messages on only 1 of the 6 tasks, as measured by students' time to fix each problem. Handwritten explanations still outperformed both LLM-generated and conventional error messages, on both objective and subjective measures.
ISBN: (Print) 9781450394338
Programming error messages (PEMs) have long been a hindrance to novice programmers. This work aims to establish a catalog of PEM anti-patterns: common, recurring features of PEMs that make them unhelpful or actively harmful to programmers. The goal is for educators to be aware of, and actively teach, concrete ways that PEMs can be misleading to students; to encourage language implementers to be cognizant of these anti-patterns; and to avoid them when designing error feedback. A pilot study is being conducted to validate the presence of these anti-patterns in error messages.
ISBN: (Print) 9781450394314
A key part of learning to program is learning to understand programming error messages. They can be hard to interpret, and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that use the information to highlight areas in code. Researchers have been working on making these messages more novice-friendly since the 1960s; however, progress has been slow. The present work contributes to this stream of research by using large language models to enhance programming error messages with explanations of the errors and suggestions on how to fix them. Large language models can be used to create useful and novice-friendly enhancements to programming error messages that sometimes surpass the original messages in interpretability and actionability. These results provide further evidence of the benefits of large language models for computing educators, highlighting their use in areas known to be challenging for students. We further discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
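The enhancement approach this abstract describes can be pictured with a minimal sketch: send the source code and the original error message to an LLM and ask for a novice-friendly explanation plus a fix suggestion. The OpenAI client, model name, and prompt wording below are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of LLM-based error message enhancement.
# Assumes the openai package (>= 1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def enhance_error_message(source_code: str, error_message: str) -> str:
    """Ask an LLM for a plain-language explanation of a compiler error."""
    prompt = (
        "A student new to programming got this error.\n\n"
        f"Code:\n{source_code}\n\n"
        f"Error message:\n{error_message}\n\n"
        "Explain in plain language what the error means and suggest how to fix it."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not the paper's
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```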
ISBN: (Print) 9781450394314
Reading a programming error message is the first step in understanding what it is trying to tell the programmer about how to fix an error in their code. However, these messages are often difficult to read, especially for novices, which is not surprising given that error messages in many of the most popular languages in which novices learn to code were not written with readability in mind. As a result, novices frequently struggle to understand them. This is a long-standing problem, with researchers highlighting concerns about programming error message readability over the last six decades. Very recent work has put forward evidence of the need for measuring readability in error messages and a framework for doing so. This framework consists of four factors of readability for programming error messages: message length, vocabulary, jargon, and sentence construction. We use this framework to implement an approach to automatically assess the readability of programming error messages. Using established readability factors as predictors in a machine learning model, we train several models using a dataset of C and Java error messages. We examine the performance of these models, and apply the best-performing model to a previously published set of messages evaluated for readability by experts, non-experts, and students. Our results validate the previously proposed readability factors, and our model classifies messages similarly to human raters. Finally, we discuss future work needed to improve the accuracy of the model.
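As a rough illustration of the modelling idea, one could compute the four readability factors as numeric features and fit a classifier on human readability labels. The feature definitions, jargon list, and labelled examples below are simplified placeholders, not the paper's implementation.

```python
# Sketch: readability factors as features for a message classifier.
import re
from sklearn.ensemble import RandomForestClassifier

JARGON = {"segfault", "lvalue", "rvalue", "token", "nullpointerexception"}  # toy list

def features(message: str) -> list[float]:
    words = re.findall(r"[A-Za-z]+", message.lower())
    sentences = [s for s in re.split(r"[.!?]", message) if s.strip()]
    return [
        len(words),                            # message length
        len(set(words)) / max(len(words), 1),  # vocabulary diversity
        sum(w in JARGON for w in words),       # jargon count
        len(words) / max(len(sentences), 1),   # words per sentence
    ]

# Hypothetical labelled data: (message, 1 = readable, 0 = not readable).
train = [
    ("expected ';' before '}' token", 0),
    ("Variable 'count' is used before it is given a value.", 1),
]
X = [features(m) for m, _ in train]
y = [label for _, label in train]
model = RandomForestClassifier(random_state=0).fit(X, y)
print(model.predict([features("invalid lvalue in assignment")]))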
ISBN: (Print) 9781450389761
Programming error messages have proven to be notoriously problematic for novices who are learning to program. Although recent efforts have focused on improving message wording, these have been criticized for attempting to improve usability without first understanding and addressing readability. To date, there has been no research dedicated to the readability of programming error messages and how it could be assessed. In this paper we examine human-based assessments of programming error message readability and make two important contributions. First, we conduct an experiment using the twenty most frequent error messages in three popular programming languages (Python, Java, and C), revealing that human notions of readability are highly subjective and dependent on both programming experience and language familiarity. Both novices and experts agreed more about which messages are readable, but disagreed more about which messages are not. Second, we leverage the data from this experiment to uncover several key factors that seem to affect message readability: message length, message tone, and use of jargon. We discuss how these factors can help guide future efforts to design a readability metric for programming error messages.
ISBN: (Print) 9798400710384
This paper investigates supervised fine-tuning of large language models (LLMs) to improve their pedagogical alignment in computing education, addressing concerns that LLMs may hinder learning outcomes. The project utilises a proprietary dataset of 2,500 high-quality question/answer pairs from programming course forums, and explores two research questions: the suitability of university course forums for contributing to fine-tuning datasets, and how supervised fine-tuning can improve LLMs' alignment with educational principles such as constructivism. Initial findings suggest benefits in the pedagogical alignment of LLMs, with deeper evaluations required.
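For concreteness, forum question/answer pairs are often serialised into a chat-style JSONL file for supervised fine-tuning. The system prompt and example pair below are invented for illustration; the paper's proprietary dataset and exact format are not described in the abstract.

```python
# Sketch: packaging Q/A pairs into a chat-format JSONL fine-tuning file.
import json

system = ("You are a teaching assistant. Guide the student toward the "
          "answer without writing the solution for them.")

pairs = [  # hypothetical forum pair
    {
        "question": "Why does my loop never terminate?",
        "answer": "Check how your loop variable changes each iteration. "
                  "Does the condition ever become false?",
    },
]

with open("finetune.jsonl", "w") as f:
    for p in pairs:
        record = {"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": p["question"]},
            {"role": "assistant", "content": p["answer"]},
        ]}
        f.write(json.dumps(record) + "\n")
```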
ISBN: (Print) 9781450397421
Programming error messages (PEMs) often prove to be troublesome for novice programmers. Guidelines to improve PEMs often lack theoretical or empirical justification. This research will establish a theoretical foundation for what makes a "good" PEM, based on existing theories that have not yet been applied to PEMs. These findings will be applied to an existing programming environment, resulting in empirically validated advice for language implementers.
ISBN: (Print) 9798400704239
In the challenging field of introductory programming, high enrolments and failure rates drive us to explore tools and systems to enhance student outcomes, especially automated tools that scale to large cohorts. This paper presents and evaluates the dcc --help tool, an integration of a Large Language Model (LLM) into the Debugging C Compiler (DCC) to generate unique, novice-focused explanations tailored to each error. dcc --help prompts an LLM with contextual information about compile- and run-time errors, including the source code, the error location, and the standard compiler error message. The LLM is instructed to generate novice-focused, actionable error explanations and guidance, designed to help students understand and resolve problems without providing solutions. dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks. We analysed a subset of these error/explanation pairs to evaluate their properties, including conceptual correctness, relevancy, and overall quality. We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code. Our findings, observations, and reflections following deployment indicate that dcc --help provides novel opportunities for scaffolding students' introduction to programming.
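A plausible reconstruction of the kind of prompt this abstract describes, combining the source code, the error location, and the stock compiler message with an instruction not to reveal the solution (the wording is assumed, not DCC's actual prompt):

```python
# Sketch: assembling a context-rich prompt for an LLM error explainer.
def build_prompt(source: str, filename: str, line: int,
                 compiler_message: str) -> str:
    return (
        "You are helping a first-year student understand a C error.\n"
        f"File: {filename}, line {line}\n\n"
        f"Source code:\n{source}\n\n"
        f"Compiler output:\n{compiler_message}\n\n"
        "Explain what went wrong and how the student might fix it, "
        "but do not write the corrected code for them."
    )

print(build_prompt("int main(void) { int x; return x }",
                   "main.c", 1,
                   "error: expected ';' before '}' token"))
```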
ISBN: (Print) 9798400705311
Large Language Models (LLMs) present a transformative opportunity to address longstanding challenges in computing education. This paper presents a conversational AI extension to an LLM-enhanced C/C++ compiler which generates pedagogically sound programming error explanations. Our new tool, DCC Sidekick, retains compiler integration, allowing students to see their code, error messages, and stack frames alongside a conversational AI interface. Compiler context improves error explanations and provides a seamless development experience. We present quantitative analyses of Sidekick's usage and engagement patterns in a large CS1 course. In the first seven weeks of use, 959 students initiated 11,222 DCC Sidekick sessions, generating 17,982 error explanations. Over half of all conversations occurred outside business hours, highlighting the value of these always-available tools. Early results indicate strong adoption of conversational AI debugging tools, demonstrating their scalability in supporting large CS1 courses. We share implementation details and lessons learned, offering guidance to educators considering integrating AI tools with pedagogical guardrails.
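The conversational pattern described here can be sketched as a chat history seeded with compiler context and extended turn by turn. The client, model name, and prompt wording are illustrative stand-ins, since the abstract does not specify Sidekick's backend.

```python
# Sketch: a multi-turn debugging chat seeded with compiler context.
from openai import OpenAI

client = OpenAI()

def start_session(source: str, error: str, stack: str) -> list[dict]:
    """Seed the conversation with the student's code and compiler output."""
    return [{
        "role": "system",
        "content": "You are a patient CS1 tutor. Do not write solution code.\n"
                   f"Student's code:\n{source}\n\nError:\n{error}\n\n"
                   f"Stack frames:\n{stack}",
    }]

def ask(history: list[dict], question: str) -> str:
    """Append a student question, get a reply, and keep the history updated."""
    history.append({"role": "user", "content": question})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```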
ISBN: (Print) 9781450375672
Diagnostic messages generated by compilers and interpreters, such as syntax error messages, have been researched for over half a century. Unfortunately, these messages, which include error, warning, and run-time messages, present substantial difficulty and could be more effective, particularly for novices. Recent years have seen an increased number of papers in the area, including studies on the effectiveness of these messages, on improving or enhancing them, and on their usefulness as part of programming process data that can be used to predict student performance, track student progress, and tailor learning plans. Despite this increased interest, the long history of literature is quite scattered and has not been brought together in any digestible form. In order to help the computing education community (and related communities) further advance work on programming error messages, we present a comprehensive, historical and state-of-the-art report on research in the area. In addition, we synthesise and present the existing evidence on these messages, including the difficulties they present and their effectiveness. We finally present a set of guidelines, curated from the literature, classified by the type of evidence supporting each one (historical, anecdotal, and empirical). This work can serve as a starting point for those who wish to conduct research on compiler error messages, runtime errors, and warnings. We also make the BibTeX file of our 300+ reference corpus publicly available. Collectively, this report and the bibliography will be useful to those who wish to design better messages or to measure their effectiveness more effectively.