The increasing adoption of programming education necessitates efficient and accurate methods for evaluating students' coding assignments. Traditional manual grading is time-consuming, often inconsistent, and prone to subjective bias. This paper explores the application of large language models (LLMs) to the automated evaluation of programming assignments. LLMs can use advanced natural language processing capabilities to assess code quality, functionality, and adherence to best practices, providing detailed feedback and grades. We demonstrate the effectiveness of LLMs through experiments comparing their performance with human evaluators across various programming tasks. Our study evaluates several LLMs for automated grading. Gemini 1.5 Pro achieves an exact-match accuracy of 86% and a ±1 accuracy of 98%. GPT-4o also performs strongly, with exact-match and ±1 accuracies of 69% and 97%, respectively. Both models correlate highly with human evaluations, indicating their potential for reliable automated grading. In contrast, models such as Llama 3 70B and Mixtral 8x7B exhibit low accuracy and poor alignment with human grading, particularly on problem-solving tasks. These findings suggest that advanced LLMs can support scalable, consistent automated educational assessment, while also enhancing the learning experience by offering personalized, instant feedback that fosters an iterative learning process. LLMs could therefore play a pivotal role in the future of programming education.
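A minimal sketch of the two agreement metrics this abstract reports, exact-match accuracy and ±1 accuracy between LLM-assigned and human-assigned grades. The grade lists below are hypothetical illustrative data, not results from the study.

```python
# Sketch: the two agreement metrics from the abstract, on made-up grade lists.

def exact_match_accuracy(llm_grades, human_grades):
    """Fraction of submissions where the LLM grade equals the human grade."""
    matches = sum(1 for l, h in zip(llm_grades, human_grades) if l == h)
    return matches / len(human_grades)

def within_one_accuracy(llm_grades, human_grades):
    """Fraction of submissions where the LLM grade is within +/-1 point."""
    close = sum(1 for l, h in zip(llm_grades, human_grades) if abs(l - h) <= 1)
    return close / len(human_grades)

if __name__ == "__main__":
    human = [5, 4, 3, 5, 2, 4, 1, 3]   # hypothetical human-assigned grades
    llm   = [5, 4, 2, 5, 3, 4, 1, 4]   # hypothetical LLM-assigned grades
    print(f"exact match: {exact_match_accuracy(llm, human):.0%}")
    print(f"+/-1 accuracy: {within_one_accuracy(llm, human):.0%}")
```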
ISBN (print): 9798400705328
In a typical introductory programming course, grading student-submitted programs involves an autograder that compiles and runs the programs and tests their functionality with predefined test cases, paying no attention to the source code itself. However, in an educational setting, grading based on inspection of the source code is required for two main reasons: (1) awarding partial marks to 'partially correct' code that may fail the test-case check, and (2) awarding marks (or penalties) based on source code quality or specific criteria the instructor may have laid out in the problem statement (e.g. 'implement sorting using bubble-sort'). Grading by studying the source code, however, can be highly time-consuming when the course has a large enrollment. In this paper we present the design and evaluation of an AI assistant for source code grading, which we have named TA Buddy. TA Buddy is powered by Code Llama, a large language model trained especially for code-related tasks, which we fine-tuned on a dataset of graded programs. Given a problem statement, student code submissions, and a grading rubric, TA Buddy can be asked to generate suggested grades, i.e. ratings for the various rubric criteria, for each submission. The human teaching assistant (TA) can then accept or overrule these grades. We evaluated TA Buddy-assisted manual grading against 'pure' manual grading and found that grading time was reduced by 24% while grade agreement between the two conditions remained at 90%.
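A hedged sketch of the kind of rubric-driven prompting an assistant like TA Buddy is described as performing: problem statement, rubric, and student code go in, and one rating per rubric criterion comes out for a human TA to accept or overrule. The rubric contents, prompt wording, and the `query_code_llm` stub are assumptions for illustration, not the actual fine-tuned Code Llama pipeline.

```python
# Sketch only: assembling a rubric-grading prompt and parsing per-criterion ratings.
import json

RUBRIC = {
    "correctness": "Does the code solve the stated problem?",
    "uses_bubble_sort": "Is sorting implemented with bubble sort as required?",
    "readability": "Are names and structure clear?",
}

def build_grading_prompt(problem: str, code: str, rubric: dict) -> str:
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in rubric.items())
    return (
        "You are a teaching assistant. Rate the submission on each rubric "
        "criterion from 0 to 5 and reply as JSON {criterion: rating}.\n\n"
        f"Problem statement:\n{problem}\n\nRubric:\n{criteria}\n\n"
        f"Student submission:\n{code}\n"
    )

def query_code_llm(prompt: str) -> str:
    # Placeholder: wire this to a code-oriented model of your choice.
    return json.dumps({"correctness": 4, "uses_bubble_sort": 5, "readability": 3})

def suggest_grades(problem: str, code: str) -> dict:
    raw = query_code_llm(build_grading_prompt(problem, code, RUBRIC))
    # The human TA reviews these suggested ratings and may accept or overrule them.
    return json.loads(raw)

print(suggest_grades("Implement sorting using bubble-sort.", "def sort(xs): ..."))
```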
In programming courses, providing students with concise and constructive feedback on faulty submissions (programs) is highly desirable. However, providing feedback manually is often time-consuming and tedious. To release tutors from manually constructing concise feedback, researchers have proposed approaches such as CLARA and Refactory that construct feedback automatically. The key to such approaches is to fix a faulty program by making it equivalent to one of its correct reference programs whose overall structure is identical to that of the faulty submission. However, for a newly released assignment, it is likely that there are no correct reference programs at all, let alone correct reference programs sharing an identical structure with the faulty submission. Therefore, in this paper we propose AssignmentMender, which generates concise patches for newly released assignments. The key insight of AssignmentMender is that a faulty submission can be repaired by reusing fine-grained code snippets from other submissions to the same assignment, even when those submissions are themselves faulty. It automatically locates suspicious code in the faulty program and leverages static analysis, with a graph-based matching algorithm, to retrieve reference code from existing submissions. Finally, it generates candidate patches by modifying the suspicious code based on the reference code. Unlike existing approaches, AssignmentMender exploits faulty submissions in addition to bug-free submissions to generate patches. Another advantage of AssignmentMender is that it can leverage submissions whose overall structure differs from that of the to-be-fixed submission. Evaluation results on 128 faulty submissions from 10 assignments show that AssignmentMender improves the state of the art in feedback generation for newly released assignments. A case study involving 40 students and 80 submissions further provides initial evidence that the proposed approach is useful in practice.
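A deliberately simplified toy illustration, not AssignmentMender itself, of the repair loop the abstract describes: replace a suspicious statement with fine-grained snippets borrowed from peer submissions to the same assignment and keep any candidate patch that passes the test cases. All names, snippets, and tests below are invented for the example.

```python
# Toy version of snippet-reuse repair: swap in peer snippets, keep test-passing patches.

FAULTY = ["def absolute(x):", "    return x"]          # suspicious line: index 1
REFERENCE_SNIPPETS = [                                  # mined from peer submissions
    "    return x if x > 0 else -x",
    "    return -x",
    "    return abs(x)",
]
TESTS = [(3, 3), (-4, 4), (0, 0)]

def passes_tests(source_lines):
    namespace = {}
    try:
        exec("\n".join(source_lines), namespace)
        return all(namespace["absolute"](arg) == want for arg, want in TESTS)
    except Exception:
        return False

def candidate_patches(faulty, suspicious_index, snippets):
    for snippet in snippets:
        patched = faulty[:suspicious_index] + [snippet] + faulty[suspicious_index + 1:]
        if passes_tests(patched):
            yield "\n".join(patched)

for patch in candidate_patches(FAULTY, 1, REFERENCE_SNIPPETS):
    print("plausible patch:\n" + patch)
```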
Hardware description languages (HDLs) are pivotal for the development of hardware designs, and programming courses for HDLs are popular in both universities and online course platforms. Like programming assignments in software languages (SLs), those in HDLs also call for automated program repair (APR) techniques to provide personalized feedback to students. However, research on APR techniques targeting HDL programming assignments is still at an early stage. Because the programming mechanism of HDLs differs significantly from that of SLs, the only APR technique targeting HDL programming assignments (i.e., CirFix) contributes a customized repair pipeline, yet the fundamental challenges in the design of HDL-oriented fault localization and patch generation remain unresolved. In this work, we propose Strider, a signal-value-transition-guided defect repair technique that captures the intrinsic features of HDLs. It consists of a time-aware dynamic defect localization approach to precisely localize defects and a signal-value-transition-guided patch synthesis approach to effectively generate fixes. We further construct a dataset of 57 real defects from HDL programming assignments for tool evaluation. The evaluation reveals the overfitting issue of the pioneering tool CirFix and the significant improvement of Strider over CirFix in both effectiveness and efficiency. In particular, Strider correctly fixes 2.3x as many defects as CirFix on the real-defect dataset and is 23x more efficient, generating a correct fix within 5 minutes on average on the synthetic-defect dataset, where CirFix takes around 2 hours on average.
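A rough, heavily simplified sketch of the "time-aware" localization idea mentioned above, not Strider's actual algorithm: compare the observed signal trace of a defective design against a golden trace cycle by cycle and report the earliest divergence as the starting point for defect localization. The signal traces are made up.

```python
# Toy time-aware localization: find the earliest cycle where a signal diverges
# from the golden (expected) trace.

GOLDEN = {"q": [0, 1, 1, 0, 1], "carry": [0, 0, 1, 1, 0]}
OBSERVED = {"q": [0, 1, 1, 0, 1], "carry": [0, 0, 0, 1, 0]}

def first_divergence(golden, observed):
    """Return (cycle, signal) of the earliest mismatching signal value, or None."""
    mismatches = [
        (cycle, signal)
        for signal, values in golden.items()
        for cycle, value in enumerate(values)
        if observed[signal][cycle] != value
    ]
    return min(mismatches) if mismatches else None

print(first_divergence(GOLDEN, OBSERVED))   # -> (2, 'carry')
```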
Clustering of source code is a technique that can help improve feedback in automated program assessment. Grouping code submissions that contain similar mistakes can, for instance, facilitate the identification of students' difficulties so that targeted feedback can be provided. Moreover, solutions with similar functionality but possibly different coding styles or progress levels allow personalized feedback for students stuck at some point, based on a more developed source code, and can even reveal potential cases of plagiarism. However, existing clustering approaches for source code are mostly inadequate for automated feedback generation or assessment systems in programming education: they either give too much emphasis to syntactic program features, rely on expensive computations over pairs of programs, or require previously collected data. This paper introduces an online approach, and an implemented tool named AsanasCluster, to cluster source code submissions to programming assignments. The proposed approach relies on program attributes extracted from semantic graph representations of the source code, including control-flow and data-flow features. The resulting feature vectors are fed into an incremental k-means model, which determines the closest cluster for each solution as it enters the system, in a timely manner, since clustering is an intermediate step for feedback generation in automated assessment. We have conducted a twofold evaluation of the tool to assess (1) its runtime performance and (2) its precision in separating different algorithmic strategies. To this end, we applied our clustering approach to a public dataset of real submissions from undergraduate students to programming assignments, measuring the runtimes for the distinct tasks involved: building a model, identifying the closest cluster to a new observation, and recalculating partitions. As for precision, we partition two groups of programs collected from GitHub. One group contains implementations of two search...
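A brief sketch, under assumed feature vectors, of the online clustering step the abstract describes: incremental k-means (here scikit-learn's MiniBatchKMeans) assigns each incoming submission to its closest cluster and then folds it into the model without re-clustering everything. The numeric values merely stand in for the control-flow and data-flow attributes extracted from the semantic graphs.

```python
# Sketch: incremental k-means over per-submission feature vectors (values invented).
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical feature vectors, e.g. counts of control/data-flow features.
initial_batch = np.array([
    [4, 1, 0, 2],
    [5, 1, 0, 2],
    [1, 3, 2, 0],
    [1, 4, 2, 0],
], dtype=float)

model = MiniBatchKMeans(n_clusters=2, random_state=0)
model.partial_fit(initial_batch)            # build the initial model

new_submission = np.array([[4, 1, 1, 2]], dtype=float)
closest_cluster = model.predict(new_submission)[0]
print(f"new submission assigned to cluster {closest_cluster}")

model.partial_fit(new_submission)           # fold it into the model incrementally
```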
The aim of this paper is to present observations on automatic and semi-automatic assessment of programming assignments used in different e-learning contexts. Teaching programming is an important part of Informatics Engineering, Computer Science, Informatics, Computing, and Information Technology and Communication courses at universities and high schools. Students taking these courses have to demonstrate competence in problem solving and programming by creating working programs. Checking program validity is usually based on testing a program on diverse test cases. Testing for batch-type problems involves creating a set of input data cases, running the program submitted by a contestant on those input cases, analysing the obtained outputs, and so on. Assessment of programming assignments is as complex as testing software systems. Many automatic assessment systems for programming assignments have been created to support teachers in submission assessment. However, balancing the quality and the speed of assessment for programming assignments remains an important problem. The authors investigated the possibilities of an advanced semi-automatic approach to assessment, which can serve as a compromise between manual and automatic assessment. A semi-automatic testing environment for evaluating programming assignments was developed, and its practical use in Lithuania's optional programming maturity examination is presented. The research presented is useful for evaluating the results of engineering education in general, and of informatics/computer engineering education in particular.
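A minimal sketch of the batch-style test-case checking described above: run a submitted program on each predefined input case and compare its output with the expected output, leaving failing cases for manual review in a semi-automatic setting. The toy submission and test cases are invented for illustration.

```python
# Sketch: run a toy submission on predefined input cases and compare outputs.
import subprocess
import sys

SUBMISSION = "print(sum(int(x) for x in input().split()))"   # toy 'student program'
TEST_CASES = [("1 2 3", "6"), ("10 -2", "8"), ("5", "5")]

def run_case(program: str, stdin_text: str) -> str:
    result = subprocess.run(
        [sys.executable, "-c", program],
        input=stdin_text, capture_output=True, text=True, timeout=5,
    )
    return result.stdout.strip()

passed = sum(run_case(SUBMISSION, given) == expected for given, expected in TEST_CASES)
print(f"passed {passed}/{len(TEST_CASES)} test cases")
# In a semi-automatic workflow, failing cases would then be queued for manual review.
```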
ISBN (print): 9798400701399
This PhD research explores the problem of building a system that provides real-time formative feedback on programming assignments given to college/university students. Such a system would maximize learning outcomes while minimizing the effort required from the tutor to construct it. We propose an approach to building such a system and assessing its effectiveness, and outline topics for future research.
ISBN (print): 9798400701382
Recent studies show that AI-driven code generation tools, such as large language models, are able to solve most of the problems usually presented in introductory programming classes. However, it is still unknown how they cope with object-oriented programming assignments, where students are asked to design and implement several interrelated classes (related by composition or inheritance) that follow a set of best practices. Since the majority of the exercises in these tools' training data are written in English, it is also unclear how well they handle exercises published in other languages. In this paper, we report our experience using GPT-3 to solve 6 real-world tasks, written in Portuguese, from an object-oriented programming course at a Portuguese university. Our observations, based on an objective evaluation of the code performed by an open-source automatic assessment tool, show that GPT-3 is able to interpret and handle direct functional requirements, but it tends not to give the best solution in terms of object-oriented design. We perform a qualitative analysis of GPT-3's output and gather a set of recommendations for computer science educators, since we expect students to use and abuse this tool in their academic work.
ISBN (print): 9781595939470
We have designed and implemented game-themed programming assignment modules targeted specifically for adoption in existing introductory programming classes. These assignments are self-contained, so that faculty members with no background in graphics or gaming can selectively pick and choose a subset to combine with their own assignments in existing classes. This paper begins with a survey of previous results. Based on this survey, it summarizes the important considerations when designing materials for selective adoption. The paper then describes our design, implementation, and assessment efforts. The result is a road map that guides faculty members in experimenting with game-themed programming assignments by incrementally adopting and customizing suitable materials for their classes.
ISBN (print): 9781450383264
In recent years, research has increasingly focused on developing intelligent tutoring systems that provide data-driven support for students who need assistance during programming assignments. One goal of such intelligent tutors is to provide students with quality interventions comparable to those human tutors would give. While most studies have focused on generating different forms of on-demand support, such as next-step hints and worked examples, at any given moment during a programming assignment, there is a lack of research on why human tutors would provide different forms of proactive interventions to students in different situations. This information is critical for allowing intelligent programming environments to select the appropriate type of student support at the right moment. In this work, we studied human tutors' reasons for providing interventions during two introductory programming assignments in a block-based environment. Three human tutors evaluated a sample of 86 struggling moments identified from students' log data using a data-driven model. The tutors specified whether and why an intervention was needed (or not) for each struggling moment. We analyzed the expert tags and their consensus discussions and extracted the main reasons that led the experts to decide to intervene: "missing key components to make progress", "using wrong or unnecessary blocks", "misusing needed blocks", "having critical logic errors", "needing confirmation and next steps", and "unclear student intention". We use six case studies to illustrate specific student code-trace examples and the tutors' reasons for intervention. We also discuss the potential types of automatic interventions that could address these cases. Our work sheds light on when and why students might need programming interventions. These insights contribute towards improving the quality of automated, data-driven support in programming learning environments.