Details
ISBN:
(Print) 9781450399760
This paper studies recent developments in large language models' (LLMs') abilities to pass assessments in introductory and intermediate Python programming courses at the postsecondary level. The emergence of ChatGPT sparked heated debates about its potential uses (e.g., exercise generation, code explanation) as well as misuses in programming classes (e.g., cheating). Recent studies show that while the technology performs surprisingly well on diverse sets of assessment instruments employed in typical programming classes, its performance is usually not sufficient to pass the courses. The release of GPT-4 brought notable improvements in capabilities related to handling assessments originally designed for human test-takers. This study provides the necessary analysis in the context of this ongoing transition towards mature generative AI systems. Specifically, we report the performance of GPT-4, comparing it to previous generations of GPT models, on three Python courses with assessments ranging from simple multiple-choice questions (no code involved) to complex programming projects with code bases distributed across multiple files (599 exercises overall). Additionally, we analyze the assessments that were not handled well by GPT-4 to understand the current limitations of the model, as well as its ability to leverage feedback provided by an auto-grader. We found that the GPT models evolved from completely failing the typical programming class's assessments (the original GPT-3) to confidently passing the courses with no human involvement (GPT-4). While we identified certain limitations in GPT-4's handling of MCQs and coding exercises, the rate of improvement across recent generations of GPT models strongly suggests their potential to handle almost any type of assessment widely used in higher education programming courses. These findings could be leveraged by educators and institutions to adapt the design of programming assessments as well as to fuel the nec…
Details
ISBN:
(Print) 9781728134857
The success of technology-enhanced learning can be increased by tailoring the content and the learning resources to every student, thus optimizing the learning process. This study proposes a method for evaluating the content difficulty and knowledge proficiency of users based on a modified Elo-rating algorithm. The calculated ratings are then used in the teaching process to recommend coding exercises that match the user's current knowledge. The proposed method was tested with a programming tutoring system in an object-oriented programming course. The results showed positive findings regarding the effectiveness of the implemented Elo-rating algorithm in recommending coding exercises, serving as a proof of concept for developing adaptive and automatic assessment of programming assignments.
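The abstract above does not specify the paper's exact modification of the Elo-rating algorithm, but the core idea — rating both learners and exercises, and updating the two ratings in opposite directions after each attempt — can be sketched as follows. The K-factor of 32 and the standard 400-point logistic scale are illustrative assumptions, not details taken from the paper.

```python
def expected_success(learner_rating: float, exercise_rating: float) -> float:
    """Probability the learner solves the exercise, per the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((exercise_rating - learner_rating) / 400))


def elo_update(learner_rating: float, exercise_rating: float,
               solved: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated (learner, exercise) ratings after one attempt.

    A successful attempt raises the learner's proficiency rating and lowers
    the exercise's difficulty rating; a failed attempt does the reverse.
    """
    expected = expected_success(learner_rating, exercise_rating)
    outcome = 1.0 if solved else 0.0
    delta = k * (outcome - expected)
    return learner_rating + delta, exercise_rating - delta
```

A recommender built on this scheme would then suggest exercises whose difficulty rating is close to the learner's current rating, i.e., where `expected_success` is near 0.5, so the exercise is neither trivial nor out of reach.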