Automatic Grading of Computer Programs: A Machine Learning Approach

Author

Srikant, Shashank ; Aggarwal, Vaneet

Volume

fYear

2013

Firstpage

Lastpage

Abstract

The automatic evaluation of computer programs is a nascent area of research with a potential for large-scale impact. Extant program assessment systems score mostly based on the number of test cases passed, providing no insight into the competency of the programmer. In this paper, we present a machine learning framework to automatically grade computer programs. We propose a set of highly informative features, derived from the abstract representations of a given program, that capture the program's functionality. These features are then used to learn a model to grade the programs, which is built against evaluations done by experts on the basis of a rubric. We show that regression modeling based on the given features provides much better grading than the ubiquitous test-case-pass based grading and rivals the grading accuracy achieved on other open-response problems such as essay grading. We also show that our novel features add significant value over and above basic keyword/expression count features. In addition, we propose a novel way of posing computer-program grading as a one-class modeling problem. Our preliminary investigations of this formulation show promising results and suggest an implicit correlation of our features with the proposed grading levels (rubric). To the best of the authors' knowledge, this is the first time machine learning has been applied to the problem of grading programs. The work is timely with regard to the recent boom in Massive Open Online Courses (MOOCs), which promises to produce a significant amount of hand-graded digitized data.
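The abstract contrasts the paper's novel features with "basic keyword/expression count features" derived from a program's abstract representation. As a rough illustration of that baseline idea only, the sketch below counts syntax-tree node types in a small program. It assumes the graded programs are written in Python and uses the standard `ast` module; the record does not state the language of the graded programs, and the function name `count_features`, the sample program, and the chosen node types are hypothetical. It does not reproduce the authors' feature set or their regression model.

```python
import ast
from collections import Counter


def count_features(source: str) -> Counter:
    """Count how often each AST node type occurs in a Python program.

    Hypothetical stand-in for keyword/expression count features;
    not the feature set used in the paper.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


# A small sample submission (illustrative only).
SAMPLE = """
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += x
    return s
"""

features = count_features(SAMPLE)

# Fixed-order counts of a few constructs, usable as inputs to a regression model.
vector = [features[name] for name in ("For", "If", "While", "Return")]
print(vector)
```

Vectors like this, paired with expert rubric grades, could serve as the inputs and targets of a regression model of the kind the abstract describes.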

Keywords

courseware; learning (artificial intelligence); program testing; regression analysis; ubiquitous computing; MOOC; abstract representations; automatic computer program evaluation; automatic computer-program grading; expression count features; extant program assessment system score; hand-graded digitized data; highly-informative features; keyword count features; machine learning framework; massively online open courseware; one-class modeling problem; open-response problems; program functionality; regression modeling; ubiquitous test-case-pass based grading; Abstracts; Computers; Context; Feature extraction; Grammar; Measurement; Programming; Abstract Syntax Trees; Assessment of Computer Programs; MOOC; One-Class Modeling; Regression; Rubric

fLanguage

English

Publisher

IEEE

Conference_Title

Machine Learning and Applications (ICMLA), 2013 12th International Conference on

Conference_Location

Miami, FL

Type

conf

DOI

10.1109/ICMLA.2013.22

Filename

6784592

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1733673