Title :
Automatic Source Code Plagiarism Detection
Author :
Kustanto, Cynthia ; Liem, Inggriani
Author_Institution :
Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
Abstract :
Plagiarism is a form of academic dishonesty that is common among students in programming classes. In a large class, detecting plagiarism manually is both difficult and time-consuming, especially because students modify the source code in numerous ways to conceal the cheating. We designed and developed Deimos, a prototype source code plagiarism detector that can be extended to handle other programming languages simply by implementing new scanners and parsers. Deimos works in two steps: (1) parsing the source code and transforming it into tokens, and then (2) comparing each pair of token strings obtained in the first step using the Running Karp-Rabin Greedy String Tiling algorithm. Instructors access Deimos through a web application interface that receives input parameters, triggers a background process, and displays the result. The web interface offers user friendliness, while the background process prevents timeouts and reduces bandwidth consumption. This approach was chosen because Deimos is intended to process more than 100 source code files. The web application was implemented in PHP, while the backend application, which is responsible for the background process, was implemented in Java. Unit tests, functional tests, and nonfunctional tests have been conducted. Detection takes 1 hour for 100 samples of beginners' source code taken from a real assignment in our programming class, where the average source code length is 150 lines. This code similarity detector could also be used in other pedagogical tools, such as an autograder, which checks the consistency of source code against a template or solution.
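The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: it uses plain Greedy String Tiling and omits the Karp-Rabin rolling-hash speedup that the "Running" variant adds. The function names, the `min_match_len` parameter, the token names in the usage example, and the similarity formula (twice the tiled length divided by the combined token count, a common choice for this algorithm) are all assumptions made for illustration.

```python
def greedy_string_tiling(a, b, min_match_len=3):
    """Plain Greedy String Tiling over two token lists.

    Returns a list of tiles (start_in_a, start_in_b, length). Tiles are
    non-overlapping common runs, chosen longest first. The Karp-Rabin
    hashing used by the "Running" variant is deliberately left out.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_len = min_match_len
        matches = []
        # Find all maximal matches of unmarked tokens with the current best length.
        for i in range(len(a)):
            if marked_a[i]:
                continue
            for j in range(len(b)):
                if marked_b[j]:
                    continue
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k == max_len:
                    matches.append((i, j, k))
                elif k > max_len:
                    matches = [(i, j, k)]
                    max_len = k
        if not matches:
            break
        # Turn matches into tiles, skipping any that a tile from this round covers.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for t in range(k):
                marked_a[i + t] = True
                marked_b[j + t] = True
            tiles.append((i, j, k))
    return tiles


def similarity(a, b, min_match_len=3):
    """Fraction of tokens covered by tiles (assumed metric, range 0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match_len))
    return 2.0 * covered / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical token streams: the second swaps the two statements of the first.
    first = ["ID", "ASSIGN", "NUM", "SEMI", "PRINT", "LPAREN", "ID", "RPAREN", "SEMI"]
    second = ["PRINT", "LPAREN", "ID", "RPAREN", "SEMI", "ID", "ASSIGN", "NUM", "SEMI"]
    print(greedy_string_tiling(first, second))
    print(similarity(first, second))
```

Because tiles may appear in any order, reordering statements or functions does not hide the copied runs, which is the property that makes tiling suitable for comparing token strings. The nested scan shown here is much slower than a hashed search, so it is only meant to show the idea on short inputs.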
Keywords :
authoring systems; educational administrative data processing; fraud; greedy algorithms; learning (artificial intelligence); program diagnostics; programming languages; Deimos; Java; PHP; automatic source code plagiarism detection; bandwidth consumption; functional test; input parameters; instructor; nonfunctional test; parser; programming classes; programming languages; running Karp-Rabin greedy string tiling algorithm; scanner; template; token strings; unit test; web application interface; Artificial intelligence; Automatic programming; Detectors; Distributed computing; Informatics; Intelligent networks; Parallel programming; Plagiarism; Software engineering; Testing; plagiarism detection; source code plagiarism;
Conference_Title :
Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, 2009. SNPD '09. 10th ACIS International Conference on
Conference_Location :
Daegu
Print_ISBN :
978-0-7695-3642-2
DOI :
10.1109/SNPD.2009.62