Title :
An accurate estimation of the Levenshtein distance using metric trees and Manhattan distance
Author :
Lavoie, Thierry ; Merlo, Ettore
Author_Institution :
Dép. de génie inf. et logiciel, École Polytech. de Montréal, Montréal, QC, Canada
Abstract :
This paper presents an original clone detection technique that accurately approximates the Levenshtein distance. It uses groups of tokens extracted from source code, called windowed-tokens. Frequency vectors are constructed from these windowed-tokens and compared using the Manhattan distance in a metric tree. The goal of the technique is to achieve very high precision in clone detection while maintaining high recall. Precision and recall are measured with respect to the Levenshtein distance. The test bench is a large-scale open source software system. The collected results showed the technique to be fast, simple, and accurate. Finally, the paper presents further research opportunities.
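The pipeline described in the abstract (token windows, frequency vectors, Manhattan distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define windowed-tokens precisely, so the sliding-window definition in `windows`, the whitespace tokenization, and all function names here are assumptions made for illustration. The metric tree is also omitted; the sketch only compares two fragments directly and places the token-level Levenshtein distance alongside for reference.

```python
from collections import Counter


def windows(tokens, size):
    """Return every run of `size` consecutive tokens.

    Assumed window definition; the paper may define windowed-tokens differently.
    """
    if len(tokens) < size:
        return [tuple(tokens)]
    return [tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def frequency_vector(tokens):
    """Count how often each token occurs (a sparse frequency vector)."""
    return Counter(tokens)


def manhattan(u, v):
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())


def levenshtein(a, b):
    """Token-level edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


# Hypothetical fragments, tokenized on whitespace for simplicity.
a = "int i = 0 ;".split()
b = "int j = 0 ;".split()

l1 = manhattan(frequency_vector(a), frequency_vector(b))
lev = levenshtein(a, b)
print(l1, lev)
```

One insertion or deletion changes a frequency vector by at most 1 in L1 norm, and one substitution by at most 2, so the Manhattan distance between frequency vectors never exceeds twice the Levenshtein distance of the underlying token sequences. Because the Manhattan distance is a metric, the vectors can be indexed in a metric tree instead of being compared pairwise as done here.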
Keywords :
estimation theory; program compilers; program testing; public domain software; trees (mathematics); Levenshtein distance; Manhattan distance; accurate approximation; accurate estimation; frequency vectors; large scale open source software; metric trees; original clone detection technique; precision clone detection technique; precision measurement; recall measurement; research opportunity; source code; testbench; windowed-tokens; Algorithm design and analysis; Cloning; Measurement; Software; Software algorithms; Syntactics; Vectors; Clone detection; Software clones
Conference_Title :
Software Clones (IWSC), 2012 6th International Workshop on
Conference_Location :
Zurich
Print_ISBN :
978-1-4673-1794-8
DOI :
10.1109/IWSC.2012.6227861