DocumentCode :
2452746
Title :
An accurate estimation of the Levenshtein distance using metric trees and Manhattan distance
Author :
Lavoie, Thierry ; Merlo, Ettore
Author_Institution :
Dept. de genie Inf. et logiciel, Ecole Polytech. de Montreal, Montreal, QC, Canada
fYear :
2012
fDate :
4-4 June 2012
Firstpage :
1
Lastpage :
7
Abstract :
This paper presents an original clone detection technique which is an accurate approximation of the Levenshtein distance. It uses groups of tokens extracted from source code called windowed-tokens. From these, frequency vectors are then constructed and compared with the Manhattan distance in a metric tree. The goal of this new technique is to provide a very high precision clone detection technique while keeping a high recall. Precision and recall measurement is done with respect to the Levenshtein distance. The testbench is a large scale open source software. The collected results proved the technique to be fast, simple, and accurate. Finally, this article presents further research opportunities.
Keywords :
estimation theory; program compilers; program testing; public domain software; trees (mathematics); Levenshtein distance; Manhattan distance; accurate approximation; accurate estimation; frequency vectors; large scale open source software; metric trees; original clone detection technique; precision clone detection technique; precision measurement; recall measurement; research opportunity; source code; testbench; windowed-tokens; Algorithm design and analysis; Cloning; Measurement; Software; Software algorithms; Syntactics; Vectors; Clone detection; Levenshtein distance; Manhattan distance; Software clones;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Software Clones (IWSC), 2012 6th International Workshop on
Conference_Location :
Zurich
Print_ISBN :
978-1-4673-1794-8
Type :
conf
DOI :
10.1109/IWSC.2012.6227861
Filename :
6227861
Link To Document :
بازگشت