• DocumentCode
    2452746
  • Title

    An accurate estimation of the Levenshtein distance using metric trees and Manhattan distance

  • Author

    Lavoie, Thierry ; Merlo, Ettore

  • Author_Institution
    Dept. de genie Inf. et logiciel, Ecole Polytech. de Montreal, Montreal, QC, Canada
  • fYear
    2012
  • fDate
    4-4 June 2012
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    This paper presents an original clone detection technique which is an accurate approximation of the Levenshtein distance. It uses groups of tokens extracted from source code called windowed-tokens. From these, frequency vectors are then constructed and compared with the Manhattan distance in a metric tree. The goal of this new technique is to provide a very high precision clone detection technique while keeping a high recall. Precision and recall measurement is done with respect to the Levenshtein distance. The testbench is a large scale open source software. The collected results proved the technique to be fast, simple, and accurate. Finally, this article presents further research opportunities.
  • Keywords
    estimation theory; program compilers; program testing; public domain software; trees (mathematics); Levenshtein distance; Manhattan distance; accurate approximation; accurate estimation; frequency vectors; large scale open source software; metric trees; original clone detection technique; precision clone detection technique; precision measurement; recall measurement; research opportunity; source code; testbench; windowed-tokens; Algorithm design and analysis; Cloning; Measurement; Software; Software algorithms; Syntactics; Vectors; Clone detection; Levenshtein distance; Manhattan distance; Software clones;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Software Clones (IWSC), 2012 6th International Workshop on
  • Conference_Location
    Zurich
  • Print_ISBN
    978-1-4673-1794-8
  • Type

    conf

  • DOI
    10.1109/IWSC.2012.6227861
  • Filename
    6227861