  • DocumentCode
    175562
  • Title
    Identifying Source Code Reuse across Repositories Using LCS-Based Source Code Similarity
  • Author
    Kawamitsu, Naohiro ; Ishio, Takashi ; Kanda, Tetsuya ; Kula, Raula Gaikovina ; De Roover, Coen ; Inoue, Katsuro
  • Author_Institution
    Grad. Sch. of Inf. Sci. & Technol., Osaka Univ., Suita, Japan
  • fYear
    2014
  • fDate
    28-29 Sept. 2014
  • Firstpage
    305
  • Lastpage
    314
  • Abstract
    Developers often reuse source files developed for another project. To update a reused file to a newer version released by the original project, developers have to track which revision of the file was reused and how its content was modified. However, such tracking is tedious for developers, and in practice many projects keep older versions of files whose bugs have already been fixed in the original project. In this paper, we propose a technique to automatically identify source code reuse relationships between two repositories. Using a similarity metric based on the longest common subsequence (LCS), we identify pairs of similar file revisions across the repositories. To evaluate our approach, we analyzed eight pairs of open source software projects and compared the results with the information recorded in the repositories. We identified 1394 file revisions as instances of source code reuse. While 75.3% of the instances are recorded in the repositories, 20.1% are unrecorded but recovered by our approach.
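The abstract names an LCS-based similarity metric but does not give its formula. The Python sketch below illustrates the general idea under assumptions that are not taken from the paper: files are compared line by line after all whitespace is removed, the LCS length is normalized as 2 * |LCS| / (|A| + |B|), and the 0.5 threshold and the function names (`normalize`, `lcs_length`, `similarity`, `best_match`) are hypothetical. The paper's actual normalization and matching rules may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def normalize(source: str) -> List[str]:
    """Split a file into lines, drop all whitespace inside each line,
    and skip lines that become empty (an assumed normalization)."""
    stripped = ("".join(line.split()) for line in source.splitlines())
    return [line for line in stripped if line]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two line sequences,
    computed with the classic O(len(a) * len(b)) dynamic program,
    keeping only one previous row in memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(src_a: str, src_b: str) -> float:
    """Hypothetical normalization: 2 * |LCS| / (|A| + |B|), in [0, 1]."""
    a, b = normalize(src_a), normalize(src_b)
    if not a and not b:
        return 1.0  # two empty files are treated as identical
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def best_match(
    revision: str, candidates: Dict[str, str], threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (revision_id, score) of the most similar candidate revision,
    or None if no candidate reaches the (hypothetical) threshold."""
    best: Optional[Tuple[str, float]] = None
    for rev_id, content in candidates.items():
        score = similarity(revision, content)
        if best is None or score > best[1]:
            best = (rev_id, score)
    if best is None or best[1] < threshold:
        return None
    return best


# Usage: which revision of the original file does a reused copy come from?
original_revisions = {
    "r1": "int f() {\n  return 1;\n}\n",
    "r2": "int f() {\n  return 2;\n}\n",
}
reused_copy = "int f()  {\nreturn 2;\n}\n// local change\n"
print(best_match(reused_copy, original_revisions))  # ('r2', 0.857...), i.e. 6/7
```

Here the reused copy shares three of its four normalized lines with revision `r2` but only two with `r1`, so `r2` is reported as the likely origin despite the local edit. Note that each LCS computation is quadratic in file length and every revision pair is compared, so a practical tool would probably need some candidate filtering before this step.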
  • Keywords
    public domain software; software metrics; software reusability; source code (software); LCS-based source code similarity; automatic source code reuse relationship identification; longest common subsequence; open source software projects; pair identification; similarity metric; source file reuse; Educational institutions; History; Libraries; Measurement; Particle separators; Software; White spaces; empirical study; software reuse; source code similarity; version control system;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Source Code Analysis and Manipulation (SCAM), 2014 IEEE 14th International Working Conference on
  • Conference_Location
    Victoria, BC
  • Type
    conf
  • DOI
    10.1109/SCAM.2014.17
  • Filename
    6975664