• DocumentCode
    184816
  • Title
    One pass preprocessing for token-based source code clone detection
  • Author
    Dingkun Li ; Minghao Piao ; Ho Sun Shon ; Keun Ho Ryu ; Incheon Paik
  • Author_Institution
    Database/Bioinf. Lab., Chungbuk Nat. Univ., Cheongju, South Korea
  • fYear
    2014
  • fDate
    29-31 Oct. 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Token-based source code clone detection provides a promising way to detect source code duplication and redundancy. Preprocessing plays an important role in KDD because it prepares the data for further processing; as the old saying goes, well begun is half done. However, processing the unstructured source code files of large software systems is challenging and consumes considerable time and space. This paper introduces a novel way to clean, tokenize, and transform source code into an appropriate form for mining. A tool called OPP (One Pass Preprocessor) has been developed to preprocess source code files efficiently and flexibly. Experiments on three large open source projects written in different host languages (Wildfly1.02, Linux core-3.6, and VTK) showed that the tool is powerful and flexible in preprocessing source code files and produces high-quality output.
  • Keywords
    public domain software; redundancy; software reliability; source code (software); KDD; OPP; large software systems; one pass preprocessing; one pass preprocessor; open source projects; products high quality output; redundancy; source code duplication; space consuming; time consuming; token-based source code clone detection; tokenize; transform; unstructured source code files; Cleaning; Cloning; Data mining; Java; Layout; Software systems; Transforms; KDD; clone detection; code clone; configuration table; tokenization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Awareness Science and Technology (iCAST), 2014 IEEE 6th International Conference on
  • Conference_Location
    Paris
  • Type
    conf
  • DOI
    10.1109/ICAwST.2014.6981824
  • Filename
    6981824