• DocumentCode
    615307
  • Title

    Finding similar files using text mining

  • Author

    Asanka, P. P. G. Dinesh

  • Author_Institution
    Pearson Lanka (Pvt) Ltd., Colombo, Sri Lanka
  • fYear
    2013
  • fDate
    26-28 April 2013
  • Firstpage
    431
  • Lastpage
    435
  • Abstract
    Finding closely matching source code files is important in software development. By finding them, software architects can identify similar implementations of classes, libraries, and so on. However, this is not an easy task, since there can be a large number of source code files, and manually comparing every document becomes difficult when there are many of them. This research builds a mechanism that uses a term-based text mining methodology to find similar documents within a given set of documents.
  • Keywords
    data mining; software engineering; text analysis; closely-matched source code file determination; similar-document matching; similar-file determination; software development; term text mining methodology; Computers; Indexes; Libraries; Mechanical factors; Cosine Distance; Document Mapping; Inverse Document Frequency; Term Frequency; Text Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Education (ICCSE), 2013 8th International Conference on
  • Conference_Location
    Colombo
  • Print_ISBN
    978-1-4673-4464-7
  • Type

    conf

  • DOI
    10.1109/ICCSE.2013.6553950
  • Filename
    6553950
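
The keywords above name term frequency, inverse document frequency, and cosine distance. As an illustration only, the following minimal Python sketch shows how those three pieces are commonly combined to rank similar files. It is not the author's implementation: the tokenizer, the smoothed IDF formula, and the function names (tokenize, tfidf_vectors, cosine_similarity, most_similar_pair) are all assumptions made for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumption: lowercase alphanumeric runs are the "terms" of a file.
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumed variant).
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors; cosine distance is 1 minus this."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(docs):
    """Return (i, j, similarity) for the closest pair of documents."""
    vectors = tfidf_vectors(docs)
    return max(
        ((i, j, cosine_similarity(vectors[i], vectors[j]))
         for i, j in combinations(range(len(docs)), 2)),
        key=lambda triple: triple[2],
    )
```

Calling most_similar_pair on a list of file contents returns the indices of the two closest files and their similarity score. Comparing every pair grows quadratically with the number of files, so a large code base would likely need an index or a pruning step, which this sketch leaves out.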