Finding similar files using text mining

Author

Asanka, P. P. G. Dinesh

Author_Institution

Pearson Lanka (Pvt) Ltd., Colombo, Sri Lanka

fYear

2013

fDate

26-28 April 2013

Firstpage

431

Lastpage

435

Abstract

Finding closely matching source code is important in software development. By finding such matches, software architects can identify similar implementations of classes, libraries, etc. However, this is not an easy task, since there can be a large number of source code files, and manually comparing every document against every other becomes impractical as the number of documents grows. This research builds a mechanism that uses term text mining methodology to find similar documents within a given set of documents.
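
The abstract does not spell out the mechanism, but the keywords of this record name term frequency, inverse document frequency, and cosine distance. The following is a minimal sketch of how those three pieces can be combined to rank file pairs by similarity; it is an illustration under stated assumptions, not the paper's implementation. The tokenizer, the smoothed IDF formula, and the function names (tokenize, tfidf_vectors, cosine_similarity, rank_similar_pairs) are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercased identifier-like tokens are the "terms".
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} vector."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    vectors = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values()) or 1
        vectors[name] = {
            # Assumption: smoothed IDF, so every weight stays positive.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        }
    return vectors


def cosine_similarity(a, b):
    # Cosine distance, as named in the keywords, is 1 minus this value.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar_pairs(docs):
    """Return (similarity, name_a, name_b) tuples, most similar first."""
    vectors = tfidf_vectors(docs)
    names = sorted(vectors)
    pairs = [
        (cosine_similarity(vectors[x], vectors[y]), x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    return sorted(pairs, reverse=True)
```

Calling rank_similar_pairs with a dictionary that maps file names to file contents returns every pair of files ordered from most to least similar. Comparing all pairs is quadratic in the number of files, which is acceptable for a sketch but would need an index or a pruning step for a large code base.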

Keywords

data mining; software engineering; text analysis; closely-matched source code file determination; similar-document matching; similar-file determination; software development; term text mining methodology; Computers; Indexes; Libraries; Mechanical factors; Cosine Distance; Document Mapping; Inverse Document Frequency; Term Frequency; Text Mining;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Science & Education (ICCSE), 2013 8th International Conference on

Conference_Location

Colombo

Print_ISBN

978-1-4673-4464-7

Type

conf

DOI

10.1109/ICCSE.2013.6553950

Filename

6553950

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=615307