DocumentCode
615307
Title
Finding similar files using text mining
Author
Asanka, P. P. G. Dinesh
Author_Institution
Pearson Lanka (Pvt) Ltd., Colombo, Sri Lanka
fYear
2013
fDate
26-28 April 2013
Firstpage
431
Lastpage
435
Abstract
Finding closely matching source code files is important in software development. By finding them, software architects can identify similar implementations of classes, libraries, etc. However, this is not an easy task, since there can be a large number of source code files, and manually comparing every document becomes difficult as the number of documents grows. This research builds a mechanism that uses a term-based text mining methodology to find similar documents within a given set of documents.
Keywords
data mining; software engineering; text analysis; closely-matched source code file determination; similar-document matching; similar-file determination; software development; term text mining methodology; Computers; Indexes; Libraries; Mechanical factors; Cosine Distance; Document Mapping; Inverse Document Frequency; Term Frequency; Text Mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science & Education (ICCSE), 2013 8th International Conference on
Conference_Location
Colombo
Print_ISBN
978-1-4673-4464-7
Type
conf
DOI
10.1109/ICCSE.2013.6553950
Filename
6553950
Link To Document