Title :
Finding similar files using text mining
Author :
Asanka, P. P. G. Dinesh
Author_Institution :
Pearson Lanka (Pvt) Ltd., Colombo, Sri Lanka
Abstract :
Finding closely matching source code files is important in software development. By finding them, software architects can identify similar implementations of classes, libraries, etc. However, this is not an easy task, since there can be a large number of source code files, and manually comparing every document becomes difficult when the number of documents is high. This research builds a mechanism that uses a term text mining methodology to find similar documents within a given set of documents.
Keywords :
data mining; software engineering; text analysis; closely-matched source code file determination; similar-document matching; similar-file determination; software development; term text mining methodology; Computers; Indexes; Libraries; Mechanical factors; Cosine Distance; Document Mapping; Inverse Document Frequency; Term Frequency; Text Mining;
Conference_Title :
Computer Science & Education (ICCSE), 2013 8th International Conference on
Conference_Location :
Colombo
Print_ISBN :
978-1-4673-4464-7
DOI :
10.1109/ICCSE.2013.6553950