DocumentCode
248401
Title
An Approach for Document Pre-processing and K Means Algorithm Implementation
Author
Gowtham, S. ; Goswami, Mausumi ; Balachandran, Krishna ; Purkayastha, B.S.
Author_Institution
Fac. of Eng., Christ Univ., Bangalore, India
fYear
2014
fDate
27-29 Aug. 2014
Firstpage
162
Lastpage
166
Abstract
Web mining is a cutting-edge technology that covers gathering and classifying information over the web. This paper puts forth the concepts of document pre-processing: keywords are extracted from documents fetched from the web and processed to generate a term-document matrix, from which TF-IDF (term frequency-inverse document frequency) weights are computed for each document under several different TF-IDF approaches. The last step clusters these results with the K Means algorithm and compares the performance of each approach. The algorithm is implemented on an x64 architecture and coded in Java and Matlab. The results are tabulated.
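The pre-processing pipeline summarised in the abstract (keyword extraction, stop-word removal, term-document matrix, then TF-IDF under several term-frequency variants) can be illustrated with a small Java sketch. It is not the authors' code. The stop-word list, the tokenizer, the omission of stemming, the choice idf = ln(N / df), and the names `TfIdfSketch`, `TfVariant` and `tfIdf` are all assumptions made for illustration. The RAW, LOG and AUGMENTED variants follow common textbook definitions suggested by the keywords "logarithmic" and "augmented".

```java
import java.util.*;

/**
 * Hypothetical sketch of document pre-processing and TF-IDF weighting.
 * Assumptions (not from the paper): tiny hard-coded stop-word list,
 * lower-casing and splitting on non-letters, no stemming, idf = ln(N / df).
 */
public class TfIdfSketch {

    // Illustrative stop-word list only; a real system would use a fuller list.
    static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "of", "and", "is", "on", "in", "to");

    /** Lower-cases the text, splits on non-letters and drops stop words. */
    static List<String> tokenize(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String t : doc.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    /** Sorted list of distinct terms over all documents. */
    static List<String> vocabulary(List<List<String>> docs) {
        SortedSet<String> vocab = new TreeSet<>();
        for (List<String> d : docs) vocab.addAll(d);
        return new ArrayList<>(vocab);
    }

    /** Raw counts: rows are terms (vocabulary order), columns are documents. */
    static int[][] termDocumentMatrix(List<List<String>> docs, List<String> vocab) {
        int[][] m = new int[vocab.size()][docs.size()];
        for (int d = 0; d < docs.size(); d++) {
            for (String t : docs.get(d)) {
                int i = vocab.indexOf(t);
                if (i >= 0) m[i][d]++;
            }
        }
        return m;
    }

    enum TfVariant { RAW, LOG, AUGMENTED }

    /** Term-frequency weight for a count f > 0; maxF is the largest count in that document. */
    static double tf(int f, int maxF, TfVariant v) {
        switch (v) {
            case LOG: return 1 + Math.log(f);
            case AUGMENTED: return 0.5 + 0.5 * f / (double) maxF;
            default: return f;
        }
    }

    /** TF-IDF weights; terms absent from a document get weight 0 under every variant. */
    static double[][] tfIdf(int[][] counts, TfVariant v) {
        int terms = counts.length;
        if (terms == 0) return new double[0][0];
        int docs = counts[0].length;
        double[][] w = new double[terms][docs];
        for (int d = 0; d < docs; d++) {
            int maxF = 0;
            for (int i = 0; i < terms; i++) maxF = Math.max(maxF, counts[i][d]);
            for (int i = 0; i < terms; i++) {
                int df = 0; // number of documents containing term i
                for (int k = 0; k < docs; k++) if (counts[i][k] > 0) df++;
                double idf = df == 0 ? 0 : Math.log((double) docs / df);
                w[i][d] = counts[i][d] == 0 ? 0 : tf(counts[i][d], maxF, v) * idf;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        for (String s : new String[] {"Web mining of the web", "Clustering of documents",
                                      "Web documents and clustering"}) {
            docs.add(tokenize(s));
        }
        List<String> vocab = vocabulary(docs);
        double[][] w = tfIdf(termDocumentMatrix(docs, vocab), TfVariant.LOG);
        for (int i = 0; i < vocab.size(); i++) {
            System.out.println(vocab.get(i) + " " + Arrays.toString(w[i]));
        }
    }
}
```

With the three toy documents, the vocabulary is clustering, documents, mining, web; "mining" appears only in the first document, so it receives the largest idf (ln 3) there and weight 0 elsewhere. Swapping `TfVariant.LOG` for `RAW` or `AUGMENTED` is one way to compare approaches, as the abstract describes.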
Keywords
Internet; Java; classification; data mining; document handling; pattern clustering; Matlab platform; TF-IDF; Web mining; World Wide Web; X64 architecture; cutting edge technology; document preprocessing; information classification; information gathering; k means algorithm implementation; term frequency inverse document frequency; term-document matrix; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Data mining; Information retrieval; MATLAB; K Means clustering; Stop words; augmented; frequency; logarithmic; stemming; tf-idf;
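The final clustering step ("K Means clustering") can likewise be sketched as plain Lloyd-style K-means over document vectors, such as the TF-IDF columns of each document. The abstract does not state the distance measure, initialisation or value of k the authors used, so this sketch assumes Euclidean distance, the first k vectors as initial centroids, and an iteration cap; `KMeansSketch.cluster` is a hypothetical name.

```java
import java.util.Arrays;

/**
 * Hypothetical K-means sketch (Lloyd's algorithm).
 * Assumptions: Euclidean distance, first k rows as initial centroids,
 * 1 <= k <= number of rows, all rows of equal length.
 */
public class KMeansSketch {

    /** Squared Euclidean distance between two vectors. */
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    /** Returns the cluster index (0..k-1) assigned to each row of x. */
    static int[] cluster(double[][] x, int k, int maxIter) {
        int n = x.length, dim = x[0].length;
        double[][] c = new double[k][];
        for (int i = 0; i < k; i++) c[i] = x[i].clone(); // deterministic init: first k points
        int[] label = new int[n];
        Arrays.fill(label, -1);
        for (int it = 0; it < maxIter; it++) {
            // Assignment step: move each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < n; p++) {
                int best = 0;
                for (int q = 1; q < k; q++) {
                    if (sqDist(x[p], c[q]) < sqDist(x[p], c[best])) best = q;
                }
                if (best != label[p]) {
                    label[p] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no assignment changed
            // Update step: each centroid becomes the mean of its assigned points.
            double[][] sum = new double[k][dim];
            int[] size = new int[k];
            for (int p = 0; p < n; p++) {
                size[label[p]]++;
                for (int j = 0; j < dim; j++) sum[label[p]][j] += x[p][j];
            }
            for (int q = 0; q < k; q++) {
                if (size[q] == 0) continue; // an empty cluster keeps its previous centroid
                for (int j = 0; j < dim; j++) c[q][j] = sum[q][j] / size[q];
            }
        }
        return label;
    }

    public static void main(String[] args) {
        double[][] docs = { {1, 0}, {0.9, 0.1}, {0, 1}, {0.1, 0.9} };
        System.out.println(Arrays.toString(cluster(docs, 2, 100)));
    }
}
```

With the four toy vectors in `main`, the two points near (1, 0) and the two near (0, 1) end up in separate clusters, and the program prints `[0, 0, 1, 1]`. Because K-means is sensitive to initialisation, a fuller comparison of TF-IDF approaches would typically use random or k-means++ seeding with several restarts.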
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Computing and Communications (ICACC), 2014 Fourth International Conference on
Conference_Location
Cochin
Print_ISBN
978-1-4799-4364-7
Type
conf
DOI
10.1109/ICACC.2014.46
Filename
6906015