DocumentCode :
248401
Title :
An Approach for Document Pre-processing and K Means Algorithm Implementation
Author :
Gowtham, S. ; Goswami, Mausumi ; Balachandran, Krishna ; Purkayastha, B.S.
Author_Institution :
Fac. of Eng., Christ Univ., Bangalore, India
fYear :
2014
fDate :
27-29 Aug. 2014
Firstpage :
162
Lastpage :
166
Abstract :
Web mining is a cutting-edge technology that covers gathering and classifying information on the web. This paper presents an approach to document pre-processing: keywords are extracted from documents fetched from the web and processed to generate a term-document matrix, from which TF-IDF (term frequency-inverse document frequency) weights, in several variants, are computed for each document. The final step clusters these results with the K Means algorithm and compares the performance of each TF-IDF approach. The algorithm is implemented on an x64 architecture in Java and Matlab, and the results are tabulated.
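The pipeline named in the abstract (stop-word removal, term-document matrix, TF-IDF variants, K Means) can be illustrated with a minimal sketch. This is not the authors' Java/Matlab implementation. It is written in Python for brevity, and the stop-word list, the sample documents, the function names, the three term-frequency schemes (taken from the keyword list: raw frequency, logarithmic, augmented), and the first-k centroid initialisation are all assumptions made for illustration. Stemming is omitted.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real list is much longer).
STOP_WORDS = {"the", "on", "from", "with", "a", "of", "and"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (vocab, counts); counts[d][t] is the raw count of term t in doc d."""
    bags = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[t] for t in vocab] for bag in bags]

def tf_weight(count, max_count, scheme):
    """Term-frequency weight; absent terms get 0 in every scheme (assumption)."""
    if count == 0:
        return 0.0
    if scheme == "raw":
        return float(count)
    if scheme == "logarithmic":
        return 1.0 + math.log(count)
    if scheme == "augmented":
        return 0.5 + 0.5 * count / max_count
    raise ValueError("unknown scheme: " + scheme)

def tf_idf(counts, scheme="raw"):
    """Weight each count by idf = log(N / df), where df is the document frequency."""
    n_docs, n_terms = len(counts), len(counts[0])
    df = [sum(1 for row in counts if row[t] > 0) for t in range(n_terms)]
    idf = [math.log(n_docs / df[t]) for t in range(n_terms)]
    return [[tf_weight(row[t], max(row), scheme) * idf[t] for t in range(n_terms)]
            for row in counts]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, k, iterations=20):
    """Plain K Means; centroids start at the first k vectors (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical sample documents, two about web mining and two about clustering.
docs = [
    "web mining on the web",
    "k means clustering algorithm",
    "mining documents from the web",
    "clustering documents with k means",
]
vocab, counts = term_document_matrix(docs)
labels = k_means(tf_idf(counts, scheme="raw"), k=2)
```

On these four sample documents, `labels` places documents 0 and 2 in one cluster and documents 1 and 3 in the other. Passing `scheme="logarithmic"` or `scheme="augmented"` to `tf_idf` swaps the term-frequency variant, which is one way to compare clustering results across weighting approaches.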
Keywords :
Internet; Java; classification; data mining; document handling; pattern clustering; Matlab platform; TF-IDF; Web mining; World Wide Web; X64 architecture; cutting edge technology; document preprocessing; information classification; information gathering; k means algorithm implementation; term frequency inverse document frequency; term-document matrix; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Information retrieval; MATLAB; K Means clustering; Stop words; augmented; frequency; logarithmic; stemming
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in Computing and Communications (ICACC), 2014 Fourth International Conference on
Conference_Location :
Cochin
Print_ISBN :
978-1-4799-4364-7
Type :
conf
DOI :
10.1109/ICACC.2014.46
Filename :
6906015