DocumentCode :
1931603
Title :
Text document clustering based on frequent concepts
Author :
Baghel, Rekha ; Dhir, Renu
Author_Institution :
Dept. of Comput. Sci. & Eng., Dr. B. R. Ambedkar Nat. Inst. of Technol., Jalandhar, India
fYear :
2010
fDate :
28-30 Oct. 2010
Firstpage :
366
Lastpage :
371
Abstract :
This paper presents a novel document clustering technique based on frequent concepts. The proposed algorithm, FCDC (Frequent Concepts based Document Clustering), works with frequent concepts rather than the frequent itemsets used in traditional text mining techniques. Many well-known clustering algorithms treat documents as a bag of words and ignore important relationships between words, such as synonymy. The proposed algorithm uses the semantic relationships between words to create concepts. It exploits the WordNet ontology to create a low-dimensional feature vector, which in turn allows a more accurate clustering algorithm to be developed.
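The general idea in the abstract (merge synonyms into concepts, keep the frequent ones, and use them as a compact feature space) can be illustrated with a minimal sketch. This is not the FCDC algorithm itself, whose details the abstract does not give. The synonym table `SYNONYM_TO_CONCEPT` is a hypothetical stand-in for WordNet lookups, and the function names and the `min_support` threshold are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy synonym table standing in for WordNet synsets (assumption).
SYNONYM_TO_CONCEPT = {
    "car": "automobile", "auto": "automobile", "automobile": "automobile",
    "film": "movie", "movie": "movie", "picture": "movie",
}


def to_concepts(tokens):
    """Map each token to its concept label; unknown words map to themselves."""
    return [SYNONYM_TO_CONCEPT.get(t, t) for t in tokens]


def frequent_concepts(docs, min_support):
    """Return concepts that occur in at least min_support (a fraction) of docs."""
    doc_freq = Counter()
    for tokens in docs:
        # Count each concept once per document (document frequency).
        doc_freq.update(set(to_concepts(tokens)))
    threshold = min_support * len(docs)
    return sorted(c for c, n in doc_freq.items() if n >= threshold)


def feature_vector(tokens, concepts):
    """Count occurrences of each frequent concept in one document."""
    counts = Counter(to_concepts(tokens))
    return [counts[c] for c in concepts]


docs = [
    ["car", "engine", "repair"],
    ["auto", "engine", "film"],
    ["movie", "picture", "review"],
]
concepts = frequent_concepts(docs, min_support=0.5)
vectors = [feature_vector(d, concepts) for d in docs]
print(concepts)  # ['automobile', 'engine', 'movie']
print(vectors)   # [[1, 1, 0], [1, 1, 1], [0, 0, 2]]
```

In this toy corpus the eight distinct words collapse to three frequent concepts, so each document is described by a three-element vector that any standard clustering method could consume.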
Keywords :
data mining; ontologies (artificial intelligence); pattern clustering; text analysis; WordNet; document clustering; feature vector; frequent concept; ontology; text mining; Accuracy; Clustering algorithms; Databases; Grid computing; Merging; Ontologies; Wireless application protocol;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Distributed and Grid Computing (PDGC), 2010 1st International Conference on
Conference_Location :
Solan
Print_ISBN :
978-1-4244-7675-6
Type :
conf
DOI :
10.1109/PDGC.2010.5679969
Filename :
5679969