DocumentCode
3190006
Title
GDClust: A Graph-Based Document Clustering Technique
Author
Hossain, M. Shahriar ; Angryk, Rafal A.
Author_Institution
Montana State Univ., Bozeman
fYear
2007
fDate
28-31 Oct. 2007
Firstpage
417
Lastpage
422
Abstract
This paper introduces a new document clustering technique based on frequent senses. The proposed system, GDClust (Graph-based Document Clustering), works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust presents text documents as hierarchical document-graphs and uses an Apriori paradigm to find the frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate sense-based document clusters. We propose a novel multilevel Gaussian minimum support approach for candidate subgraph generation. GDClust uses an English language ontology to construct document-graphs and exploits a graph-based data mining technique for sense discovery and clustering. It is an automated system that requires minimal human interaction for clustering.
Keywords
Gaussian processes; data mining; graph theory; natural language processing; ontologies (artificial intelligence); pattern clustering; text analysis; English language ontology; apriori paradigm; candidate subgraph generation; frequent senses; frequent subgraphs; graph-based document clustering technique; multilevel Gaussian minimum support approach; sense discovery; text mining techniques; Association rules; Books; Chemical analysis; Chemical technology; Clustering algorithms; Computer science; Conferences; Data mining; Humans; Ontologies;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
Conference_Location
Omaha, NE
Print_ISBN
978-0-7695-3019-2
Electronic_ISBN
978-0-7695-3033-8
Type
conf
DOI
10.1109/ICDMW.2007.104
Filename
4476701