Title :
Latent Ontological Feature Discovery for Text Clustering
Author :
Duong, V.T.T. ; Cao, Tru H. ; Chau, Cuong K. ; Quan, Tho T.
Author_Institution :
Fac. of Inf. Technol. & Appl. Math., Ton Duc Thang Univ., Ho Chi Minh City, Vietnam
Abstract :
The content of a text is mainly defined by the keywords and named entities occurring in it. For news articles in particular, named entities are often central to their semantics. However, named entities have ontological features, namely their aliases, types, and identifiers, that are not visible in their surface text. In this paper, we explore weighted combinations of these latent named entity features with keywords for text clustering. To that end, the traditional vector space model is adapted to use multiple vectors defined over the spaces of entity names, types, name-type pairs, identifiers, and keywords. Clustering quality is evaluated with both self purity-separation measures and relative comparison measures. Hard and fuzzy clustering experiments with the proposed model are conducted and evaluated on selected subsets of the Reuters-21578 collection.
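The abstract does not give the exact combination formula, so the following is only a minimal sketch of one plausible reading of the multi-vector model. It assumes that each document is represented as one term-frequency vector per space (keywords, entity names, types, name-type pairs, identifiers) and that document similarity is a weighted average of per-space cosine similarities. The helper names (`to_vectors`, `combined_similarity`), the `(name, type, identifier)` triples, and the identifier strings such as `"#IBM"` are hypothetical illustrations and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical space names; the paper's own feature spaces are only described in prose.
SPACES = ("keyword", "name", "type", "name_type", "id")


def to_vectors(keywords, entities):
    """Build one term-frequency vector per space.

    `entities` is assumed to be an iterable of (name, type, identifier) triples
    produced by some named entity annotator; aliases of the same entity are
    assumed to share one identifier, which is what makes the "id" space useful.
    """
    entities = list(entities)
    return {
        "keyword": Counter(keywords),
        "name": Counter(n for n, _, _ in entities),
        "type": Counter(t for _, t, _ in entities),
        "name_type": Counter(f"{n}/{t}" for n, t, _ in entities),
        "id": Counter(i for _, _, i in entities),
    }


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def combined_similarity(d1, d2, weights):
    """Weighted average of per-space cosine similarities (an assumed combination)."""
    total = sum(weights.values())
    score = sum(
        w * cosine(d1.get(space, Counter()), d2.get(space, Counter()))
        for space, w in weights.items()
    )
    return score / total
```

For example, two hypothetical documents that mention the same company under different aliases share no entity name, but they still match in the type and identifier spaces:

```python
d1 = to_vectors(["shares", "profit"], [("IBM", "Company", "#IBM")])
d2 = to_vectors(["profit", "quarter"],
                [("International Business Machines", "Company", "#IBM")])
weights = {"keyword": 0.5, "name": 0.1, "type": 0.1, "name_type": 0.1, "id": 0.2}
print(round(combined_similarity(d1, d2, weights), 2))  # 0.55
```

Under these assumed weights, the keyword space alone contributes a cosine of 0.5, and the shared type and identifier raise the combined score. A keyword-only model would miss this alias match.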
Keywords :
fuzzy set theory; pattern clustering; text analysis; Reuters-21578; clustering quality; fuzzy clustering; hard clustering; keyword; latent named entity feature; latent ontological feature discovery; news article; relative comparison type; self purity-separation type; semantics; text clustering; vector space model; Cities and towns; Clustering algorithms; Computer science; Entropy; Information retrieval; Information technology; Labeling; Mathematics; Ontologies; Vectors;
Conference_Title :
Computing and Communication Technologies, 2009. RIVF '09. International Conference on
Conference_Location :
Da Nang
Print_ISBN :
978-1-4244-4566-0
Electronic_ISBN :
978-1-4244-4568-4
DOI :
10.1109/RIVF.2009.5174647