DocumentCode
2684237
Title
Latent Ontological Feature Discovery for Text Clustering
Author
Duong, V.T.T. ; Cao, Tru H. ; Chau, Cuong K. ; Quan, Tho T.
Author_Institution
Fac. of Inf. Technol. & Appl. Math., Ton Duc Thang Univ., Ho Chi Minh City, Vietnam
fYear
2009
fDate
13-17 July 2009
Firstpage
1
Lastpage
8
Abstract
The content of a text is mainly defined by the keywords and named entities occurring in it. For news articles in particular, named entities are usually important in defining the semantics. However, named entities have ontological features, namely their aliases, types, and identifiers, which are hidden from their textual appearance. In this paper, we explore weighted combinations of those latent named entity features with keywords for text clustering. To that end, the traditional vector space model is adapted with multiple vectors defined over spaces of entity names, types, name-type pairs, identifiers, and keywords. Clustering quality is evaluated using measures of both the self purity-separation type and the relative comparison type. Hard and fuzzy clustering experiments with the proposed model are conducted and evaluated on selected data subsets of Reuters-21578.
Keywords
fuzzy set theory; pattern clustering; text analysis; Reuters-21578; clustering quality; fuzzy clustering; hard clustering; keyword; latent named entity feature; latent ontological feature discovery; news article; relative comparison type; self purity-separation type; semantics; text clustering; vector space model; Cities and towns; Clustering algorithms; Computer science; Entropy; Information retrieval; Information technology; Labeling; Mathematics; Ontologies; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computing and Communication Technologies, 2009. RIVF '09. International Conference on
Conference_Location
Da Nang
Print_ISBN
978-1-4244-4566-0
Electronic_ISBN
978-1-4244-4568-4
Type
conf
DOI
10.1109/RIVF.2009.5174647
Filename
5174647