  • DocumentCode
    2684237
  • Title
    Latent Ontological Feature Discovery for Text Clustering
  • Author
    Duong, V.T.T.; Cao, Tru H.; Chau, Cuong K.; Quan, Tho T.
  • Author_Institution
    Fac. of Inf. Technol. & Appl. Math., Ton Duc Thang Univ., Ho Chi Minh City, Vietnam
  • fYear
    2009
  • fDate
    13-17 July 2009
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The content of a text is mainly defined by the keywords and named entities occurring in it. For news articles in particular, named entities are usually important in defining the semantics. However, named entities have ontological features, namely their aliases, types, and identifiers, which are hidden from their textual appearance. In this paper, we explore weighted combinations of those latent named entity features with keywords for text clustering. To that end, the traditional vector space model is adapted with multiple vectors defined over the spaces of entity names, types, name-type pairs, identifiers, and keywords. Clustering quality is evaluated using both self purity-separation measures and relative comparison measures. Hard and fuzzy clustering experiments with the proposed model are conducted and evaluated on selected data subsets of Reuters-21578.
  • Keywords
    fuzzy set theory; pattern clustering; text analysis; Reuters-21578; clustering quality; fuzzy clustering; hard clustering; keyword; latent named entity feature; latent ontological feature discovery; news article; relative comparison type; self purity-separation type; semantics; text clustering; vector space model; Cities and towns; Clustering algorithms; Computer science; Entropy; Information retrieval; Information technology; Labeling; Mathematics; Ontologies; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing and Communication Technologies, 2009. RIVF '09. International Conference on
  • Conference_Location
    Da Nang
  • Print_ISBN
    978-1-4244-4566-0
  • Electronic_ISBN
    978-1-4244-4568-4
  • Type
    conf
  • DOI
    10.1109/RIVF.2009.5174647
  • Filename
    5174647