• DocumentCode
    2874271
  • Title
    Density-Based Clustering of Massive Short Messages Using Domain Ontology
  • Author
    Yang, Shenghong ; Wang, Yongheng
  • Author_Institution
    Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
  • Volume
    2
  • fYear
    2009
  • fDate
    18-19 July 2009
  • Firstpage
    505
  • Lastpage
    508
  • Abstract
    With the rapid development of information technology, huge volumes of data are being accumulated. Much of this data takes the form of short messages, such as company emails or conversations in open chat rooms. Clustering these short documents by density is useful for finding themes or exceptional information in the messages. However, traditional text clustering algorithms based on the vector space model cannot achieve acceptable accuracy, because keywords appear at low frequency in short messages. Traditional text clustering algorithms also become very inefficient, or even unusable, when processing massive data at the TB level. This paper presents a density-based short message clustering algorithm that uses a domain ontology. The algorithm uses the domain ontology to calculate the semantic similarity between short messages, which improves accuracy. A parallel method is also used to obtain better scalability.
  • Keywords
    data mining; ontologies (artificial intelligence); parallel processing; pattern clustering; text analysis; TB level; density-based document clustering; domain ontology; information technology; massive data processing; massive short message; parallel mining; semantic similarity; text clustering; vector space model; Clustering algorithms; Clustering methods; Computer science; Databases; Frequency; Information processing; Ontologies; Optical noise; Partitioning algorithms; Scalability; density; domain ontology; massive; short messages; text clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Processing, 2009. APCIP 2009. Asia-Pacific Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-0-7695-3699-6
  • Type
    conf
  • DOI
    10.1109/APCIP.2009.260
  • Filename
    5197247