Density-Based Clustering of Massive Short Messages Using Domain Ontology

Author

Yang, Shenghong; Wang, Yongheng

Author_Institution

Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China

Volume

2

fYear

2009

fDate

18-19 July 2009

Firstpage

505

Lastpage

508

Abstract

With the rapid development of information technology, huge amounts of data are being accumulated. Much of this data takes the form of short messages, such as emails within companies or conversations in open chat rooms. Clustering these short documents by density is useful for finding themes or exceptional information in the messages. However, traditional text clustering algorithms based on the vector space model cannot achieve acceptable accuracy because the key words appear at low frequency. In addition, traditional text clustering algorithms become very inefficient, or even unusable, when processing massive data at the TB level. This paper presents a density-based short message clustering algorithm based on domain ontology. The algorithm uses domain ontology to calculate the semantic similarity between short messages, which improves clustering accuracy. A parallel method is also used to achieve better scalability.
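The abstract does not specify the similarity measure or the clustering procedure. As an illustration only, the following Python sketch pairs a standard DBSCAN loop with an ontology-based similarity between messages. The toy ontology `PARENT`, the Wu-Palmer concept similarity, the best-match message similarity, the assumption that each message has already been mapped to a set of ontology concepts, and the parameters `eps` and `min_pts` are all hypothetical choices, not the authors' method. The parallel scheme mentioned in the abstract is not shown.

```python
from collections import deque

# Hypothetical toy domain ontology (assumption, not from the paper):
# child concept -> parent concept, with "root" as the single top concept.
PARENT = {
    "vehicle": "root", "car": "vehicle", "truck": "vehicle",
    "finance": "root", "stock": "finance", "bond": "finance",
}

def path_to_root(c):
    """Return [c, parent(c), ..., 'root']."""
    path = [c]
    while c != "root":
        c = PARENT[c]
        path.append(c)
    return path

def depth(c):
    # Root has depth 1, so the Wu-Palmer denominator is never zero.
    return len(path_to_root(c))

def concept_sim(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Lowest common subsumer: first ancestor of b that is also an ancestor of a.
    lcs = next(x for x in path_to_root(b) if x in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

def message_sim(m1, m2):
    """Assumed message similarity: symmetric best-match average over concepts."""
    if not m1 or not m2:
        return 0.0
    s12 = sum(max(concept_sim(a, b) for b in m2) for a in m1) / len(m1)
    s21 = sum(max(concept_sim(b, a) for a in m1) for b in m2) / len(m2)
    return (s12 + s21) / 2

def dbscan(messages, eps, min_pts):
    """DBSCAN with distance = 1 - message_sim. Returns labels; -1 marks noise."""
    n = len(messages)

    def neighbors(i):
        # The eps-neighborhood includes i itself (distance 0).
        return [j for j in range(n)
                if 1 - message_sim(messages[i], messages[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(j_nbrs)
    return labels
```

For example, with `msgs = [{"car"}, {"truck"}, {"car", "truck"}, {"stock"}, {"bond"}, {"stock", "bond"}]`, calling `dbscan(msgs, eps=0.4, min_pts=2)` groups the vehicle messages and the finance messages into two clusters, even though "car" and "truck" never co-occur as identical keywords. This is the kind of low-frequency keyword case in which a pure vector space model struggles.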

Keywords

data mining; ontologies (artificial intelligence); parallel processing; pattern clustering; text analysis; TB level; density-based document clustering; domain ontology; information technology; massive data processing; massive short message; parallel mining; semantic similarity; text clustering; vector space model; Clustering algorithms; Clustering methods; Computer science; Databases; Frequency; Information processing; Ontologies; Optical noise; Partitioning algorithms; Scalability; density; domain ontology; massive; short messages; text clustering;

fLanguage

English

Publisher

IEEE

Conference_Title

Information Processing, 2009. APCIP 2009. Asia-Pacific Conference on

Conference_Location

Shenzhen

Print_ISBN

978-0-7695-3699-6

Type

conf

DOI

10.1109/APCIP.2009.260

Filename

5197247