DocumentCode
2874271
Title
Density-Based Clustering of Massive Short Messages Using Domain Ontology
Author
Yang, Shenghong ; Wang, Yongheng
Author_Institution
Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Volume
2
fYear
2009
fDate
18-19 July 2009
Firstpage
505
Lastpage
508
Abstract
With the rapid development of information technology, huge volumes of data are accumulated. A vast amount of this data appears as short messages, such as emails in companies or conversations in open chat rooms. It is useful to find themes or exceptional information in these messages by clustering the short documents based on density. However, traditional text clustering algorithms based on the vector space model cannot achieve acceptable accuracy because the key words appear at low frequency. Moreover, traditional text clustering algorithms become very inefficient or even unusable when processing massive data at the TB level. In this paper, a density-based short message clustering algorithm using domain ontology is presented. The algorithm uses domain ontology to calculate the semantic similarity between short messages, which improves clustering accuracy. A parallel method is also used to get better scalability.
Keywords
data mining; ontologies (artificial intelligence); parallel processing; pattern clustering; text analysis; TB level; density-based document clustering; domain ontology; information technology; massive data processing; massive short message; parallel mining; semantic similarity; text clustering; vector space model; Clustering algorithms; Clustering methods; Computer science; Databases; Frequency; Information processing; Ontologies; Optical noise; Partitioning algorithms; Scalability; density; domain ontology; massive; short messages; text clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Processing, 2009. APCIP 2009. Asia-Pacific Conference on
Conference_Location
Shenzhen
Print_ISBN
978-0-7695-3699-6
Type
conf
DOI
10.1109/APCIP.2009.260
Filename
5197247