DocumentCode :
2874271
Title :
Density-Based Clustering of Massive Short Messages Using Domain Ontology
Author :
Yang, Shenghong ; Wang, Yongheng
Author_Institution :
Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Volume :
2
fYear :
2009
fDate :
18-19 July 2009
Firstpage :
505
Lastpage :
508
Abstract :
With the rapid development of information technology, huge amounts of data have accumulated. Much of this data takes the form of short messages, such as company emails or conversations in open chat rooms. Density-based clustering of these short documents is useful for finding themes or exceptional information in the messages. However, traditional text clustering algorithms based on the vector space model cannot achieve acceptable accuracy because key words appear at low frequency. In addition, traditional text clustering algorithms become very inefficient or even unusable when processing massive data at the TB level. In this paper, a density-based short message clustering algorithm based on domain ontology is presented. The algorithm uses domain ontology to calculate the semantic similarity between short messages, which improves accuracy; a parallel method is also used to get better scalability.
Keywords :
data mining; ontologies (artificial intelligence); parallel processing; pattern clustering; text analysis; TB level; density-based document clustering; domain ontology; information technology; massive data processing; massive short message; parallel mining; semantic similarity; text clustering; vector space model; Clustering algorithms; Clustering methods; Computer science; Databases; Frequency; Information processing; Ontologies; Optical noise; Partitioning algorithms; Scalability; density; domain ontology; massive; short messages; text clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Processing, 2009. APCIP 2009. Asia-Pacific Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-0-7695-3699-6
Type :
conf
DOI :
10.1109/APCIP.2009.260
Filename :
5197247