Title :
Incremental clustering in short text streams based on BM25
Author :
Lixin Xu ; Guang Chen ; Lei Yang
Author_Institution :
Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Because short texts contain few keywords and have sparse features, clustering them suffers from the similarity drift problem. Traditional clustering algorithms are usually ineffective and waste resources when applied to short text streams. To overcome these problems, this paper proposes an incremental clustering algorithm for short text streams based on BM25. The approach uses BM25 to extract the keywords and weights of each cluster, and applies the extracted parameters to the similarity calculation. Theoretical analysis and experiments show that, compared with traditional clustering algorithms, the proposed incremental clustering algorithm addresses the similarity drift problem well and achieves satisfactory accuracy and performance on short text stream clustering.
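The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each cluster is treated as one pseudo-document, scores an incoming short text against every cluster with the standard Okapi BM25 formula, and opens a new cluster when the best score falls below a threshold. The names Cluster, bm25_score and assign, the parameter values K1 and B, and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

K1 = 1.2   # BM25 term-frequency saturation (common default, assumed here)
B = 0.75   # BM25 length normalization (common default, assumed here)


class Cluster:
    """A cluster summarized by the pooled term counts of its member texts."""

    def __init__(self):
        self.tf = Counter()  # term -> count over all member texts
        self.size = 0        # number of member texts

    def add(self, tokens):
        self.tf.update(tokens)
        self.size += 1

    @property
    def length(self):
        return sum(self.tf.values())


def bm25_score(tokens, cluster, clusters):
    """BM25 score of a new text against one cluster, with every
    cluster treated as a single pseudo-document."""
    n = len(clusters)
    avg_len = sum(c.length for c in clusters) / n
    score = 0.0
    for term in set(tokens):
        f = cluster.tf[term]
        if f == 0:
            continue  # term absent from this cluster contributes nothing
        df = sum(1 for c in clusters if c.tf[term] > 0)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        norm = f + K1 * (1 - B + B * cluster.length / avg_len)
        score += idf * f * (K1 + 1) / norm
    return score


def assign(tokens, clusters, threshold=0.5):
    """Place one incoming text: join the best-scoring cluster,
    or open a new cluster if no score reaches the threshold."""
    best, best_score = None, 0.0
    for c in clusters:
        s = bm25_score(tokens, c, clusters)
        if s > best_score:
            best, best_score = c, s
    if best is None or best_score < threshold:
        best = Cluster()
        clusters.append(best)
    best.add(tokens)
    return clusters.index(best)


if __name__ == "__main__":
    stream = ["apple banana fruit", "apple fruit juice",
              "python code bug", "python bug fix"]
    clusters = []
    for text in stream:
        print(text, "->", assign(text.split(), clusters, threshold=0.3))
```

Because each cluster keeps only pooled term counts, a new text is compared against cluster summaries rather than against every earlier text, which is what makes the scheme incremental. The threshold is arbitrary and would need tuning on real data.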
Keywords :
pattern clustering; text analysis; BM25; cluster weight extraction; incremental clustering algorithm; keyword extraction; short-text stream clustering; similarity calculation; similarity drift problem; Clustering algorithms; Computers; Frequency modulation; Gravity; Lead; Security; Cluster cohesion; Incremental clustering; Keyword similarity; Short text stream
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference on
Print_ISBN :
978-1-4799-4720-1
DOI :
10.1109/CCIS.2014.7175694