DocumentCode :
711884
Title :
Search Results Clustering Algorithm Based on the Suffix Tree
Author :
Dengwei Wang ; Libo Liu ; Jing Dong ; Jiao Zheng
Author_Institution :
Sch. of Math. & Comput. Sci., Ningxia Univ., Yinchuan, China
fYear :
2015
fDate :
24-26 April 2015
Firstpage :
456
Lastpage :
460
Abstract :
The STC (Suffix Tree Clustering) algorithm clusters documents based on shared phrases and runs in linear time. To address shortcomings of the existing STC algorithm, such as the quality of the clustering results and the selection of clustering labels, this paper improves STC in three respects: the selection of base clusters, the similarity formula used to merge base clusters, and the scoring function for clustering labels. Entropy is used as the evaluation criterion for the clustering results. Experiments show that, compared with the original algorithm, the improved algorithm achieves better clustering results and produces more readable, descriptive, and distinguishable clustering labels.
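The pipeline the abstract refers to (base clusters from shared phrases, merging of base clusters, label scoring, entropy evaluation) can be sketched as below. This is a minimal sketch, not the paper's method. The improved base-cluster selection, similarity formula, and label scoring function are not given in this record, so the code falls back on the original STC merge rule: two base clusters are linked when their shared documents exceed half of each cluster. A naive word n-gram index stands in for the suffix tree, so this version is not linear time. The function names (`base_clusters`, `score`, `merge`, `entropy`), the parameters `max_len`, `min_docs`, `top_k`, and `threshold`, and the exact values of the phrase-length weight are all hypothetical.

```python
from collections import defaultdict
from math import log2


def base_clusters(docs, max_len=3, min_docs=2):
    # Hypothetical stand-in for the suffix tree: index every word n-gram
    # (n <= max_len) and keep phrases shared by at least `min_docs` documents.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                if i + n <= len(words):
                    index[tuple(words[i:i + n])].add(doc_id)
    return {p: ds for p, ds in index.items() if len(ds) >= min_docs}


def score(phrase, doc_set):
    # STC-style score |B| * f(|P|). The f values are assumptions: single
    # words are penalized, longer phrases grow linearly, capped at 6 words.
    f = 0.5 if len(phrase) == 1 else min(len(phrase), 6)
    return len(doc_set) * f


def merge(bases, top_k=500, threshold=0.5):
    # Keep the top_k best-scoring base clusters, then link two of them when
    # their overlap exceeds `threshold` relative to BOTH cluster sizes.
    items = sorted(bases.items(), key=lambda kv: score(*kv), reverse=True)[:top_k]
    parent = list(range(len(items)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            da, db = items[a][1], items[b][1]
            shared = len(da & db)
            if shared / len(da) > threshold and shared / len(db) > threshold:
                parent[find(a)] = find(b)

    # Connected components become final clusters; the best-scoring phrase
    # in each component is used as its label.
    groups = defaultdict(list)
    for i, (phrase, ds) in enumerate(items):
        groups[find(i)].append((phrase, ds))
    clusters = []
    for members in groups.values():
        docs = set().union(*(ds for _, ds in members))
        label = " ".join(max(members, key=lambda m: score(*m))[0])
        clusters.append((docs, label))
    return clusters


def entropy(clusters, gold):
    # Size-weighted average of each cluster's class entropy (lower is better).
    total = sum(len(docs) for docs, _ in clusters)
    result = 0.0
    for docs, _ in clusters:
        counts = defaultdict(int)
        for d in docs:
            counts[gold[d]] += 1
        h = -sum(c / len(docs) * log2(c / len(docs)) for c in counts.values())
        result += len(docs) / total * h
    return result
```

For example, `merge(base_clusters(["apple pie recipe", "apple pie baking", "linux kernel patch", "linux kernel build"]))` should yield two clusters, `{0, 1}` labeled "apple pie" and `{2, 3}` labeled "linux kernel", each with zero entropy against the gold classes. The union-find step computes connected components of the base-cluster graph, which is how the original STC forms final clusters; an improved similarity formula would replace only the `if` condition inside `merge`.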
Keywords :
computational complexity; document handling; information retrieval; pattern clustering; trees (mathematics); STC algorithm; base cluster merging; clustering labels; descriptive clustering labels; distinguishable clustering labels; entropy; linear time algorithm; scoring function; search result clustering algorithm; shared phrases; similarity calculation formula; suffix tree; Algorithm design and analysis; Bismuth; Clustering algorithms; Data mining; Entropy; Mathematical model; Search engines; clustering algorithm; document clustering; search result clustering; suffix tree;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Science and Control Engineering (ICISCE), 2015 2nd International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-6849-0
Type :
conf
DOI :
10.1109/ICISCE.2015.106
Filename :
7120646