DocumentCode :
2672180
Title :
Search Results Clustering Based on Suffix Array and VSM
Author :
Bai, Shunlai ; Zhu, Wenhao ; Zhang, Bofeng ; Ma, Jianhua
Author_Institution :
Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
fYear :
2010
fDate :
18-20 Dec. 2010
Firstpage :
852
Lastpage :
857
Abstract :
With the rapid growth in the number of web pages, search engines usually return a long ranked list of documents. Users must sift through the list, relying on each result's title and snippet (a short description of the document), to find the one they want. This approach works for simple, specific tasks but is less effective and efficient for ambiguous queries such as "apple" or "jaguar". An alternative that improves the effectiveness and efficiency of information retrieval is to organize the results automatically into clusters. This paper presents Suffix Array Similarity Clustering (SASC), an improved version of the Lingo algorithm for clustering web search results. The method builds clusters using an improved suffix array that ignores redundant suffixes, and computes document similarity from the titles and short snippets returned by web search engines. Experiments show that SASC runs faster than Lingo and achieves better cluster description quality and precision than Suffix Tree Clustering.
Keywords :
Internet; information retrieval; pattern clustering; search engines; Lingo algorithm; SASC; VSM; Web search engines; ambiguous queries; clustering web search; search results clustering; suffix array; suffix array similarity clustering; Algorithm design and analysis; Arrays; Clustering algorithms; Matrix decomposition; Software; Software algorithms; Lingo; STC; Suffix Tree
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Green Computing and Communications (GreenCom), 2010 IEEE/ACM Int'l Conference on & Int'l Conference on Cyber, Physical and Social Computing (CPSCom)
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-9779-9
Electronic_ISBN :
978-0-7695-4331-4
Type :
conf
DOI :
10.1109/GreenCom-CPSCom.2010.107
Filename :
5724930