DocumentCode :
2646490
Title :
Applying Semantic Suffix Net to suffix tree clustering
Author :
Janruang, Jongkol ; Guha, Sumanta
Author_Institution :
Comput. Sci. & Inf. Manage. Program, Asian Inst. of Technol., Pathumthani, Thailand
fYear :
2011
fDate :
28-29 June 2011
Firstpage :
146
Lastpage :
152
Abstract :
In this paper we consider the problem of clustering snippets returned by search engines. We propose a technique that brings semantic similarity into the clustering process. Our technique improves on the well-known suffix tree clustering (STC) method, a highly efficient heuristic for clustering web search results. A weakness of STC, however, is that it cannot cluster semantically similar documents. To solve this problem, we propose a new data structure, called a Semantic Suffix Net (SSN), to represent the suffixes of a single string. A generalized semantic suffix net is created to represent the suffixes of a set of strings by using a new operator that partially combines nets. A key feature of this operator is that it finds a joint point by using semantic similarity and string matching; the combination of a pair of nets then begins at that joint point. This reduces the number of nodes and branches in a generalized semantic suffix net. The operator then uses the line of suffix links as a boundary to separate the net. The generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.
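The weakness of plain STC noted in the abstract can be illustrated with a minimal sketch. This is not the authors' Semantic Suffix Net or their combining operator: it replaces the suffix tree with a plain dictionary of word phrases, and it stands in for semantic similarity with a small hand-written synonym table. The names SYNONYMS, canonical and base_clusters, the max_len parameter, and the sample snippets are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical synonym table; a stand-in for a real semantic similarity
# measure, which the abstract does not specify.
SYNONYMS = {"automobile": "car", "auto": "car", "purchase": "buy"}


def canonical(word):
    """Map a word to its representative term, or return it unchanged."""
    return SYNONYMS.get(word, word)


def base_clusters(snippets, max_len=3, use_semantics=True):
    """Group snippets by the word phrases they share.

    Every contiguous phrase of up to max_len words is indexed, which is
    roughly what a word-level suffix tree exposes. A phrase that occurs
    in at least two snippets forms a base cluster.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        if use_semantics:
            words = [canonical(w) for w in words]
        for start in range(len(words)):
            stop = min(start + max_len, len(words))
            for end in range(start + 1, stop + 1):
                phrase_docs[" ".join(words[start:end])].add(doc_id)
    return {p: docs for p, docs in phrase_docs.items() if len(docs) >= 2}


snippets = [
    "buy a used car",
    "purchase a used automobile",
    "weather forecast today",
]
plain = base_clusters(snippets, use_semantics=False)
semantic = base_clusters(snippets, use_semantics=True)
```

With exact string matching, the first two snippets share only the phrases built from "a" and "used". Once synonyms are mapped to a common term, they also share phrases such as "used car", which is the kind of grouping the abstract says conventional STC misses.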
Keywords :
data structures; document handling; information retrieval; pattern clustering; search engines; string matching; trees (mathematics); STC algorithm; data structure; semantic similarity; semantic suffix net; suffix tree clustering; Algorithm design and analysis; Clustering algorithms; Joints; Pediatrics; Semantics; data mining; semantic web search results clustering; text mining
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining and Optimization (DMO), 2011 3rd Conference on
Conference_Location :
Putrajaya
ISSN :
2155-6938
Print_ISBN :
978-1-61284-211-0
Electronic_ISBN :
2155-6938
Type :
conf
DOI :
10.1109/DMO.2011.5976519
Filename :
5976519