DocumentCode :
3772299
Title :
Agglomerative Hierarchical Clustering for Information Retrieval Using Latent Semantic Index
Author :
Hansaem Park;Kyunglag Kwon;Abdel-ilah Zakaria Khiati;Jeungmin Lee;In-Jeong Chung
Author_Institution :
Dept. of Comput. Sci., Korea Univ., Sejong, South Korea
fYear :
2015
Firstpage :
426
Lastpage :
431
Abstract :
Web clustering has been an active research field in Information Retrieval (IR) for many years. Given the number of web sites that major search engines return for an ambiguous query, many researchers have turned to Search Results Clustering (SRC), which aims to group long lists of results into topically coherent clusters. Although some well-known algorithms already exist, results show that there is still work to be done in many respects. This paper proposes a method that integrates Latent Semantic Indexing (LSI) with Agglomerative Hierarchical Clustering (AHC). The rationale for combining these two methods is to counter the synonymy and polysemy problems that occur when previous SRC methods use the bag-of-words model. Moreover, we observe that the clusters produced by previous SRC methods are unsatisfactory and can be clustered further, which leaves room for other hidden topics to surface. To verify the proposed method, we use two common datasets, AMBIguous ENTries (AMBIENT) and MORE Sense-tagged QUEries (MORESQUE), and show significant improvement in clustering quality.
Keywords :
"Clustering algorithms","Large scale integration","Semantics","Search engines","Indexing"
Publisher :
IEEE
Conference_Title :
Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/SmartCity.2015.108
Filename :
7463762