DocumentCode
3165972
Title
Clustering of web search results based on an Iterative Fuzzy C-means Algorithm and Bayesian Information Criterion
Author
Cobos, Carlos ; Mendoza, M. ; Leon, Errol ; Manic, Milos ; Herrera-Viedma, Enrique
Author_Institution
Comput. Sci. Dept., Univ. del Cauca, Popayan, Colombia
fYear
2013
fDate
24-28 June 2013
Firstpage
507
Lastpage
512
Abstract
The clustering of web search results has become a very interesting research area among academic and scientific communities involved in information retrieval. Systems that cluster web search results, also called Web Clustering Engines, seek to increase the coverage of documents presented for the user to review while reducing the time spent reviewing them. Several algorithms for web document clustering already exist, but results show there is room for improvement. This paper introduces a new description-centric algorithm for clustering web search results, called IFCWR. IFCWR initially selects a maximum estimated number of clusters using Forgy's strategy, then iteratively merges clusters until results cannot be improved. Every merge operation involves running Fuzzy C-Means to cluster the web search results and calculating the Bayesian Information Criterion to automatically evaluate the best solution and number of clusters. IFCWR was compared against other established web document clustering algorithms, among them Suffix Tree Clustering and Lingo. The comparison was carried out on the AMBIENT and MORESQUE datasets, using precision, recall, F-measure, SSLk and other metrics. Results show a considerable improvement in clustering quality and performance.
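The loop described in the abstract (Forgy initialization, repeated merging, Fuzzy C-Means after each merge, BIC to decide when to stop) can be sketched roughly as follows. This is a minimal illustration and not the authors' IFCWR implementation: the abstract does not give the merge rule or the exact BIC formula, so the sketch assumes that the two closest centers are merged into their midpoint and that BIC is approximated as n·ln(RSS/n) + k·ln(n), with lower values being better. The function names, the fuzzifier m = 2, and the dense NumPy input (for example, TF-IDF vectors of snippets) are likewise assumptions.

```python
import numpy as np


def memberships(X, centers, m=2.0, eps=1e-9):
    """Fuzzy membership of each point in each cluster (rows sum to 1)."""
    # Distances from every point to every center, shape (n, k).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, centers, m=2.0, iters=50):
    """Standard Fuzzy C-Means updates, starting from the given centers."""
    for _ in range(iters):
        w = memberships(X, centers, m) ** m
        # Each center is the membership-weighted mean of all points.
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return centers, memberships(X, centers, m)


def bic(X, centers, u):
    """Assumed BIC approximation: n*ln(RSS/n) + k*ln(n); lower is better."""
    n, k = X.shape[0], centers.shape[0]
    labels = u.argmax(axis=1)  # harden memberships to compute RSS
    rss = ((X - centers[labels]) ** 2).sum()
    return n * np.log(rss / n + 1e-12) + k * np.log(n)


def iterative_merge_clustering(X, k_max, seed=0):
    """Start from k_max clusters and merge while the BIC keeps improving."""
    rng = np.random.default_rng(seed)
    # Forgy initialization: k_max distinct data points as initial centers.
    centers = X[rng.choice(len(X), size=k_max, replace=False)]
    centers, u = fuzzy_c_means(X, centers)
    best_score = bic(X, centers, u)
    while len(centers) > 1:
        # Assumed merge rule: replace the two closest centers by their midpoint.
        d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(d.argmin(), d.shape)
        merged = (centers[i] + centers[j]) / 2.0
        trial = np.vstack([np.delete(centers, [i, j], axis=0), merged])
        trial, trial_u = fuzzy_c_means(X, trial)
        score = bic(X, trial, trial_u)
        if score >= best_score:
            break  # merging no longer improves the criterion
        centers, u, best_score = trial, trial_u, score
    return centers, u
```

Calling `iterative_merge_clustering(X, k_max=10)` on an (n, d) array returns the final centers and the (n, k) membership matrix. Cluster labeling, which a description-centric algorithm also needs, is outside the scope of this sketch.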
Keywords
Internet; belief networks; document handling; iterative methods; pattern clustering; AMBIENT dataset; Bayesian information criterion; IFCWR; Lingo; MORESQUE dataset; SSLk metric; Web clustering engines; Web document clustering; Web search results clustering; clustering performance; clustering quality; description-centric algorithm; f-measure metric; iterative fuzzy C-means algorithm; merge operation; precision metric; recall metric; suffix tree clustering; Accuracy; Algorithm design and analysis; Bayes methods; Clustering algorithms; Educational institutions; Partitioning algorithms; Web search; fuzzy c-means
fLanguage
English
Publisher
ieee
Conference_Titel
IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint
Conference_Location
Edmonton, AB
Type
conf
DOI
10.1109/IFSA-NAFIPS.2013.6608452
Filename
6608452