DocumentCode :
1920789
Title :
A maximal frequent itemset approach for Web document clustering
Author :
Zhuang, Ling ; Dai, Honghua
Author_Institution :
Sch. of Inf. Technol., Deakin Univ., Burwood, Vic., Australia
fYear :
2004
fDate :
14-16 Sept. 2004
Firstpage :
970
Lastpage :
977
Abstract :
Clustering Web documents both efficiently and accurately is of great interest to Web users and is a key factor in the search accuracy of a Web search engine. To this end, this paper introduces a new approach to Web document clustering, called the maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely; these center points are then used as the initial points for the K-means algorithm. Our experimental results on three Web document sets show that the MFI approach outperforms the other methods we compared in most cases, particularly when a Web document set contains a large number of categories.
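The abstract does not spell out how the MFI center points are computed. The following Python sketch illustrates one plausible reading under stated assumptions: each document is reduced to a set of terms, frequent itemsets are mined with a simple Apriori-style level-wise search and filtered to the maximal ones, and the documents supporting each selected MFI are averaged into one initial center for standard K-means. The helper names (`frequent_itemsets`, `maximal`, `to_vector`, `mfi_seeds`, `kmeans`) and the seed-selection rule (rank MFIs by support, skip any MFI whose documents already back a chosen seed) are hypothetical, not the authors' published procedure.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Apriori-style level-wise search for every term set that occurs in at
    least min_support documents (each document is a set of terms)."""
    items = sorted({t for d in docs for t in d})
    current = [frozenset([t]) for t in items
               if sum(t in d for d in docs) >= min_support]
    frequent = list(current)
    size = 1
    while current:
        size += 1
        # Candidates of length `size` are unions of two frequent (size-1)-sets.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size}
        current = [c for c in sorted(candidates, key=sorted)
                   if sum(c <= d for d in docs) >= min_support]
        frequent.extend(current)
    return frequent


def maximal(itemsets):
    """Keep only the itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]


def to_vector(doc, vocab):
    """Binary term-presence vector (assumption: stands in for TF-IDF weights)."""
    return [1.0 if t in doc else 0.0 for t in vocab]


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mfi_seeds(docs, vocab, k, min_support):
    """Hypothetical seeding rule: rank MFIs by support, skip any MFI whose
    supporting documents already back a chosen seed, and use the mean vector
    of each remaining MFI's documents as an initial K-means center."""
    mfis = maximal(frequent_itemsets(docs, min_support))
    supporters = {s: [i for i, d in enumerate(docs) if s <= d] for s in mfis}
    mfis.sort(key=lambda s: (-len(supporters[s]), -len(s), sorted(s)))
    centers, used = [], set()
    for s in mfis:
        if used & set(supporters[s]):
            continue  # overlaps a dense region that already has a seed
        used.update(supporters[s])
        centers.append(mean([to_vector(docs[i], vocab) for i in supporters[s]]))
        if len(centers) == k:
            break
    return centers


def kmeans(points, centers, max_iter=20):
    """Standard K-means (Lloyd's algorithm) started from the given centers."""
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(len(centers)),
                      key=lambda j: sum((p - c) ** 2
                                        for p, c in zip(x, centers[j])))
                  for x in points]
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j, c in enumerate(centers):
            members = [x for x, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Toy corpus: two topics with three documents each.
    docs = [
        {"football", "goal", "match"},
        {"football", "goal", "league"},
        {"football", "match", "league"},
        {"stock", "market", "price"},
        {"stock", "market", "trade"},
        {"stock", "price", "trade"},
    ]
    vocab = sorted(set().union(*docs))
    seeds = mfi_seeds(docs, vocab, k=2, min_support=2)
    labels, _ = kmeans([to_vector(d, vocab) for d in docs], seeds)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Binary term vectors and brute-force support counting keep the sketch short; a realistic version would use weighted term vectors and a dedicated MFI miner such as MAFIA or FPMax. If fewer than k non-overlapping MFIs exist, `mfi_seeds` returns fewer than k centers, so a caller would need a fallback (for example, random seeds) for the remainder.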
Keywords :
Internet; data mining; iterative methods; optimisation; pattern clustering; text analysis; K-means iterative clustering; Web document clustering; Web search engine; expectation-maximization iterative clustering; maximal frequent itemset approach; Australia; Clustering algorithms; Data mining; Information technology; Itemsets; Iterative algorithms; Road transportation; Search engines; Testing; Web search;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Computer and Information Technology, 2004. CIT '04. The Fourth International Conference on
Print_ISBN :
0-7695-2216-5
Type :
conf
DOI :
10.1109/CIT.2004.1357322
Filename :
1357322