Title :
A Web document clustering algorithm based on concept of neighbor
Author :
Song, Jiang-Chun ; Shen, Jun-Yi
Author_Institution :
Dept. of Comput. Sci. & Technol., Xi'an Jiaotong Univ., China
Abstract :
With the rapid development of the WWW, it has gradually become the most important resource for transferring and sharing global information, and it holds great latent potential. In recent years, Web mining research has attracted broad attention and produced many results. The nearest neighbor technique, a distance-based hierarchical clustering method, has been widely applied because of its efficiency and validity. In this paper, based on the vector space model (VSM) of Web documents, we improve the nearest neighbor method, propose a new Web document clustering algorithm, and study its validity and scalability as well as its time and space complexity.
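For orientation only, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of a plain threshold-based nearest neighbor clustering over bag-of-words vectors, not the improved algorithm proposed in the paper, whose details the abstract does not give. The function names, the raw term-frequency weighting, the cosine similarity measure, and the threshold value are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector; an assumed, simplified stand-in for VSM weighting."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbor_clusters(docs, threshold=0.5):
    """Baseline nearest neighbor clustering (hypothetical helper, not the paper's method).

    Each document joins the cluster of its most similar earlier document
    when that similarity reaches the threshold; otherwise it starts a new cluster.
    """
    vectors = [tf_vector(d) for d in docs]
    labels = []
    next_label = 0
    for i, vec in enumerate(vectors):
        best_sim, best_j = 0.0, None
        for j in range(i):  # compare against every earlier document
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j is not None and best_sim >= threshold:
            labels.append(labels[best_j])
        else:
            labels.append(next_label)
            next_label += 1
    return labels


docs = [
    "web mining clustering",
    "clustering web documents mining",
    "football match score",
    "football score today",
]
labels = nearest_neighbor_clusters(docs, threshold=0.4)
print(labels)
```

In this sketch every document is compared with all earlier ones, so the number of similarity computations grows quadratically with the number of documents. That cost is one reason the time and space complexity of such methods is worth analyzing.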
Keywords :
Web sites; computational complexity; data mining; information retrieval systems; unsupervised learning; Web document clustering algorithm; Web mining; World Wide Web; global information; nearest neighbor method; space complexity; time complexity; vector space model; Clustering algorithms; Clustering methods; Computer science; Nearest neighbor searches; Pattern analysis; Scalability
Conference_Title :
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN :
0-7803-8131-9
DOI :
10.1109/ICMLC.2003.1264440