A Web document clustering algorithm based on concept of neighbor

Author

Song, Jiang-Chun ; Shen, Jun-Yi

Author_Institution

Dept. of Comput. Sci. & Technol., Xi'an Jiaotong Univ., China

Volume

1

fYear

2003

fDate

2-5 Nov. 2003

Firstpage

46

Abstract

With its rapid growth, the WWW has gradually become the most important resource for transferring and sharing information globally, and it holds great latent potential. In recent years, Web mining has attracted broad research attention and produced many results. The nearest neighbor technique, a distance-based hierarchical clustering method, has been widely applied because of its efficiency and validity. In this paper, based on the vector space model (VSM) of Web documents, we improve the nearest neighbor method, propose a new Web document clustering algorithm, and study the validity and scalability of the algorithm as well as its time and space complexity.
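
For orientation only, the following is a minimal sketch of the classic nearest neighbor clustering idea applied to term-frequency vectors in a vector space model. It is not the improved algorithm proposed in the paper, whose details are not given in this record. The function names, the whitespace tokenizer, the use of cosine similarity, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a sparse term-frequency vector (assumed whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbor_clusters(docs, threshold=0.3):
    """Classic nearest neighbor clustering (not the paper's variant).

    Each document joins the cluster of its most similar earlier document,
    or starts a new cluster when that similarity is below `threshold`
    (a hypothetical parameter chosen for this sketch).
    """
    vectors = [tf_vector(d) for d in docs]
    labels = []
    next_label = 0
    for i, vec in enumerate(vectors):
        best_sim, best_j = 0.0, None
        for j in range(i):
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j is not None and best_sim >= threshold:
            labels.append(labels[best_j])
        else:
            labels.append(next_label)
            next_label += 1
    return labels


docs = [
    "web mining clustering",
    "clustering web documents",
    "cooking pasta recipe",
    "pasta recipe sauce",
]
labels = nearest_neighbor_clusters(docs)
print(labels)
```

In this sketch each new document is compared against all earlier ones, so the time cost grows quadratically with the number of documents. That is the kind of complexity question the abstract says the paper examines for its own algorithm.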

Keywords

Web sites; computational complexity; data mining; information retrieval systems; unsupervised learning; Web document clustering algorithm; Web mining; World Wide Web; global information; nearest neighbor method; space complexity; time complexity; vector space model; Clustering algorithms; Clustering methods; Computer science; Nearest neighbor searches; Pattern analysis; Scalability

fLanguage

English

Publisher

IEEE

Conference_Titel

Machine Learning and Cybernetics, 2003 International Conference on

Print_ISBN

0-7803-8131-9

Type

conf

DOI

10.1109/ICMLC.2003.1264440

Filename

1264440