DocumentCode
1928561
Title
Competitive learning mechanisms for scalable, incremental and balanced clustering of streaming texts
Author
Banerjee, Arindam ; Ghosh, Joydeep
Author_Institution
Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA
Volume
4
fYear
2003
fDate
20-24 July 2003
Firstpage
2697
Abstract
Automated clustering of text documents such as Web pages is becoming increasingly important for organizing the vast amounts of information available over the Internet. This problem is also very challenging, since text is typically represented by very high-dimensional (> 1000), normalized (unit-length) vectors. Moreover, documents are continually being created, and their statistics change over time because of changing news stories, etc., so one needs incremental learning algorithms that can adapt to non-stationary environments. We model high-dimensional, normalized data using a mixture of von Mises-Fisher distributions, and then modify this generative model in a principled way to yield frequency-sensitive competitive learning mechanisms that are applicable to streaming data and produce balanced clusters. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
Keywords
Internet; pattern clustering; text analysis; unsupervised learning; Web pages; automated clustering text documents; balanced clustering; competitive learning mechanisms; documents; frequency sensitive competitive learning mechanisms; high-dimensional normalized data; incremental clustering; nonstationary environments; scalable clustering; statistics; von Mises-Fisher distributions; Clustering algorithms; Frequency; Learning systems; Navigation; Organizing; Sparse matrices; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 2003. Proceedings of the International Joint Conference on
ISSN
1098-7576
Print_ISBN
0-7803-7898-9
Type
conf
DOI
10.1109/IJCNN.2003.1223993
Filename
1223993