DocumentCode :
1346646
Title :
Self organization of a massive document collection
Author :
Kohonen, Teuvo ; Kaski, Samuel ; Lagus, Krista ; Salojärvi, Jarkko ; Honkela, Jukka ; Paatero, Vesa ; Saarela, Antti
Author_Institution :
Neural Networks Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
Volume :
11
Issue :
3
fYear :
2000
fDate :
5/1/2000
Firstpage :
574
Lastpage :
585
Abstract :
Describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. Statistical representations of the documents' vocabularies are used as the feature vectors. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
Keywords :
classification; data mining; document image processing; self-organising feature maps; massive document collection; patent abstracts; random projections; self-organizing map algorithm; statistical representations; stochastic figures; textual similarities; weighted word histograms; Abstracts; Data analysis; Data mining; Histograms; Humans; Joining processes; Spatial databases; Stochastic processes; Vocabulary; Web sites;
fLanguage :
English
Journal_Title :
Neural Networks, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9227
Type :
jour
DOI :
10.1109/72.846729
Filename :
846729