Title :
Comparing keyword extraction techniques for WEBSOM text archives
Author :
Azcarraga, Arnulfo P. ; Yap, Teddy N., Jr.
Author_Institution :
School of Computing, National University of Singapore, Singapore
Abstract :
The WEBSOM methodology for building very large text archives uses a very slow method for extracting meaningful unit labels. The method computes the relative frequencies of all the words in all the documents associated with each unit and then compares these to the relative frequencies of the words in all the other units of the map. Since maps may have more than 100,000 units and an archive may contain up to 7 million documents, the existing WEBSOM method is not practical. The proposed fast alternative is based on the distribution of weights in the weight vectors of the trained map, plus a simple manipulation of the random projection matrix used for input data compression. Comparisons made using a WEBSOM archive of the Reuters text collection reveal that a high percentage of the keywords extracted using this method match the keywords extracted for the same map units using the original WEBSOM method.
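The abstract describes both labeling procedures without giving formulas, so the following is only a minimal Python sketch under stated assumptions, contrasting them on a toy corpus. `frequency_labels` mimics the slow approach by comparing a unit's word frequencies with those of all other units, using a simple difference score (an assumption, not WEBSOM's exact formula). `weight_labels` assumes the "simple manipulation" of the random projection matrix amounts to back-projecting each unit's deviation from the map-wide mean weight vector through the transposed matrix. The sparse +/-1 projection, the use of per-unit document means as a stand-in for trained SOM weight vectors, and all function names are hypothetical.

```python
import random
from collections import Counter


def relative_frequencies(docs):
    """Relative frequency of each word over all tokens in a list of tokenized documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def frequency_labels(unit_docs, unit, k=3):
    """Slow, frequency-comparison labeling (simplified): score each word by its
    relative frequency inside `unit` minus its relative frequency in all other
    units. The difference score is an assumption for illustration."""
    inside = relative_frequencies(unit_docs[unit])
    outside = relative_frequencies(
        [doc for u, docs in unit_docs.items() if u != unit for doc in docs])
    score = {w: f - outside.get(w, 0.0) for w, f in inside.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(score, key=lambda w: (-score[w], w))[:k]


def random_projection(vocab, dim, nonzeros=3, seed=0):
    """Hypothetical sparse random projection: each word gets a dim-length column
    with `nonzeros` random +/-1 entries (requires dim >= nonzeros)."""
    rng = random.Random(seed)
    cols = {}
    for w in sorted(vocab):
        col = [0.0] * dim
        for i in rng.sample(range(dim), nonzeros):
            col[i] = rng.choice((-1.0, 1.0))
        cols[w] = col
    return cols


def project(doc, cols):
    """Compress a tokenized document by summing the projection columns of its words."""
    dim = len(next(iter(cols.values())))
    x = [0.0] * dim
    for w in doc:
        for i, v in enumerate(cols[w]):
            x[i] += v
    return x


def unit_means(unit_docs, cols):
    """Stand-in for trained SOM weight vectors: mean projected vector per unit."""
    weights = {}
    for u, docs in unit_docs.items():
        vecs = [project(d, cols) for d in docs]
        weights[u] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return weights


def weight_labels(weights, cols, unit, k=3):
    """Fast labeling sketch: back-project the unit's deviation from the map-wide
    mean weight vector through the transposed projection (one dot product per word)."""
    n = len(weights)
    mean = [sum(col) / n for col in zip(*weights.values())]
    dev = [a - b for a, b in zip(weights[unit], mean)]
    score = {w: sum(c * d for c, d in zip(col, dev)) for w, col in cols.items()}
    return sorted(score, key=lambda w: (-score[w], w))[:k]


def match_rate(a, b):
    """Fraction of the keywords in `a` that also appear in `b`."""
    return len(set(a) & set(b)) / len(a) if a else 0.0


if __name__ == "__main__":
    unit_docs = {
        0: [["oil", "crude", "price"], ["oil", "opec", "price"]],
        1: [["bank", "rate", "interest"], ["bank", "loan", "rate"]],
    }
    vocab = {w for docs in unit_docs.values() for d in docs for w in d}
    cols = random_projection(vocab, dim=64)
    slow = frequency_labels(unit_docs, 0, k=2)  # ['oil', 'price']
    fast = weight_labels(unit_means(unit_docs, cols), cols, 0, k=2)
    print(slow, fast, match_rate(slow, fast))
```

In this sketch the weight-based scoring costs one dot product per vocabulary word per unit and never revisits the documents, which is the kind of saving that matters when a map has very many units. Because the projection is random, word columns are only approximately orthogonal, so the back-projected scores carry some cross-talk between words that grows as the projected dimension shrinks.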
Keywords :
data mining; information retrieval; self-organising feature maps; Reuters text collection; WEBSOM text archives; input data compression; keyword extraction techniques; random projection matrix; self-organizing maps; unit labels; very large text archives; Classification algorithms; Clustering algorithms; Content management; Data compression; Data mining; Educational institutions; Frequency; Labeling; Management training; Organizing
Conference_Title :
Proceedings of the 13th International Conference on Tools with Artificial Intelligence (ICTAI 2001)
Conference_Location :
Dallas, TX
Print_ISBN :
0-7695-1417-0
DOI :
10.1109/ICTAI.2001.974464