Regional Information Center for Science and Technology (RICeST) - Comparing keyword extraction techniques for WEBSOM text archives

DocumentCode :

2070839

Title :

Comparing keyword extraction techniques for WEBSOM text archives

Author :

Azcarraga, Arnulfo P. ; Yap, Teddy N., Jr.

Author_Institution :

Sch. of Comput., Nat. Univ. of Singapore, Singapore

fYear :

2001

fDate :

7-9 Nov 2001

Firstpage :

187

Lastpage :

194

Abstract :

The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes the relative frequencies of all the words of all the documents associated with each unit and then compares these to the relative frequencies of all the words of all the other units of the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is not practical. A fast alternative method is based on the distribution of weights in the weight vectors of the trained map, plus a simple manipulation of the random projection matrix used for input data compression. Comparisons made using a WEBSOM archive of the Reuters text collection reveal that a high percentage of keywords extracted using this method match the keywords extracted for the same map units using the original WEBSOM method.
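
The frequency comparison that the abstract calls slow can be illustrated with a minimal Python sketch. This is not the WEBSOM implementation or the authors' code: the function names, the pooling of all other units into one comparison set, the ratio score, and the `smoothing` constant are all assumptions made for illustration. It only shows why the cost grows with both the number of units and the number of documents, since every unit is compared against the words of every other unit.

```python
from collections import Counter


def relative_frequencies(docs):
    """Map each word to its share of all word occurrences in docs."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def label_units(unit_docs, k=3, smoothing=1e-6):
    """Pick k label words per unit by comparing word frequencies.

    unit_docs maps a (hypothetical) unit id to the list of document
    strings assigned to that unit. The score is an assumed simple
    ratio, not the goodness measure used by WEBSOM itself.
    """
    labels = {}
    for unit, docs in unit_docs.items():
        own = relative_frequencies(docs)
        # Pool the documents of every other unit; repeating this for
        # each unit is what makes the approach expensive on large maps.
        others = [d for u, ds in unit_docs.items() if u != unit for d in ds]
        rest = relative_frequencies(others)
        score = {w: f / (rest.get(w, 0.0) + smoothing) for w, f in own.items()}
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(score, key=lambda w: (-score[w], w))
        labels[unit] = ranked[:k]
    return labels


# Toy data, invented for this example only.
toy_map = {
    "u1": ["oil prices rise", "oil exports fall"],
    "u2": ["bank rates rise", "bank profits fall"],
}
toy_labels = label_units(toy_map, k=3)
```

On this toy input, words shared by both units (such as "rise" and "fall") score low, while words concentrated in one unit rise to the top of that unit's label list.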

Keywords :

data mining; information retrieval; self-organising feature maps; Reuters text collection; WEBSOM text archives; input data compression; keyword extraction techniques; random projection matrix; self-organizing maps; unit labels; very large text archives; Classification algorithms; Clustering algorithms; Content management; Data compression; Data mining; Educational institutions; Frequency; Labeling; Management training; Organizing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Tools with Artificial Intelligence, Proceedings of the 13th International Conference on

Conference_Location :

Dallas, TX

Print_ISBN :

0-7695-1417-0

Type :

conf

DOI :

10.1109/ICTAI.2001.974464

Filename :

974464

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2070839