• DocumentCode
    869870
  • Title
    Evaluating keyword selection methods for WEBSOM text archives
  • Author
    Azcarraga, Arnulfo P.; Yap, Teddy N., Jr.; Tan, Jonathan; Chua, Tat Seng
  • Author_Institution
    Sch. of Inf. Technol. & Comput., De La Salle Univ.-Canlubang, Laguna, Philippines
  • Volume
    16
  • Issue
    3
  • fYear
    2004
  • fDate
    3/1/2004 12:00:00 AM
  • Firstpage
    380
  • Lastpage
    383
  • Abstract
    The WEBSOM methodology, proven effective for building very large text archives, includes a method that extracts labels for each document cluster assigned to nodes in the map. However, the WEBSOM method needs to retrieve all the words of all the documents associated with each node. Since maps may have more than 100,000 nodes and the archive may contain up to seven million documents, the WEBSOM methodology needs a faster alternative method for keyword selection. Presented here is such an alternative method, which is able to quickly deduce meaningful labels per node in the map. It does this just by analyzing the relative weight distribution of the SOM weight vectors and by taking advantage of some characteristics of the random projection method used in dimensionality reduction. The effectiveness of this technique is demonstrated on news document collections.
  • Keywords
    information retrieval systems; self-organising feature maps; text analysis; word processing; SOM weight vectors; WEBSOM text archive; document cluster; keyword selection methods; news document collections; random projection method characteristics; relative weight distribution; Classification algorithms; Clustering algorithms; Computer Society; Frequency; Navigation; Text categorization
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2003.1262193
  • Filename
    1262193