• DocumentCode
    2183605
  • Title

    HDGSOMr: a high dimensional growing self-organizing map using randomness for efficient Web and text mining

  • Author

    Amarasiri, Rasika ; Alahakoon, Damminda ; Smith, Kate ; Premaratne, Malin

  • Author_Institution
    Sch. of Bus. Syst., Monash Univ., Australia
  • fYear
    2005
  • fDate
    19-22 Sept. 2005
  • Firstpage
    215
  • Lastpage
    221
  • Abstract
    Mining text data from the Web has become a necessity because of the volume of data available there. While search engines are the popular way to find information on the Web, feature map techniques are still widely used to analyze the content of large collections of Web pages. One problem with processing large collections of Web text data using feature map techniques is the time taken to cluster them. This paper presents HDGSOMr, an algorithm based on a growing variant of the self-organizing map. This novel algorithm incorporates randomness into the self-organizing process to produce higher quality clusters within a few epochs while using smaller neighborhood sizes, resulting in a significant reduction in overall processing time. Details of the HDGSOMr algorithm and results from processing large collections of text data that demonstrate its efficiency are also presented.
  • Keywords
    Internet; data mining; search engines; self-organising feature maps; text analysis; HDGSOMr; Web mining; Web page; information search; search engine; self-organizing map; text mining; Clustering algorithms; Computer science; Data mining; Information analysis; Organizing; Search engines; Text mining; Web pages; Web sites; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2415-X
  • Type

    conf

  • DOI
    10.1109/WI.2005.70
  • Filename
    1517845