• DocumentCode
    2850357
  • Title

    IRC: an iterative reinforcement categorization algorithm for interrelated Web objects

  • Author

    Xue, Gui-Rong ; Shen, Dou ; Yang, Qiang ; Zeng, Hua-Jun ; Chen, Zheng ; Yu, Yong ; Xi, Wensi ; Ma, Wei-Ying

  • Author_Institution
    Comput. Sci. & Eng., Shanghai Jiao-Tong Univ., China
  • fYear
    2004
  • fDate
    1-4 Nov. 2004
  • Firstpage
    273
  • Lastpage
    280
  • Abstract
    Most existing categorization algorithms deal with homogeneous Web data objects, and consider interrelated objects as additional features when taking the interrelationships with other types of objects into account. However, focusing on any single aspect of these interrelationships and objects does not fully reveal their true categories. In this paper, we propose a categorization algorithm, the iterative reinforcement categorization algorithm (IRC), to exploit the full interrelationships between the heterogeneous objects on the Web. IRC attempts to classify the interrelated Web objects by iterative reinforcement between the individual classification results of different object types via the interrelationships. Experiments on a clickthrough log dataset from the MSN search engine show that, in terms of the F1 measure, IRC achieves a 26.4% improvement over a pure content-based classification method, a 21% improvement over a query metadata-based method, and a 16.4% improvement over a virtual document-based method. Furthermore, our experiments show that IRC converges rapidly.
  • Keywords
    Web sites; classification; interrelated Web objects; iterative reinforcement categorization; Asia; Classification algorithms; Data engineering; Data mining; Head; Iterative algorithms; Search engines; Statistics; Text categorization; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
  • Print_ISBN
    0-7695-2142-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2004.10079
  • Filename
    1410294