• DocumentCode
    2226293
  • Title
    Topic continuity for Web document categorization and ranking
  • Author
    Narayan, B.L.; Murthy, C.A.; Pal, Sankar K.
  • Author_Institution
    Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
  • fYear
    2003
  • fDate
    13-17 Oct. 2003
  • Firstpage
    310
  • Lastpage
    315
  • Abstract
    PageRank is based primarily on link structure analysis. Recent work has shown that the content information of pages can be used to improve link analysis. We propose a novel algorithm that harnesses the information contained in a surfer's browsing history to determine the topic of interest on a given page. Because this history is not available until query time, we estimate it probabilistically so that the computation can be performed offline. This leads to better Web page categorization and, in turn, to better ranking of Web pages.
  • Keywords
    Web sites; citation analysis; search engines; PageRank; Web document categorization; Web page ranking; content information; link structure analysis; Content based retrieval; Frequency; History; Information analysis; Information retrieval; Machine intelligence; Text analysis; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
  • Print_ISBN
    0-7695-1932-6
  • Type
    conf
  • DOI
    10.1109/WI.2003.1241209
  • Filename
    1241209
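The abstract builds on PageRank and on probabilistically estimating where a random surfer came from. The following is a minimal sketch, not the authors' topic-continuity algorithm. It shows standard PageRank by power iteration, with damping factor `d` assumed to be 0.85 and dangling pages spreading their rank uniformly. It also includes a hypothetical helper, `predecessor_distribution`, which illustrates one simple way to "guess" a surfer's previous page under the random-surfer model: weight each in-linking page q by PR(q)/outdeg(q). All names and parameter values here are illustrative assumptions.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Standard PageRank by power iteration (not the paper's method).

    `links` maps each page to the pages it links to. Pages with no
    out-links (dangling pages) spread their rank uniformly over all pages.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank mass held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Each page passes d * rank / outdegree along every out-link.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += d * rank[p] / len(outs)
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


def predecessor_distribution(links: Dict[str, List[str]],
                             rank: Dict[str, float],
                             page: str) -> Dict[str, float]:
    """Hypothetical helper: P(previous page = q | surfer followed a link into `page`).

    Under the random-surfer model, the probability mass arriving at `page`
    via a link from q is proportional to rank[q] / outdeg(q) per link.
    """
    weights: Dict[str, float] = {}
    for q, outs in links.items():
        c = outs.count(page)
        if c:
            weights[q] = weights.get(q, 0.0) + rank[q] * c / len(outs)
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()} if total else {}
```

For example, with `links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}`, `pagerank(links)` ranks page `c` above page `b`, and `predecessor_distribution(links, rank, "c")` splits the estimated history of a visit to `c` between `a` and `b`. A topic-aware ranking could then combine such estimated histories with page content. That combination is the paper's contribution and is not reproduced here.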