DocumentCode
2226293
Title
Topic continuity for Web document categorization and ranking
Author
Narayan, B.L. ; Murthy, C.A. ; Pal, Sankar K.
Author_Institution
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
fYear
2003
fDate
13-17 Oct. 2003
Firstpage
310
Lastpage
315
Abstract
PageRank is primarily based on link structure analysis. Recently, it has been shown that content information can be utilized to improve link analysis. We propose a novel algorithm that harnesses the information contained in the history of a surfer to determine the topic of interest on a given page. As the history is unavailable until query time, we guess it probabilistically so that the operations can be performed offline. This leads to a better Web page categorization and, thereby, to a better ranking of Web pages.
Keywords
Web sites; citation analysis; search engines; PageRank; Web document categorization; Web page ranking; content information; link structure analysis; Content based retrieval; Frequency; History; Information analysis; Information retrieval; Machine intelligence; Text analysis; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN
0-7695-1932-6
Type
conf
DOI
10.1109/WI.2003.1241209
Filename
1241209