DocumentCode
3324962
Title
PageChaser: A Tool for the Automatic Correction of Broken Web Links
Author
Morishima, Atsuyuki ; Nakamizo, Akiyoshi ; Iida, Tomoharu ; Sugimoto, Shigeo ; Kitagawa, Hiroyuki
Author_Institution
Univ. of Tsukuba, Tsukuba
fYear
2008
fDate
7-12 April 2008
Firstpage
1486
Lastpage
1488
Abstract
PageChaser is a system that monitors links between Web pages and, when it finds a broken link, searches for the new location of the moved page. Searching for moved pages differs from typical information retrieval problems. First, the final destination cannot be identified until the page has actually moved, so an index-server approach is not necessarily effective. Second, there is a strong bias in where the new address is likely to be, so crawler-based solutions can be implemented effectively without searching the entire Web. PageChaser incorporates a comprehensive set of heuristics, some of them novel, in a single unified framework. This paper explains the ideas underlying the design and development of PageChaser.
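The abstract does not spell out PageChaser's heuristics, so the following is only a minimal sketch of the locality idea it describes: instead of consulting a global index, start a bounded crawl from the directories that enclosed the broken URL on the same host and keep the page whose text best matches the page's last known content. Everything here is an illustrative assumption, not PageChaser's actual design: the names `ancestor_urls`, `similarity`, `chase` and the injected `fetch` function are hypothetical, Jaccard word overlap stands in for whatever scoring the system really uses, and the same-host restriction, `max_pages` budget, and `threshold` are arbitrary choices.

```python
from collections import deque
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical fetcher interface: returns (page_text, outgoing_links), or None
# when the URL is broken or unreachable. Injected so the sketch runs offline.
Fetcher = Callable[[str], Optional[tuple[str, list[str]]]]


def ancestor_urls(url: str) -> list[str]:
    """Enclosing directories of `url`, deepest first (assumed locality bias)."""
    parts = urlsplit(url)
    dirs = parts.path.split("/")[1:-1]  # drop the leading "" and the last segment
    seeds = []
    for depth in range(len(dirs), -1, -1):
        path = "/" + "".join(d + "/" for d in dirs[:depth])
        seeds.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return seeds


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of word sets; a stand-in for a real scoring function."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def chase(broken_url: str, old_text: str, fetch: Fetcher,
          max_pages: int = 50, threshold: float = 0.5) -> Optional[str]:
    """Breadth-first crawl seeded at the ancestors of `broken_url`, limited to
    the same host and to `max_pages` fetches. Returns the URL whose text is most
    similar to the page's last known text, or None if nothing reaches `threshold`."""
    host = urlsplit(broken_url).netloc
    queue = deque(ancestor_urls(broken_url))
    seen = set(queue)
    best_url, best_score = None, -1.0
    for _ in range(max_pages):  # every fetch attempt counts against the budget
        if not queue:
            break
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue  # broken or unreachable: nothing to score or expand
        text, links = page
        score = similarity(old_text, text)
        if url != broken_url and score >= threshold and score > best_score:
            best_url, best_score = url, score
        for link in links:
            target = urljoin(url, link)
            # Stay on the original host: the sketch assumes moved pages rarely leave it.
            if urlsplit(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return best_url
```

Breadth-first order visits the nearest enclosing directories first, which is how this sketch encodes the assumed bias toward nearby addresses; a fuller system would presumably combine several such heuristics rather than rely on one crawl strategy.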
Keywords
Internet; information retrieval; search engines; PageChaser; Web pages; crawler-based solution; index-server approach; Content management; Databases; Indexes; Project management; Software development management; Software tools; Uniform resource locators; World Wide Web
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location
Cancun
Print_ISBN
978-1-4244-1836-7
Electronic_ISBN
978-1-4244-1837-4
Type
conf
DOI
10.1109/ICDE.2008.4497598
Filename
4497598