  • DocumentCode
    3324962
  • Title
    PageChaser: A Tool for the Automatic Correction of Broken Web Links
  • Author
    Morishima, Atsuyuki ; Nakamizo, Akiyoshi ; Iida, Tomoharu ; Sugimoto, Shigeo ; Kitagawa, Hiroyuki
  • Author_Institution
    Univ. of Tsukuba, Tsukuba
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1486
  • Lastpage
    1488
  • Abstract
    PageChaser is a system that monitors links between Web pages and searches for the new locations of moved Web pages when it finds broken links. The problem of searching for moved pages differs from typical information retrieval problems. First, it is impossible to identify the final destination until the page is actually moved, so the index-server approach is not necessarily effective. Second, there is a strong bias in where the new address is likely to be, so crawler-based solutions can be implemented effectively without searching the entire Web. PageChaser incorporates a comprehensive set of heuristics, some of which are novel, in a single unified framework. This paper explains the underlying ideas behind the design and development of PageChaser.
  • Keywords
    Internet; information retrieval; search engines; PageChaser; Web pages; crawler-based solution; index-server approach; Content management; Databases; Indexes; Project management; Software development management; Software tools; Uniform resource locators; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2008.4497598
  • Filename
    4497598