• DocumentCode
    2221144
  • Title
    The Improved Pagerank in Web Crawler
  • Author
    Ling Zhang ; Zheng Qin
  • Author_Institution
    Dept. of Inf. Sci. & Eng., Normal Univ., Changsha, China
  • fYear
    2009
  • fDate
    26-28 Dec. 2009
  • Firstpage
    1889
  • Lastpage
    1892
  • Abstract
    PageRank is an algorithm for rating web pages. It borrows the citation relationship between academic papers to evaluate a web page's authority. Because it gives the same weight to all edges and ignores the relevance of web pages to the topic, it suffers from topic drift. Based on an analysis of several PageRank algorithms, an improved PageRank based on thematic segments is proposed. In this algorithm, a web page is divided into several blocks according to the HTML document's structure, and the greatest weight is given to links in the block that is most relevant to the given topic. Moreover, visited outlinks are used as feedback to adjust the blocks' relevance. An experiment with a Web crawler shows that the new algorithm has some effect on resolving the topic-drift problem.
  • Keywords
    Web sites; citation analysis; hypermedia markup languages; search engines; HTML document; Web crawler; Web page rating; academic paper; block relevancy; citation; pagerank; thematic segment; topic-drift; Algorithm design and analysis; Couplings; Crawlers; Feedback; HTML; Information science; Internet; Search engines; Uniform resource locators; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Engineering (ICISE), 2009 1st International Conference on
  • Conference_Location
    Nanjing
  • Print_ISBN
    978-1-4244-4909-5
  • Type
    conf
  • DOI
    10.1109/ICISE.2009.1220
  • Filename
    5455065
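
The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: links inherit a weight from the relevance of the block they sit in, and PageRank is then run over the weighted edges. The function name `block_weighted_pagerank`, the input layout, the damping value, and the way relevance is turned into edge weights are all assumptions made for illustration, not the authors' method. Block relevance scores are assumed to be computed elsewhere, and the feedback step from visited outlinks is not modeled.

```python
from collections import defaultdict


def block_weighted_pagerank(pages, damping=0.85, iterations=50):
    """Illustrative block-weighted PageRank (hypothetical, not from the paper).

    pages maps a URL to a list of blocks; each block is a tuple
    (relevance_to_topic, [outlink_url, ...]). Relevance values are
    assumed to be non-negative and supplied by the caller.
    """
    # Collect every URL that appears as a source or as a link target.
    nodes = set(pages)
    for blocks in pages.values():
        for _, links in blocks:
            nodes.update(links)
    nodes = sorted(nodes)
    n = len(nodes)

    # Each outlink gets the relevance of its enclosing block as its weight;
    # weights are normalized per page so they form a probability distribution.
    transitions = {}
    for url in nodes:
        weights = defaultdict(float)
        for relevance, links in pages.get(url, []):
            for target in links:
                weights[target] += relevance
        total = sum(weights.values())
        transitions[url] = (
            {t: w / total for t, w in weights.items()} if total > 0 else {}
        )

    # Standard power iteration; rank held by pages without usable outlinks
    # is redistributed uniformly so the ranks keep summing to 1.
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not transitions[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for target, weight in transitions[u].items():
                new_rank[target] += damping * rank[u] * weight
        rank = new_rank
    return rank
```

With a page whose highly relevant block links to `b` and whose weakly relevant block links to `c`, for example `block_weighted_pagerank({"a": [(0.9, ["b"]), (0.1, ["c"])]})`, `b` ends up ranked above `c`, whereas plain PageRank would treat the two links equally.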