DocumentCode
2221144
Title
The Improved Pagerank in Web Crawler
Author
Ling Zhang ; Zheng Qin
Author_Institution
Dept. of Inf. Sci. & Eng., Normal Univ., Changsha, China
fYear
2009
fDate
26-28 Dec. 2009
Firstpage
1889
Lastpage
1892
Abstract
PageRank is an algorithm for rating web pages. It applies the citation relationship used among academic papers to evaluate a web page's authority. It gives the same weight to all edges and ignores the relevance of web pages to the topic, which leads to the problem of topic-drift. Based on an analysis of several PageRank algorithms, an improved PageRank based on thematic segments is proposed. In this algorithm, a web page is divided into several blocks according to the HTML document's structure, and the greatest weight is given to links in the block that is most relevant to the given topic. Moreover, visited outlinks are treated as feedback to modify the blocks' relevance. An experiment on a Web crawler shows that the new algorithm has some effect on resolving the problem of topic-drift.
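The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea it describes: a PageRank iteration in which each outlink carries a weight instead of an equal share. The function name `block_weighted_pagerank`, the damping factor 0.85, the iteration count, and the example weights are all assumptions made for illustration. The weights stand in for the relevance of the block that contains each link. The feedback step from visited outlinks is not modeled here.

```python
def block_weighted_pagerank(links, damping=0.85, iterations=50):
    """Weighted PageRank sketch (hypothetical helper, not the paper's method).

    links maps a source page to a list of (target, weight) pairs.
    Each weight is assumed to reflect the topic relevance of the
    HTML block that contains the link.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for outs in links.values():
        for target, _ in outs:
            pages.add(target)

    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the uniform teleport share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            total = sum(w for _, w in outs)
            if total == 0:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
            else:
                # Split rank in proportion to link weight, so links in
                # more relevant blocks pass on more authority.
                for target, w in outs:
                    new_rank[target] += damping * rank[p] * w / total
        rank = new_rank
    return rank


# Hypothetical example: "hub" links to one page from a highly relevant
# block (weight 0.9) and to another from a weakly relevant block (0.1).
example_links = {
    "hub": [("on_topic", 0.9), ("off_topic", 0.1)],
    "on_topic": [("hub", 1.0)],
    "off_topic": [("hub", 1.0)],
}
ranks = block_weighted_pagerank(example_links)
```

With equal weights this reduces to the standard PageRank iteration, in which every outlink of a page receives the same share. With the unequal weights above, `on_topic` ends up ranked above `off_topic`, which is the kind of bias toward topic-relevant links that the abstract describes.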
Keywords
Web sites; citation analysis; hypermedia markup languages; search engines; HTML document; Web crawler; Web page rating; academic paper; block relevancy; citation; pagerank; thematic segment; topic-drift; Algorithm design and analysis; Couplings; Crawlers; Feedback; HTML; Information science; Internet; Search engines; Uniform resource locators; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science and Engineering (ICISE), 2009 1st International Conference on
Conference_Location
Nanjing
Print_ISBN
978-1-4244-4909-5
Type
conf
DOI
10.1109/ICISE.2009.1220
Filename
5455065