DocumentCode :
1879943
Title :
Application of clickstream analysis as Web page importance metric in parallel crawlers
Author :
Selamat, Ali ; Ahmadi-Abkenari, Fatemeh
Author_Institution :
Intell. Software Eng. Lab., Univ. Teknol. Malaysia, Skudai, Malaysia
Volume :
1
fYear :
2010
fDate :
15-17 June 2010
Firstpage :
1
Lastpage :
6
Abstract :
Employing a parallel crawler, that is, a multi-process crawler, raises issues that do not arise with a single-process crawler. These issues affect whether a parallel crawler can achieve results of higher, or even the same, quality as a centralized one. Existing parallel crawler architectures employ link-dependent metrics, such as Backlink count or PageRank, to determine URL importance and prioritize the queue of each process. A specified number of the most important pages is then sent to the index section of the crawler for further processing of their content. Applying link-dependent metrics imposes considerable overhead on the overall parallel crawler, because link information must be exchanged among the different processes. In this paper we propose the application of clickstream analysis as a link-independent Web page importance metric in a parallel crawler. Our approach includes an algorithm for balancing the performance of the different processes within a parallel crawler, which results in the discovery of higher-quality pages by the overall parallel crawler with less overhead than a centralized crawler that employs link-dependent importance metrics.
Keywords :
Web sites; information filters; information retrieval; Backlink count; PageRank; URL importance determination; Web page importance metric; clickstream analysis; link dependant metrics; multi processes crawler; parallel crawlers; Crawlers; Equations; Fires; Mathematical model; Measurement; Web pages; Clickstream analysis; Parallel crawlers; Web data management; Web page Importance metrics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology (ITSim), 2010 International Symposium in
Conference_Location :
Kuala Lumpur
ISSN :
2155-897
Print_ISBN :
978-1-4244-6715-0
Type :
conf
DOI :
10.1109/ITSIM.2010.5561354
Filename :
5561354