• DocumentCode
    2728668
  • Title
    FICA: A Fast Intelligent Crawling Algorithm
  • Author
    Bidoki, Ali Mohammad Zareh ; Yazdani, Nasser ; Ghodsnia, Pedram
  • Author_Institution
    Univ. of Tehran, Tehran
  • fYear
    2007
  • fDate
    2-5 Nov. 2007
  • Firstpage
    635
  • Lastpage
    641
  • Abstract
    Due to the proliferation and highly dynamic nature of the Web, an efficient crawling and ranking algorithm for retrieving the most important pages remains a challenging issue. Several algorithms, such as PageRank (Page et al., 1998) and OPIC (Abiteboul et al., 2003), have been proposed; unfortunately, they have high time complexity. In this paper, we propose FICA, an intelligent crawling algorithm based on reinforcement learning that models a real surfing user. The priority for crawling pages is based on a concept we call logarithmic distance. FICA is easy to implement, and its time complexity is O(E log V), where V and E are the number of nodes and edges in the Web graph, respectively. A comparison of FICA with other proposed algorithms shows that FICA outperforms them in discovering highly important pages. Furthermore, FICA computes the importance (ranking) of each page during the crawling process, so it can also be used as a ranking method for computing page importance. We used the UK's Web graph for our experiments.
  • Keywords
    Web sites; computational complexity; graph theory; information retrieval; learning (artificial intelligence); Web graph; World Wide Web; fast intelligent crawling algorithm; logarithmic distance; ranking algorithm; real surfing user; reinforcement learning; time complexity; Algorithm design and analysis; Crawlers; Learning; Search engines; Throughput; Web pages; Web search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, IEEE/WIC/ACM International Conference on
  • Conference_Location
    Fremont, CA
  • Print_ISBN
    978-0-7695-3026-0
  • Type
    conf
  • DOI
    10.1109/WI.2007.91
  • Filename
    4427164
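The Abstract states that FICA orders pages for crawling by a "logarithmic distance" and runs in O(E log V) time. The sketch below is a minimal, hypothetical illustration of that kind of ordering, not the authors' algorithm. It assumes that following a link out of page i adds log10 of i's out-degree to the distance, and it replaces FICA's reinforcement-learning update with a plain Dijkstra-style best-first search over a binary heap, which is one standard way to reach an O(E log V) bound. The function name `crawl_order`, the adjacency-dict input, and the choice of base-10 logarithm are all illustrative assumptions.

```python
import heapq
import math


def crawl_order(graph, seeds):
    """Best-first crawl order under an ASSUMED logarithmic-distance rule.

    graph: hypothetical input, a dict mapping each page (str) to its list of outlinks.
    seeds: start pages, each given distance 0.
    Assumption (not taken from the paper): crossing a link out of page i
    costs log10(outdegree(i)), so pages reached through few, low-fan-out
    links are crawled first. FICA's reinforcement-learning update is omitted.
    """
    dist = {}                            # final distance of each crawled page
    heap = [(0.0, s) for s in seeds]     # (tentative distance, page)
    heapq.heapify(heap)
    order = []                           # pages in the order they are crawled
    while heap:
        d, page = heapq.heappop(heap)
        if page in dist:
            continue                     # already crawled at a smaller or equal distance
        dist[page] = d
        order.append(page)
        links = graph.get(page, [])
        if not links:
            continue                     # dangling page: nothing to enqueue
        step = math.log10(len(links))    # assumed cost of following any outlink of `page`
        for child in links:
            if child not in dist:
                heapq.heappush(heap, (d + step, child))
    return order, dist


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e", "f"], "d": [], "e": [], "f": []}
    order, dist = crawl_order(web, ["a"])
    print(order)
```

With this toy graph and seed `a`, the crawl order is a, b, c, d, e, f: page d is reached through b, whose single outlink costs log10(1) = 0, so it is queued at the same distance as c and ahead of c's three-way fan-out. Each edge is pushed at most once per relaxation, so the heap work is O(E log E) = O(E log V), matching the bound quoted in the Abstract under these assumptions.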