• Title of article

    IECA: Intelligent Effective Crawling Algorithm for Web Pages

  • Author/Authors

    Golshani, Mohammad Amin (author), Department of Electrical and Computer Engineering; ZarehBidoki, AliMohammad (author), Department of Electrical and Computer Engineering

  • Issue Information
    Quarterly, serial issue number 16, 2012
  • Pages
    10
  • From page
    33
  • To page
    42
  • Abstract
    Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. Several crawling algorithms, such as Partial PageRank, Batch PageRank, OPIC, and FICA, have been proposed, but they have high time complexity or low throughput. To overcome these problems, we propose a new crawling algorithm called IECA, which is easy to implement and has low time complexity, O(E log V), and low memory complexity, O(V), where V and E are the number of nodes and edges in the Web graph, respectively. Unlike the algorithms mentioned above, IECA traverses the Web graph only once, and the importance of Web pages is determined from the logarithmic distance and weight of their incoming links. To evaluate IECA, we use three different Web graphs: UK-2005, the Web graph of the University of California, Berkeley (2008), and Iran-2010. Experimental results show that our algorithm outperforms the other crawling algorithms in discovering highly important pages.
  • Journal title
    International Journal of Information and Communication Technology Research
  • Serial Year
    2012
  • Record number

    720262