• DocumentCode
    1827965
  • Title

    Modelling on web dynamic incremental crawling and information processing

  • Author

    Kai Gao ; Wei Wang ; Shen Gao

  • Author_Institution
    Sch. of Inf. Sci. & Eng., Hebei Univ. of Sci. & Technol., Shijiazhuang, China
  • fYear
    2013
  • fDate
    Aug. 31 2013-Sept. 2 2013
  • Firstpage
    293
  • Lastpage
    298
  • Abstract
    The amount of web information is increasing rapidly; it is continuously produced and updated anywhere and at any time through the Internet and social networks. For a search engine, keeping up with the evolving web is necessary. How should the change be modelled, and which parts should be updated more often? To address these questions, this paper presents a model of dynamic web evolution and an incremental crawling strategy, and considers the refresh interval with minimum waiting time. As a result, the crawling probability of some sites is higher than that of others, so these sites are given more opportunities to be updated. Based on the web site priority level adjusted algorithm, a dynamic web information gathering strategy is proposed. By monitoring the proposed metrics, the web site priority level can be dynamically adjusted. This is essential when bandwidth or other resources are limited. Further, some strategies for web information extraction and processing are also presented. The experimental results validate the feasibility of the approach.
  • Keywords
    Internet; information retrieval; search engines; social networking (online); Web dynamic incremental crawling; Web site priority level adjusted algorithm; crawling probability; dynamic Web evolution; dynamic Web information gathering strategy; incremental crawling strategy; information processing; search engine; social networks; Ink; Monitoring; Yttrium; crawler; information extraction; refresh
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Modelling, Identification & Control (ICMIC), 2013 Proceedings of International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-0-9567157-3-9
  • Type

    conf

  • Filename
    6642201