  • DocumentCode
    2427815
  • Title
    An Improved Shark-Search Algorithm Based on Multi-information
  • Author
    Chen, Zhumin ; Ma, Jun ; Lei, JingSheng ; Yuan, Bo ; Lian, Li
  • Author_Institution
    Shandong Univ., Jinan
  • Volume
    4
  • fYear
    2007
  • fDate
    24-27 Aug. 2007
  • Firstpage
    659
  • Lastpage
    658
  • Abstract
    With the enormous growth of the World Wide Web, the limitations of existing general-purpose search engines have become more apparent. Focused crawling is increasingly seen as a potential solution. The key to focused crawling is accurately predicting how relevant the unvisited web pages pointed to by known URLs are to a given topic. A formalized description of the prediction process is introduced. Then, four policies are proposed to predict the relevance of unvisited pages to a topic. Furthermore, combinations of these policies are used to improve Shark-Search, a classic focused crawling algorithm based mainly on the textual information of web pages. A large number of experiments were carried out to identify the best combination and to verify that the improved Shark-Search is more effective than the original.
  • Keywords
    Internet; search engines; Web pages; World Wide Web; focused crawling; general-purpose search engines; improved shark-search algorithm; multiinformation; textual information; Computer science; Crawlers; Educational institutions; Heuristic algorithms; Information science; Marine animals; Uniform resource locators; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
  • Conference_Location
    Haikou
  • Print_ISBN
    978-0-7695-2874-8
  • Type
    conf
  • DOI
    10.1109/FSKD.2007.166
  • Filename
    4406469