• DocumentCode
    1935499
  • Title

    Semantic focused crawler based on Q-learning and Bayes classifier

  • Author

    Chen, Dong ; Liying, Fang ; Yan Jianzhuo ; Bin Shi

  • Author_Institution
    Coll. of Electron. Inf. & Control Eng., Beijing Univ. of Technol., Beijing, China
  • Volume
    8
  • fYear
    2010
  • fDate
    9-11 July 2010
  • Firstpage
    420
  • Lastpage
    423
  • Abstract
    The semantic focused crawler is an important component of a semantic vertical search engine. It is receiving increasing attention as a well-founded alternative to general web search for the problem of locating topical resources on the entire web. To retrieve documents related to a given topic, this paper proposes the QBLP algorithm, which enables the crawler to adapt to a changing environment. This feature makes it possible to change the behavior of the focused crawler according to the particular environment and its relationship with the given input parameters during the search. QBLP exploits Q-learning, which features lifelong learning and delayed reward, together with a Bayes classifier. It enables the crawler to accumulate experience during crawling and to adjust its strategy, with the goal of making the best decision under any circumstances. We compare QBLP with Best-First and Breadth-First crawling. According to statistics from our experiments, QBLP achieves higher precision than the other two in long crawls.
  • Keywords
    Bayes methods; document handling; learning (artificial intelligence); pattern classification; search engines; semantic Web; Bayes classifier; Q-learning; QBLP algorithm; best first; breadth first; documents; search Web; semantic focused crawler; semantic vertical search engine; Crawlers; HTML; Knowledge engineering; Semantic Web; Semantics; Bayes classifier; Q-Learning; Semantic web; focused crawler;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-5537-9
  • Type

    conf

  • DOI
    10.1109/ICCSIT.2010.5563878
  • Filename
    5563878
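
The abstract describes QBLP only at a high level, so the following is a minimal sketch of the two generic building blocks it names, not the authors' algorithm: a standard tabular Q-learning update and a small two-class naive Bayes relevance scorer. Everything specific here is an assumption made for illustration: treating a page as the state and an outgoing link as the action, using the classifier's relevance probability as the reward, the names `NaiveBayesRelevance` and `q_update`, and the values of `alpha` and `gamma`.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRelevance:
    """Hypothetical helper: two-class multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, words, relevant):
        # Record one labelled document (relevant=True means on-topic).
        self.word_counts[relevant].update(words)
        self.doc_counts[relevant] += 1
        self.vocab.update(words)

    def prob_relevant(self, words):
        # Posterior probability that a page with these words is on-topic.
        total_docs = self.doc_counts[True] + self.doc_counts[False]
        log_scores = {}
        for label in (True, False):
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab) + 1
            for w in words:
                log_p += math.log((self.word_counts[label][w] + 1) / denom)
            log_scores[label] = log_p
        # Normalise in log space to avoid underflow on long pages.
        m = max(log_scores.values())
        exp_true = math.exp(log_scores[True] - m)
        exp_false = math.exp(log_scores[False] - m)
        return exp_true / (exp_true + exp_false)


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.8):
    """Standard Q-learning update; alpha and gamma are assumed values."""
    # Value of the best link on the next page (0.0 if it has no links).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Usage: the classifier scores a fetched page, and that score is the reward.
nb = NaiveBayesRelevance()
nb.train(["semantic", "web", "crawler"], True)
nb.train(["football", "score"], False)
reward = nb.prob_relevant(["semantic", "crawler"])

q = defaultdict(float)
q_update(q, "page_a", "link_1", reward, "page_b", [])
# A link that leads to page_a gains value even with zero immediate reward.
q_update(q, "seed", "link_0", 0.0, "page_a", ["link_1"])
```

The second update shows the delayed-reward property mentioned in the abstract: a link whose target page is itself off-topic can still earn a positive value if that page leads on to relevant content.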