  • DocumentCode
    2011078
  • Title
    Estimation of Optimal Topic Spider Strategy by Use of Decision Trees
  • Author
    Lin, Kunhui
  • Author_Institution
    Xiamen Univ., Xiamen
  • fYear
    2007
  • fDate
    May 30 - June 1, 2007
  • Firstpage
    2806
  • Lastpage
    2809
  • Abstract
    The design of a good topic spider requires an optimal strategy for prioritizing unvisited URLs. This paper uses a decision tree built on the anchor texts of hyperlinks to determine that prioritization. Pages are classified by a novel taxonomy-based topic relevance computation function that embeds machine learning. Evaluation on different data sets shows that the proposed approach leads to promising results.
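    The abstract describes ranking unvisited URLs with a decision tree applied to hyperlink anchor text. The following is a minimal sketch of how such a best-first frontier could be wired up, not the paper's implementation: the vocabulary TOPIC_TERMS, the two anchor-text features, and the hand-built tree with its thresholds and leaf scores are all hypothetical placeholders. The paper's learned tree and its taxonomy-based relevance function are not reproduced here.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

# Hypothetical topic vocabulary; the paper's taxonomy is not reproduced here.
TOPIC_TERMS = {"decision", "tree", "crawler", "spider", "classification"}


def anchor_features(anchor_text: str) -> dict:
    """Turn a hyperlink's anchor text into simple numeric features (assumed feature set)."""
    words = anchor_text.lower().split()
    hits = sum(1 for w in words if w in TOPIC_TERMS)
    return {"topic_hits": hits, "length": len(words)}


@dataclass
class Node:
    """Decision-tree node: internal nodes test feature <= threshold; leaves hold a score."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when feature value <= threshold
    right: Optional["Node"] = None  # branch taken when feature value > threshold
    score: float = 0.0              # priority returned at a leaf


def predict(node: Node, feats: dict) -> float:
    """Walk from the root to a leaf and return that leaf's priority score."""
    while node.feature is not None:
        node = node.left if feats[node.feature] <= node.threshold else node.right
    return node.score


# Hand-built illustrative tree (hypothetical thresholds and scores, not learned from data).
TREE = Node(
    feature="topic_hits",
    threshold=0,
    left=Node(score=0.1),  # no topic words in the anchor text
    right=Node(
        feature="length",
        threshold=4,
        left=Node(score=0.9),   # short anchor dominated by topic words
        right=Node(score=0.6),  # longer anchor with some topic words
    ),
)


class Frontier:
    """Best-first frontier of unvisited URLs; the highest tree score is popped first."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seen: set = set()

    def push(self, url: str, anchor_text: str) -> None:
        # Skip URLs that were already queued so each link is scored once.
        if url in self._seen:
            return
        self._seen.add(url)
        score = predict(TREE, anchor_features(anchor_text))
        # heapq is a min-heap, so store the negated score to pop the best URL first.
        heapq.heappush(self._heap, (-score, url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[1]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    f = Frontier()
    f.push("http://example.org/a", "contact us")
    f.push("http://example.org/b", "decision tree crawler")
    f.push("http://example.org/c", "a long page about spider design and other matters")
    print([f.pop() for _ in range(len(f))])
```

    Popping the frontier returns URLs in descending order of tree score, so links whose anchor text is short and topic-dense are fetched first. In a real topic spider the tree would be learned from labeled anchor texts rather than written by hand.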
  • Keywords
    classification; decision trees; learning (artificial intelligence); relevance feedback; search engines; vocabulary; Web crawling; Web page classification; decision tree; machine learning; optimal topic spider strategy estimation; search engine; taxonomy based topic relevance computation function; Automatic control; Crawlers; Decision trees; Design automation; Machine learning; Optimal control; Taxonomy; Uniform resource locators; Vocabulary; Web pages; decision tree; machine learning; optimal estimation; topic spider;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control and Automation, 2007. ICCA 2007. IEEE International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    978-1-4244-0818-4
  • Electronic_ISBN
    978-1-4244-0818-4
  • Type
    conf
  • DOI
    10.1109/ICCA.2007.4376873
  • Filename
    4376873