• DocumentCode
    2404444
  • Title

    The BINGO! focused crawler: from bookmarks to archetypes

  • Author

    Sizov, Sergej ; Siersdorfer, Stefan ; Theobald, Martin ; Weikum, Gerhard

  • Author_Institution
    Saarlandes Univ., Saarbrücken, Germany
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    337
  • Lastpage
    338
  • Abstract
    The BINGO! system implements an approach to focused crawling that aims to overcome the limitations of the initial training data. To this end, BINGO! identifies, among the crawled and positively classified documents of a topic, characteristic "archetypes" and uses them for periodically re-training the classifier; this way the crawler is dynamically adapted based on the most significant documents seen so far. Two kinds of archetypes are considered: good authorities as determined by employing Kleinberg's link analysis algorithm, and documents that have been automatically classified with high confidence using a linear SVM classifier.
  • Keywords
    classification; hypermedia markup languages; BINGO! focused crawler; Kleinberg link analysis algorithm; archetypes; best URLs; bookmarks; crawl frontier; linear SVM classifier; positively classified documents; re-training; Costs; Crawlers; Humans; Ontologies; Search engines; Support vector machine classification; Support vector machines; Training data; Uniform resource locators; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2002. Proceedings. 18th International Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-1531-2
  • Type

    conf

  • DOI
    10.1109/ICDE.2002.994746
  • Filename
    994746