• DocumentCode
    2344494
  • Title

    Using high performance systems to build collections for a digital library

  • Author

    Bergmark, Donna

  • Author_Institution
    Cornell Digital Libr. Res. Group, Ithaca, NY, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    431
  • Lastpage
    438
  • Abstract
    Nothing is more distributed than the Web, with its content spread across thousands of servers. High performance hardware and software is essential for an effective download, analysis, and organization of this content. We describe our experience with a highly parallel Web crawling system (Mercator) to construct - automatically - collections of scientific resources for the National Science Digital Library.
  • Keywords
    digital libraries; information resources; online front-ends; Web crawler; Web crawling system; automatic collection generation; digital library; massively parallel Web crawling; online resources; scientific resources; topic-related Web documents; Crawlers; Fingerprint recognition; Hardware; Knowledge based systems; Parallel processing; Performance analysis; Software libraries; Software performance; Uniform resource locators; World Wide Web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops, 2002. Proceedings. International Conference on
  • ISSN
    1530-2016
  • Print_ISBN
    0-7695-1680-7
  • Type

    conf

  • DOI
    10.1109/ICPPW.2002.1039762
  • Filename
    1039762