• DocumentCode
    869827
  • Title
    Search engine coverage of the OAI-PMH corpus
  • Author
    McCown, Frank ; Liu, Xindong ; Nelson, Michael L. ; Zubair, Mohammad ; Liu, Xiaoming
  • Author_Institution
    Dept. of Comput. Sci., Old Dominion Univ., Norfolk, VA, USA
  • Volume
    10
  • Issue
    2
  • fYear
    2006
  • Firstpage
    66
  • Lastpage
    73
  • Abstract
    Having indexed much of the "surface" Web, search engines are now using various approaches to index the "deep" Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings. The authors harvested nearly 10 million records from OAI-PMH repositories. From these records, they extracted 3.3 million unique resource URLs and then searched for samples drawn from this collection to determine how much of the OAI-PMH corpus the three major search engines have indexed.
  • Keywords
    digital libraries; meta data; search engines; OAI-PMH corpus; digital library; institutional repository; open archives initiative protocol for metadata harvesting; search engine; Crawlers; Data models; Investments; Protection; Protocols; Robots; Software libraries; Uniform resource locators; Writing; OAI PMH; deep web; indexing
  • fLanguage
    English
  • Journal_Title
    Internet Computing, IEEE
  • Publisher
    IEEE
  • ISSN
    1089-7801
  • Type
    jour
  • DOI
    10.1109/MIC.2006.41
  • Filename
    1607990