• DocumentCode
    570200
  • Title

    Discovery and cataloging of deep Web sources

  • Author

    Hicks, C.; Scheffer, Markus; Ngu, Anne H. H.; Sheng, Quan Z.

  • Author_Institution
    Dept. of Comput. Sci., Texas State Univ., San Marcos, TX, USA
  • fYear
    2012
  • fDate
    8-10 Aug. 2012
  • Firstpage
    224
  • Lastpage
    230
  • Abstract
    With more and more information going online, extracting and managing information from the Internet is becoming increasingly important. While the surface Web's information is relatively easy to obtain thanks to search engines such as Google and Bing, collecting information from the deep Web is still a challenging task, and these search engines do not index information located inside the deep Web. Compared to the surface Web, the deep Web contains vastly more information. In particular, building a generalized search engine that can index the deep Web across all domains remains a difficult research problem. In this paper, we highlight these challenges and demonstrate, via a prototype implementation, a generalized deep Web discovery framework that can achieve high precision.
  • Keywords
    Internet; indexing; information retrieval; search engines; Bing; Google; deep Web source cataloging; deep Web source discovery; information index; Crawlers; HTML; Indexes; Manuals; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4673-2282-9
  • Electronic_ISBN
    978-1-4673-2283-6
  • Type

    conf

  • DOI
    10.1109/IRI.2012.6303014
  • Filename
    6303014