DocumentCode
570200
Title
Discovery and cataloging of deep Web sources
Author
Hicks, C. ; Scheffer, Markus ; Ngu, Anne H. H. ; Sheng, Quan Z.
Author_Institution
Dept. of Comput. Sci., Texas State Univ., San Marcos, TX, USA
fYear
2012
fDate
8-10 Aug. 2012
Firstpage
224
Lastpage
230
Abstract
With more and more information going online, extracting and managing information from the Internet is becoming increasingly important. While the surface Web's information is relatively easy to obtain thanks to search engines such as Google and Bing, collecting information from the deep Web is still a challenging task, and these search engines do not index information located inside the deep Web. Compared to the surface Web, the deep Web contains vastly more information. In particular, building a generalized search engine that can index the deep Web across all domains remains a difficult research problem. In this paper, we highlight these challenges and demonstrate, via a prototype implementation, a generalized deep Web discovery framework that can achieve high precision.
Keywords
Internet; indexing; information retrieval; search engines; Bing; Google; deep Web source cataloging; deep Web source discovery; information index; Crawlers; HTML; Indexes; Manuals; Web sites
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
Conference_Location
Las Vegas, NV
Print_ISBN
978-1-4673-2282-9
Electronic_ISBN
978-1-4673-2283-6
Type
conf
DOI
10.1109/IRI.2012.6303014
Filename
6303014