DocumentCode :
570200
Title :
Discovery and cataloging of deep Web sources
Author :
Hicks, C. ; Scheffer, Markus ; Ngu, Anne H. H. ; Sheng, Quan Z.
Author_Institution :
Dept. of Comput. Sci., Texas State Univ., San Marcos, TX, USA
fYear :
2012
fDate :
8-10 Aug. 2012
Firstpage :
224
Lastpage :
230
Abstract :
As more and more information goes online, extracting and managing information from the Internet is becoming increasingly important. While the surface Web's information is relatively easy to obtain thanks to search engines such as Google and Bing, collecting information from the deep Web is still a challenging task, because these search engines do not index information located inside the deep Web. Compared to the surface Web, the deep Web contains vastly more information. In particular, building a generalized search engine that can index the deep Web across all domains remains a difficult research problem. In this paper, we highlight these challenges and demonstrate, via a prototype implementation, a generalized deep Web discovery framework that can achieve high precision.
Keywords :
Internet; indexing; information retrieval; search engines; Bing; Google; deep Web source cataloging; deep Web source discovery; information index; Crawlers; HTML; Indexes; Manuals; Web sites
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4673-2282-9
Electronic_ISBN :
978-1-4673-2283-6
Type :
conf
DOI :
10.1109/IRI.2012.6303014
Filename :
6303014