Title :
A Regression Model-Based Approach to Accessing the Deep Web
Author_Institution :
Coll. of Comput. Sci., South-Central Univ. for Nat., Wuhan, China
Abstract :
An increasing number of data sources are becoming available on the Web, but their contents are often accessible only through query interfaces. For a domain of interest, accessing deep Web content has been a long-standing challenge. In this paper, we propose a deep Web crawling approach based on an ordinal regression model. We divide pages into three levels and treat the feedback of the page classifier as an ordinal regression problem. We also take link delay into account: related links are limited to three layers or fewer. Experimental results demonstrate that the feedback-based crawling strategy can effectively improve crawling speed and accuracy.
Keywords :
Internet; Web sites; query processing; regression analysis; data sources; deep Web access; deep Web crawling; feedback; page classifier; query interfaces; regression model; Crawlers; Data mining; Databases; Feature extraction; Search engines; Training; Web pages;
Conference_Title :
Internet Technology and Applications (iTAP), 2011 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-7253-6
DOI :
10.1109/ITAP.2011.6006322