DocumentCode
3081319
Title
HiCrawl: A Hidden Web Crawler for Medical Domain
Author
Gupta, Swastik ; Bhatia, Komal Kumar
Author_Institution
Dept. of Comput. Eng., YMCA Univ. of Sci. & Technol., Faridabad, India
fYear
2013
fDate
24-26 Aug. 2013
Firstpage
152
Lastpage
157
Abstract
The Hidden Web refers to a huge portion of the WWW that holds numerous freely accessible Web databases. These databases sit behind search form interfaces, and their contents can only be reached through dynamic Web pages generated in response to user queries issued at those interfaces. The core challenge in implementing any crawler for the Hidden Web is therefore to get past these search form interfaces routinely, by automatically generating and issuing queries that help discover the dynamic Web pages. This paper provides a novel approach to guide the crawler in choosing the right query term to submit to any search form interface designed to accept keywords or terms as input. The system is based on classification hierarchies, which may be constructed either manually or automatically. For illustration, we consider search form interfaces in the "Medical" domain, one of the most popular domains among researchers, and use a manually generated top-down classification hierarchy for that domain.
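The abstract does not spell out how the hierarchy drives query selection, so the following is only a minimal sketch of one plausible reading, not the authors' HiCrawl algorithm. It assumes a hypothetical top-down hierarchy (the "Respiratory System" node is invented; the leaf terms echo the record's keywords), walks it breadth-first, and submits each term through a hypothetical `submit` callback standing in for filling and posting the search form. A node's children are explored only if its own query returned results, so empty branches are pruned.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical top-down classification hierarchy for the Medical domain.
# Leaf terms echo the record's keywords; "Respiratory System" is an assumption.
HIERARCHY: Dict[str, List[str]] = {
    "Medical": ["Respiratory System"],
    "Respiratory System": ["Lungs", "Larynx", "Nose"],
}


def select_queries(root: str,
                   hierarchy: Dict[str, List[str]],
                   submit: Callable[[str], int],
                   max_queries: int = 10) -> List[str]:
    """Issue hierarchy terms top-down (breadth-first) as search-form queries.

    `submit` is a hypothetical stand-in for filling and submitting the form;
    it returns the number of result pages/records the query produced.
    Children of a term are queued only if that term returned results.
    """
    issued: List[str] = []
    frontier = deque([root])
    while frontier and len(issued) < max_queries:
        term = frontier.popleft()
        hits = submit(term)  # query the (simulated) Web database
        issued.append(term)
        if hits > 0:
            # Term is productive: refine it with its more specific children.
            frontier.extend(hierarchy.get(term, []))
    return issued


# Usage with a fake result-count table in place of a real search form.
fake_counts = {"Medical": 120, "Respiratory System": 40,
               "Lungs": 25, "Larynx": 0, "Nose": 8}
print(select_queries("Medical", HIERARCHY, lambda t: fake_counts.get(t, 0)))
# prints ['Medical', 'Respiratory System', 'Lungs', 'Larynx', 'Nose']
```

Breadth-first order issues general terms before specific ones, which mirrors the top-down nature of the hierarchy; the `max_queries` budget is an illustrative assumption for bounding the number of form submissions.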
Keywords
Internet; information retrieval; medical information systems; search engines; HiCrawl; Web database; World Wide Web; classification hierarchy; dynamic Web pages; hidden Web crawler; medical domain; Crawlers; Databases; Larynx; Lungs; Nose; Web pages; Content Retrieval; Hidden Web; Surface Web; WWW; automatic form filling; crawlers; form processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational and Business Intelligence (ISCBI), 2013 International Symposium on
Conference_Location
New Delhi
Print_ISBN
978-0-7695-5066-4
Type
conf
DOI
10.1109/ISCBI.2013.39
Filename
6724343