DocumentCode
1992486
Title
Research on discovering Deep Web entries based on topic crawling and ontology
Author
Liu, Gang ; Liu, Kai ; Dang, Yuan-yuan
Author_Institution
Sch. of Comput. Sci. & Eng., Changchun Univ. of Technol., Changchun, China
fYear
2011
fDate
16-18 Sept. 2011
Firstpage
2488
Lastpage
2490
Abstract
With the wide application of Web databases, Web content increasingly resides in the Deep Web. To use Deep Web resources effectively, Deep Web data must be integrated on a large scale, and data source discovery is the first step of that integration. Finding Deep Web sites efficiently is therefore the key to Deep Web data integration. This paper proposes a method for automatically discovering Deep Web entries. First, a domain ontology is built from the information in the Deep Web entry forms of a specific domain. Then, as the topic crawler crawls the Web, each page is checked for forms. If a page contains forms, their attributes are extracted and a weight is calculated from the form attributes and the ontology. The page is downloaded when the weight is greater than a fixed value. Finally, test words are used to examine the downloaded pages and identify high-quality Deep Web entry pages.
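The form-scoring step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weight formula, the ontology contents, or the threshold, so everything here is an assumption for illustration: the `BOOK_ONTOLOGY` terms and weights, the `THRESHOLD` value, substring matching of ontology terms against form field names, and all function names are hypothetical and not taken from the paper.

```python
from html.parser import HTMLParser

# Hypothetical domain ontology: term -> weight (assumed values, not from the paper).
BOOK_ONTOLOGY = {"title": 0.4, "author": 0.3, "isbn": 0.2, "publisher": 0.1}
THRESHOLD = 0.5  # assumed "fixed value"; the abstract does not state it


class FormAttributeParser(HTMLParser):
    """Collects the name attributes of input/select/textarea tags inside <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.attributes = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:
                self.attributes.append(name.lower())

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def extract_form_attributes(html):
    """Return the lowercased field names found inside forms on the page."""
    parser = FormAttributeParser()
    parser.feed(html)
    return parser.attributes


def form_weight(attributes, ontology):
    """Sum the weights of ontology terms that occur in at least one field name."""
    matched = {term for term in ontology if any(term in a for a in attributes)}
    return sum(ontology[term] for term in matched)


def should_download(html, ontology=BOOK_ONTOLOGY, threshold=THRESHOLD):
    """Keep the page only if it has form fields and their weight exceeds the threshold."""
    attributes = extract_form_attributes(html)
    return bool(attributes) and form_weight(attributes, ontology) > threshold
```

Under these assumptions, a search form with `title` and `author` fields would be kept, while a login form with `username` and `password` fields would be skipped. The later filtering with test words mentioned in the abstract is not covered by this sketch.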
Keywords
Internet; Web sites; data mining; ontologies (artificial intelligence); Deep Web data Integration; Deep Web entry automatic discovery method; Deep Web resource; Deep Web site; Web database; Web page; attribute extraction; data source discovery; domain ontology; topic crawling; Accuracy; Bayesian methods; Correlation; Crawlers; Databases; HTML; Ontologies; Bayes Classifier; Data Source Discovery; Deep Web; Ontology;
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical and Control Engineering (ICECE), 2011 International Conference on
Conference_Location
Yichang
Print_ISBN
978-1-4244-8162-0
Type
conf
DOI
10.1109/ICECENG.2011.6057954
Filename
6057954