Title :
A Method to Automatically Discover and Classify Deep Web Data Source Using Multi-Classifier
Author :
Zhi-tao, Li ; Quan, Liu ; Zhi-ming, Cui ; Yu-chen, Fu
Author_Institution :
Inst. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
Date :
March 31, 2009 - April 2, 2009
Abstract :
Recently, the discovery of deep Web data sources and the related problem of domain relevance have attracted increasing attention. This paper proposes a method that uses multiple classifiers to discover and classify deep Web data sources. First, a naive Bayes classifier classifies each page as domain-relevant or not. Second, an improved C4.5 decision tree algorithm identifies query interfaces. Experimental comparison with a single decision tree classifier shows that the method is effective.
Keywords :
Bayes methods; Internet; decision trees; pattern classification; query processing; data classification; data discovery; data multiclassifier; data source; decision tree; deep Web; naive Bayes classifier; query interface; Classification tree analysis; Computer science; Crawlers; Data engineering; Data mining; Databases; Decision trees; Frequency; HTML; Testing;
Conference_Title :
Computer Science and Information Engineering, 2009 WRI World Congress on
Conference_Location :
Los Angeles, CA
Print_ISBN :
978-0-7695-3507-4
DOI :
10.1109/CSIE.2009.435