DocumentCode
2011078
Title
Estimation of Optimal Topic Spider Strategy by Use of Decision Trees
Author
Lin, Kunhui
Author_Institution
Xiamen Univ., Xiamen
fYear
2007
fDate
May 30 2007-June 1 2007
Firstpage
2806
Lastpage
2809
Abstract
The design of a good topic spider entails an optimal strategy for prioritizing unvisited URLs. This paper uses a decision tree on the anchor texts of hyperlinks to determine the prioritization. A novel taxonomy-based topic relevance computation function, which embeds machine learning, classifies pages. Evaluation on different data sets shows that the proposed approach leads to promising results.
Keywords
classification; decision trees; learning (artificial intelligence); relevance feedback; search engines; vocabulary; Web crawling; Web page classification; decision tree; machine learning; optimal topic spider strategy estimation; search engine; taxonomy based topic relevance computation function; Automatic control; Crawlers; Decision trees; Design automation; Machine learning; Optimal control; Taxonomy; Uniform resource locators; Vocabulary; Web pages; optimal estimation; topic spider
fLanguage
English
Publisher
IEEE
Conference_Titel
Control and Automation, 2007. ICCA 2007. IEEE International Conference on
Conference_Location
Guangzhou
Print_ISBN
978-1-4244-0818-4
Electronic_ISBN
978-1-4244-0818-4
Type
conf
DOI
10.1109/ICCA.2007.4376873
Filename
4376873