Title :
Comparing classification methods for link context based focused crawlers
Author :
Caliskan, Kamil ; Ozcan, Rifat
Author_Institution :
Dept. of Comput. Eng., Turgut Ozal Univ., Ankara, Turkey
Abstract :
Focused crawlers aim to fetch only those pages related to a specific subject area from the millions of web pages on the Internet. The essential task in a focused crawler is to predict whether a page is related to the target subject area without actually fetching the page content. Link context based focused crawlers use the text surrounding each link to classify the page that the URL points to. In this paper, we compare three classification methods (naïve Bayes, decision tree, and support vector machines) for the task of link context based focused crawling.
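The idea of classifying a target page from its link context can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the context window size, the features, or the training data, so the three-word window, the toy training phrases, the example URLs, and the helper names (`tokens`, `link_contexts`, `NaiveBayes`) are all assumptions made for illustration. Only a multinomial naïve Bayes classifier with add-one smoothing is sketched; the decision tree and support vector machine are not shown.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link_contexts(html, window=3):
    """Yield (url, context) pairs, where context is the anchor text plus up to
    `window` words on each side of the link. The window size is an assumption."""
    anchor_re = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)

    def strip_tags(fragment):
        return re.sub(r"<[^>]+>", " ", fragment)

    for m in anchor_re.finditer(html):
        before = tokens(strip_tags(html[:m.start()]))[-window:]
        anchor = tokens(strip_tags(m.group(2)))
        after = tokens(strip_tags(html[m.end():]))[:window]
        yield m.group(1), " ".join(before + anchor + after)


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.total = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokens(text))
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, text):
        def log_score(c):
            n_words = sum(self.counts[c].values())
            score = math.log(self.prior[c] / self.total)
            for w in tokens(text):
                score += math.log((self.counts[c][w] + 1) / (n_words + len(self.vocab)))
            return score

        return max(self.classes, key=log_score)


# Hypothetical toy training set of link contexts with known labels.
train_texts = [
    "focused crawler downloads topical pages",
    "page classification with support vector machines",
    "cheap flights and hotel deals",
    "football match results today",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = NaiveBayes().fit(train_texts, train_labels)

# Hypothetical page: each link is classified without fetching its target.
page = (
    '<p>Read our guide to <a href="http://example.org/crawl">focused crawler '
    "design</a> and page classification.</p>"
    '<p>Book <a href="http://example.org/trip">cheap flights</a> and hotel deals today.</p>'
)
results = {url: model.predict(ctx) for url, ctx in link_contexts(page)}
```

In this sketch a crawler would enqueue only the URLs whose context is predicted as `relevant`. Swapping in a different classifier would leave `link_contexts` unchanged, which is presumably what makes a comparison of classification methods on the same link contexts possible.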
Keywords :
Bayes methods; Internet; data mining; decision trees; pattern classification; search engines; support vector machines; URL; Web pages; classification method; decision tree; focused crawler; link context; naïve Bayes method; Accuracy; Context; Crawlers; classification; focused crawling
Conference_Title :
Electronics, Computer and Computation (ICECCO), 2013 International Conference on
Conference_Location :
Ankara
DOI :
10.1109/ICECCO.2013.6718249