Regional Information Center for Science and Technology (RICeST) - Automatic text classification and focused crawling

DocumentCode :

2528345

Title :

Automatic text classification and focused crawling

Author :

Samarawickrama, Sameendra ; Jayaratne, Lakshman

Author_Institution :

Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka

Year :

2011

Date :

26-28 Sept. 2011

Firstpage :

143

Lastpage :

148

Abstract :

A focused crawler is a web crawler that traverses the web to find only information related to a particular topic of interest. Generic web crawlers, by contrast, try to search the entire web, which is impossible given the size and complexity of the WWW. In this paper we survey some of the latest focused web crawling approaches, discussing each along with its experimental results. We categorize them as focused crawling based on content analysis, focused crawling based on link analysis, and focused crawling based on both content and link analysis. We also give insight into future research and draw overall conclusions.

Keywords :

Web sites; information retrieval; pattern classification; search engines; text analysis; WWW; Web crawler; automatic text classification; content analysis; focused crawler; link analysis; Crawlers; Search engines; Support vector machines; Training; Training data; Vectors; Web pages

Language :

English

Publisher :

IEEE

Conference_Title :

Digital Information Management (ICDIM), 2011 Sixth International Conference on

Conference_Location :

Melbourne, VIC

ISSN :

Pending

Print_ISBN :

978-1-4577-1538-9

Type :

conf

DOI :

10.1109/ICDIM.2011.6093329

Filename :

6093329

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2528345