Title :
Semantic Similarity Based Focused Crawling
Author :
Ravakhah, Mehdi ; Kamyar, Mohsen
Author_Institution :
Islamic Azad Univ. Mashhad Branch, Mashhad, Iran
Abstract :
Finding useful information on the Web, which has a large and distributed structure, requires efficient search strategies. The distributed and dynamic nature of Web resources is a major problem for search engines that must maintain an up-to-date index of Web content, as they have to crawl the Web periodically. A focused, or topic-driven, crawler is a specific type of crawler that analyzes its crawl boundary to find the links that are likely to be most relevant to the crawl while avoiding irrelevant regions of the Web. Many types of crawlers have been suggested, and they usually differ in their crawl strategy. The most important difference is in how they prioritize links to download. To do this, a focused crawler uses a classification algorithm. In this paper we propose a new classification algorithm for focused crawlers. Our algorithm is based on page contents and uses semantic classification.
Keywords :
pattern classification; search engines; semantic Web; Web content; Web resource; crawl strategy; download link; search engine strategy; semantic classification; semantic similarity based focused crawling; Bandwidth; Competitive intelligence; Computational intelligence; Costs; Crawlers; Deductive databases; Distributed processing; Search engines; Web pages; Web sites;
Conference_Title :
Computational Intelligence, Communication Systems and Networks, 2009. CICSYN '09. First International Conference on
Conference_Location :
Indore
Print_ISBN :
978-0-7695-3743-6
DOI :
10.1109/CICSYN.2009.92