Title :
An Agent-Based Focused Crawling Framework for Topic- and Genre-Related Web Document Discovery
Author :
Pappas, Nikolaos ; Katsimpras, G. ; Stamatatos, E.
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
Abstract :
The discovery of web documents about certain topics is an important task for web-based applications, including web document retrieval, opinion mining, and knowledge extraction. In this paper, we propose an agent-based focused crawling framework able to retrieve topic- and genre-related web documents. Starting from a simple topic query, a set of focused crawler agents explore topic-specific web paths in parallel, using dynamic seed URLs that belong to certain web genres and are collected from web search engines. The agents use an internal mechanism that weighs the topic and genre relevance scores of unvisited web pages. They are able to adapt to the properties of a given topic by modifying their internal knowledge during search, handle ambiguous queries, ignore pages that are irrelevant to the topic, and collaboratively retrieve topic-relevant web pages. We performed an experimental study to evaluate the behavior of the agents for a variety of topic queries, demonstrating the benefits and capabilities of our framework.
Keywords :
Internet; Web sites; data mining; multi-agent systems; query processing; search engines; Web document retrieval; Web genres; Web search engines; Web-based applications; agent-based focused crawling framework; ambiguous query; dynamic seed URL; focused crawler agents; genre relevance scores; genre-related Web document discovery; internal knowledge; internal mechanism; irrelevant pages; knowledge extraction; opinion mining; parallel topic-specific Web paths; topic query; topic-related Web document discovery; topic-relevant Web pages; unvisited Web pages; Crawlers; Engines; Measurement; Uniform resource locators; Vectors; Web pages; focused crawling; genre-aware crawling; link analysis; utility-based agents; web document discovery;
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4799-0227-9
DOI :
10.1109/ICTAI.2012.75