Title :
Exploiting Tags and Social Profiles to Improve Focused Crawling
Author :
Zhang, Zhiyong ; Nasraoui, Olfa ; Zwol, Roelof Van
Abstract :
Recent years have transformed the Web from a Web of content to a Web of applications and social content. Thus, it has become crucial to be able to tap into this social aspect of the Web whenever possible, in addition to its content, particularly for focused crawling. In this paper, we present a novel profile-based focused crawling system for dealing with the increasingly popular social media-sharing websites, without assuming privileged access to the internal private databases of such websites or requiring APIs for the extraction of social data. Our experiments demonstrate the robustness of our profile-based focused crawler, as well as a significant improvement in harvest ratio compared to breadth-first and OPIC crawlers, when crawling the Flickr website for two different topics.
Keywords :
Application software; Conferences; Crawlers; Data mining; Focusing; Intelligent agent; Learning systems; Multimedia databases; Web pages; Web sites; cotagging; focused crawler; page classification; profile
Conference_Title :
Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Milan, Italy
Print_ISBN :
978-0-7695-3801-3
Electronic_ISBN :
978-1-4244-5331-3
DOI :
10.1109/WI-IAT.2009.27