Title :
A distributed multi-tasking job scheduling mechanism for web crawlers
Author :
Cheng-Hung Tsai ; Tsun Ku ; Ping-Yen Yang ; Ming-Jen Chen
Author_Institution :
Inst. for Inf. Ind., Innovative DigiTech-Enabled Applic. & Service Inst., Taipei, Taiwan
Abstract :
Recently, the growth of social networks has fostered web services such as virtual communities and web communities. With readily available social networking sites and widespread internet access, people interact much more frequently than before. This research therefore aims to assist social networking sites by collecting and analyzing the enormous amount of data on these sites. Many scholars are currently working on data collection. We focus on the difficulties and challenges that web crawlers encounter during data collection, and we improve the mechanism proposed in [6]. In this paper, we look closely at the blocking imposed by social networking sites and the derivative problem of idle web crawlers.
Keywords :
Web services; scheduling; social networking (online); Internet; Web Crawlers; Web community; distributed multitasking job scheduling mechanism; social networking sites; Crawlers; Data collection; Distributed databases; Schedules; Service-oriented architecture; Social network services; Uniform resource locators; Crawler Manager; Distributed Web Crawlers; Information Retrieval; Social Network; Web Mining
Conference_Title :
Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of
Conference_Location :
Tunis
DOI :
10.1109/SOCPAR.2014.7008013