DocumentCode :
1796153
Title :
A distributed multi-tasking job scheduling mechanism for web crawlers
Author :
Cheng-Hung Tsai ; Tsun Ku ; Ping-Yen Yang ; Ming-Jen Chen
Author_Institution :
Inst. for Inf. Ind., Innovative DigiTech-Enabled Applic. & Service Inst., Taipei, Taiwan
fYear :
2014
fDate :
11-14 Aug. 2014
Firstpage :
243
Lastpage :
248
Abstract :
Recently, the growth of social networks has fostered web services such as virtual communities and web communities. With social networking sites readily available and the internet widely accessible, people interact much more frequently than before. This research therefore aims to assist social networking sites by collecting and analyzing the enormous volume of data on these sites. Many scholars are currently working on data collection. We focus on the difficulties and challenges that web crawlers encounter during data collection and improve the mechanism proposed in [6]. In this paper, we look closely at blocking by social networking sites and the resulting problem of idle web crawlers.
Keywords :
Web services; scheduling; social networking (online); Internet; Web Crawlers; Web community; Web services; distributed multitasking job scheduling mechanism; social networking sites; Crawlers; Data collection; Distributed databases; Schedules; Service-oriented architecture; Social network services; Uniform resource locators; Crawler Manager; Distributed Web Crawlers; Information Retrieval; Social Network; Web Mining
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of
Conference_Location :
Tunis
Type :
conf
DOI :
10.1109/SOCPAR.2014.7008013
Filename :
7008013