DocumentCode :
3125257
Title :
Social Streams Blog Crawler
Author :
Hurst, Matthew ; Maykov, Alexey
Author_Institution :
One Microsoft, Redmond, WA
fYear :
2009
fDate :
March 29 2009 - April 2 2009
Firstpage :
1615
Lastpage :
1618
Abstract :
Weblogs and other forms of social media differ from traditional Web content in many ways. One of the most important differences is the highly temporal nature of the content. To be effective, applications that leverage social media content must have access to this data with minimal publication/acquisition latency. An effective Weblog crawler should satisfy the following requirements: low latency, high scalability, high data quality, and appropriate network politeness. In this paper, we outline the Weblog crawler implemented in the Social Streams project and summarize the challenges faced during development.
Keywords :
Web sites; search engines; Weblog; blog crawler; social media content; social streams project; Crawlers; Data engineering; Delay; Discussion forums; Feeds; HTML; Information services; Internet; blogs; crawling; social media; web; weblogs
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location :
Shanghai
ISSN :
1084-4627
Print_ISBN :
978-1-4244-3422-0
Electronic_ISBN :
1084-4627
Type :
conf
DOI :
10.1109/ICDE.2009.146
Filename :
4812583