DocumentCode :
593256
Title :
A novel architecture for a blog crawler
Author :
Madaan, R. ; Sharma, Arvind Kumar ; Dixit, Abhishek
Author_Institution :
Comput. Sci. & Eng., Echelon Inst. of Technol., Faridabad, India
fYear :
2012
fDate :
6-8 Dec. 2012
Firstpage :
452
Lastpage :
456
Abstract :
A general crawler downloads web pages of any kind, forming a source of information for the search engine. A blog crawler is similar to a general crawler except that it restricts its crawl boundary to the blog space, downloading only blog pages and ignoring the rest of the web. Since blogs are an emerging phenomenon and serve as a very useful source of information, a blog crawler is of great help in this regard. We propose a new algorithm for a blog crawler and discuss a number of related issues. Our analysis finds that the proposed blog crawler is superior to the general crawler.
Keywords :
Web sites; search engines; software architecture; Web page download; blog crawler architecture; blog page download; blog space; crawl boundary; information source; search engine; Blogs; Computer architecture; Conferences; Crawlers; Measurement; Search engines; Web pages; Web; blog; crawler; indexer
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on
Conference_Location :
Solan
Print_ISBN :
978-1-4673-2922-4
Type :
conf
DOI :
10.1109/PDGC.2012.6449863
Filename :
6449863