DocumentCode
593256
Title
A novel architecture for a blog crawler
Author
Madaan, R. ; Sharma, Arvind Kumar ; Dixit, Abhishek
Author_Institution
Comput. Sci. & Eng., Echelon Inst. of Technol., Faridabad, India
fYear
2012
fDate
6-8 Dec. 2012
Firstpage
452
Lastpage
456
Abstract
A general crawler downloads web pages of any kind, forming a source of information for the search engine. A blog crawler is similar to a general crawler except that it restricts its crawl boundary to the blog space, downloading only blog pages and ignoring the rest of the web. Since blogs are an emerging phenomenon and serve as a very useful source of information, a blog crawler is of great help in this regard. We propose a new algorithm for a blog crawler and discuss a number of related issues. Our analysis found the proposed blog crawler to be superior to the general crawler.
Keywords
Web sites; search engines; software architecture; Web page download; blog crawler architecture; blog page download; blog space; crawl boundary; information source; search engine; Blogs; Computer architecture; Conferences; Crawlers; Measurement; Search engines; Web pages; Web; blog; crawler; indexer; search engine;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on
Conference_Location
Solan
Print_ISBN
978-1-4673-2922-4
Type
conf
DOI
10.1109/PDGC.2012.6449863
Filename
6449863