Title :
A novel architecture for a blog crawler
Author :
Madaan, R. ; Sharma, Arvind Kumar ; Dixit, Abhishek
Author_Institution :
Comput. Sci. & Eng., Echelon Inst. of Technol., Faridabad, India
Abstract :
A general crawler downloads web pages of any kind, forming a source of information for the search engine. A blog crawler is similar to a general crawler except that it restricts its crawl boundary to the blog space, downloading only blog pages and ignoring the rest of the web. Since blogs are an emerging phenomenon and serve as a very useful source of information, a blog crawler proves to be of great help in this regard. We propose a new algorithm for a blog crawler and discuss a number of related issues. Our analysis finds that the proposed blog crawler is superior to the general crawler.
Keywords :
Web sites; search engines; software architecture; Web page download; blog crawler architecture; blog page download; blog space; crawl boundary; information source; search engine; Blogs; Computer architecture; Conferences; Crawlers; Measurement; Search engines; Web pages; Web; blog; crawler; indexer; search engine;
Conference_Title :
Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on
Conference_Location :
Solan
Print_ISBN :
978-1-4673-2922-4
DOI :
10.1109/PDGC.2012.6449863