A novel architecture for a blog crawler

Author

Madaan, R.; Sharma, Arvind Kumar; Dixit, Abhishek

Author_Institution

Comput. Sci. & Eng., Echelon Inst. of Technol., Faridabad, India

fYear

2012

fDate

6-8 Dec. 2012

Firstpage

452

Lastpage

456

Abstract

A general crawler downloads web pages of any kind, forming a source of information for the search engine. A blog crawler is similar to a general crawler except that it restricts its crawl boundary to the blog space, downloading only blog pages and ignoring the rest of the web. Since blogs are an emerging phenomenon and serve as a very useful source of information, a blog crawler is of great help in this regard. We propose a new algorithm for a blog crawler and discuss a number of related issues. Our analysis finds that the proposed blog crawler is superior to the general crawler.

Keywords

Web sites; search engines; software architecture; Web page download; blog crawler architecture; blog page download; blog space; crawl boundary; information source; search engine; Blogs; Computer architecture; Conferences; Crawlers; Measurement; Search engines; Web pages; Web; blog; crawler; indexer

fLanguage

English

Publisher

IEEE

Conference_Title

Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on

Conference_Location

Solan

Print_ISBN

978-1-4673-2922-4

Type

conf

DOI

10.1109/PDGC.2012.6449863

Filename

6449863

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=593256