• DocumentCode
    593256
  • Title
    A novel architecture for a blog crawler
  • Author
    Madaan, R. ; Sharma, Arvind Kumar ; Dixit, Abhishek
  • Author_Institution
    Comput. Sci. & Eng., Echelon Inst. of Technol., Faridabad, India
  • fYear
    2012
  • fDate
    6-8 Dec. 2012
  • Firstpage
    452
  • Lastpage
    456
  • Abstract
    A general crawler downloads web pages of any kind, forming a source of information for the search engine. A blog crawler is similar to a general crawler except that it restricts its crawl boundary to the blog space, downloading only blog pages and ignoring the rest of the web. Since blogs are an emerging phenomenon and serve as a very useful source of information, a blog crawler proves to be of great help in this regard. We propose a new algorithm for a blog crawler and discuss a number of related issues. Our analysis also finds that the proposed blog crawler is superior to a general crawler.
  • Keywords
    Web sites; search engines; software architecture; Web page download; blog crawler architecture; blog page download; blog space; crawl boundary; information source; search engine; Blogs; Computer architecture; Conferences; Crawlers; Measurement; Search engines; Web pages; Web; blog; crawler; indexer; search engine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on
  • Conference_Location
    Solan
  • Print_ISBN
    978-1-4673-2922-4
  • Type
    conf
  • DOI
    10.1109/PDGC.2012.6449863
  • Filename
    6449863