• DocumentCode
    2028869
  • Title

    Utilizing RSS Feeds for Crawling the Web

  • Author

    Adam, George ; Bouras, Christos ; Poulopoulos, Vassilis

  • Author_Institution
    Technol. Inst. Rion, Res. Acad. Comput., Patras
  • fYear
    2009
  • fDate
    24-28 May 2009
  • Firstpage
    211
  • Lastpage
    216
  • Abstract
    We present “advaRSS”, a crawling mechanism created to support peRSSonal, a mechanism used to create personalized RSS feeds. In contrast to common crawling mechanisms, our system focuses on fetching the latest news from major and minor portals worldwide by utilizing their communication channels. The challenge that distinguishes “advaRSS” from a usual crawler is that news is produced in random order at any time of the day, and thus the freshness of the offline collection can be measured even in minutes. This means that the system has to be updated with news items every time they occur. To achieve this, we utilize the communication channels that exist in the modern architecture of the WWW and, more specifically, in almost every modern news portal. As RSS feeds are used by every major and minor portal, it is possible to keep our crawler up to date and retain a high freshness of the “offline content” maintained in our system’s database by applying algorithms that observe the temporal behaviour of each RSS feed.
  • Keywords
    Internet; Web sites; information retrieval; portals; Web crawling; advaRSS crawling mechanism; communication channels; news portal; offline content; random order; temporal behaviour; Application software; Communication channels; Crawlers; Feeds; Portals; Time measurement; Uniform resource locators; Web and internet services; Web pages; World Wide Web; offline content; rss analysis; rss crawling; web crawler;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Internet and Web Applications and Services, 2009. ICIW '09. Fourth International Conference on
  • Conference_Location
    Venice/Mestre
  • Print_ISBN
    978-1-4244-3851-8
  • Electronic_ISBN
    978-0-7695-3613-2
  • Type

    conf

  • DOI
    10.1109/ICIW.2009.37
  • Filename
    5072521