Title :
Dist-RIA Crawler: A Distributed Crawler for Rich Internet Applications
Author :
Mirtaheri, Seyed M. ; Di Zou ; Bochmann, Gregor V. ; Jourdan, Guy-Vincent ; Onut, Iosif Viorel
Author_Institution :
Sch. of Electr. Eng. & Comput. Sci., Univ. of Ottawa, Ottawa, ON, Canada
Abstract :
Crawling web applications is important for indexing, accessibility and security assessment. Crawling traditional web applications is an old problem, as old as the web itself. Crawling Rich Internet Applications (RIAs) quickly and efficiently, however, is an open problem. Technologies such as AJAX and partial Document Object Model (DOM) updates make crawling RIAs even more time-consuming for the web crawler. To reduce the time needed to crawl a RIA, this paper presents Dist-RIA Crawler, a new distributed algorithm that crawls a RIA in parallel on multiple computers. Dist-RIA Crawler uses the JavaScript events in the DOM structure to partition the search space. This paper illustrates a prototype implementation of Dist-RIA Crawler and inspects empirical performance measurements.
Keywords :
Internet; Java; indexing; parallel algorithms; security of data; software performance evaluation; DOM structure; Dist-RIA Crawler; JavaScript events; RIA crawling; Web applications; Web crawler; distributed algorithm; empirical performance measurements; indexing; partial document object model; rich Internet applications; search space partitioning; security assessment; Browsers; Crawlers; Internet; Prototypes; Random access memory; Security; Servers; Distributed Systems; Parallel Processing; Rich Internet Applications; Web Crawling;
Conference_Title :
P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2013 Eighth International Conference on
Conference_Location :
Compiegne
DOI :
10.1109/3PGCIC.2013.22