Title :
Improving Range Query Performance on Historic Web Page Data
Author :
Li, Geng ; Peng, Bo
Author_Institution :
Lab. of Comput. Networks & Distrib. Syst., Peking Univ., Beijing, China
Abstract :
This paper studies the performance of range queries on a historic web page data set, i.e., a data set that records historic versions of the HTML data of URLs on the web. A range query requests the subset of that data whose URLs and timestamps satisfy the query conditions. Because it keeps track of all versions of every web URL, such a data set can easily scale up to terabytes. Hence, systems providing query services on such a data set require substantial computing resources. We show that in this scenario the data storage layout has a significant impact on query performance and, through quantitative approaches, propose storage design principles for performance improvement.
Keywords :
Internet; hypermedia markup languages; query processing; HTML data; historic web page data; range query performance improvement; Distributed databases; Hard disks; Indexing; Optimization; Web pages; performance optimization; storage design; web-scale data access
Conference_Title :
ChinaGrid Conference (ChinaGrid), 2010 Fifth Annual
Conference_Location :
Guangzhou
Print_ISBN :
978-1-4244-7543-8
Electronic_ISBN :
978-1-4244-7544-5
DOI :
10.1109/ChinaGrid.2010.28