DocumentCode :
2042609
Title :
Design and analysis of a multi-dimensional data sampling service for large scale data analysis applications
Author :
Zhang, Xi ; Kurc, Tahsin ; Saltz, Joel ; Parthasarathy, Srinivasan
Author_Institution :
Dept. of Biomed. Informatics, Ohio State Univ., Columbus, OH
fYear :
2006
fDate :
25-29 April 2006
Abstract :
Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large datasets. In this paper, we present a scalable sampling implementation that supports efficient, multi-dimensional spatio-temporal sample generation on dynamic, large scale datasets stored on a storage cluster. The proposed algorithm leverages Hilbert space-filling curves in order to provide an approximate linear order of multidimensional data while maintaining spatial locality. This new implementation is then bootstrapped on top of our previous implementation, which efficiently samples large datasets along a single dimension (e.g., time), thereby realizing a service for spatio-temporal sampling. We evaluate the performance of our approach by comparing it to the popular R-tree based technique. The experimental results show that our approach achieves up to an order of magnitude higher efficiency and scalability.
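The linear ordering that the abstract attributes to Hilbert space-filling curves can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard textbook mapping from 2-D grid coordinates to a position along the curve, and the names `hilbert_index` and `hilbert_order`, the two-dimensional restriction, and the power-of-two grid size are all assumptions made for illustration.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along a Hilbert curve.

    Assumes an n-by-n grid where n is a power of two and 0 <= x, y < n.
    Hypothetical helper for illustration, not taken from the paper.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(points, n):
    """Sort 2-D grid points by their Hilbert index (hypothetical helper)."""
    return sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
```

Once points are sorted this way, neighbours in the one-dimensional order tend to be close in space, which is what would let a sampler designed for a single dimension (such as time) be reused on the ordered data.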
Keywords :
Hilbert spaces; curve fitting; data analysis; sampling methods; Hilbert space-filling curves; large scale data analysis; multidimensional data sampling service; multidimensional spatiotemporal sample generation; scalable sampling; Clustering algorithms; Data analysis; Data mining; Databases; Hilbert space; Large-scale systems; Linear approximation; Multidimensional systems; Sampling methods; Scalability;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Conference_Location :
Rhodes Island
Print_ISBN :
1-4244-0054-6
Type :
conf
DOI :
10.1109/IPDPS.2006.1639315
Filename :
1639315