DocumentCode
3305677
Title
Novel algorithms for computing medians and other quantiles of disk-resident data
Author
Fu, Lixin ; Rajasekaran, Sanguthevar
Author_Institution
Dept. of Comput. & Inf. Sci., Florida Univ., Gainesville, FL, USA
fYear
2001
fDate
2001
Firstpage
145
Lastpage
154
Abstract
In data warehousing applications, numerous OLAP queries involve the processing of holistic operations such as computing the “top N”, median, etc. Efficient implementations of these operations are hard to come by. Several algorithms have been proposed in the literature that estimate various quantiles of disk-resident data. Two such recent algorithms are based on sampling. The authors present two novel and efficient quantiling algorithms, Deterministic Bucketing (DB) and Randomized Bucketing (RB). We have analyzed the performance of DB and RE and extended the analysis of the sampling done in prior algorithms. We have conducted extensive experiments to compare all these four algorithms. Our experimental data indicate that our new algorithms outperform prior algorithms not only in the overall run time but also in accuracy. The new algorithms can be used either as one-pass algorithms to accurately estimate quantiles or as algorithms for computing the quantiles exactly
Keywords
data mining; data warehouses; sampling methods; Deterministic Bucketing; OLAP queries; Randomized Bucketing; data warehousing applications; disk-resident data; holistic operations; medians; one-pass algorithms; quantiles; quantiling algorithms; sampling; Algorithm design and analysis; Partitioning algorithms; Performance analysis; Phase change random access memory; Sampling methods; Sorting; Upper bound; Warehousing;
fLanguage
English
Publisher
ieee
Conference_Titel
Database Engineering and Applications, 2001 International Symposium on.
Conference_Location
Grenoble
Print_ISBN
0-7695-1140-6
Type
conf
DOI
10.1109/IDEAS.2001.938081
Filename
938081
Link To Document