DocumentCode
3134663
Title
A Monte Carlo sampling method for drawing representative samples from large databases
Author
Guo, Hong ; Hou, Wen-Chi ; Yan, Feng ; Zhu, Qiang
Author_Institution
Dept. of Comput. Sci., Southern Illinois Univ., Carbondale, IL, USA
fYear
2004
fDate
21-23 June 2004
Firstpage
419
Lastpage
420
Abstract
Sampling is important in areas such as data mining, OLAP, selectivity estimation, and clustering. It has also become a necessity in social, economic, engineering, scientific, and statistical studies where databases are too large to handle. In this paper, a sampling method based on the Metropolis algorithm is proposed. Unlike conventional uniform sampling methods, this method is able to select objects consistent with the underlying probability distribution. It is a simple, efficient, and powerful method suitable for all distributions. We have performed experiments to examine the quality of the samples by comparing their statistical properties with those of the underlying population. The experimental results show that the samples selected by our method are bona fide representative samples.
Keywords
Monte Carlo methods; data mining; sampling methods; statistical databases; statistical distributions; very large databases; Metropolis algorithm; Monte Carlo sampling method; OLAP; data clustering; economical studies; engineering studies; large databases; object selection; probability distribution; representative samples; scientific studies; selectivity estimation; social studies; statistical property comparison; statistical studies; Clustering algorithms; Data engineering; Databases; Engineering drawings; Power engineering and energy; Power generation economics
fLanguage
English
Publisher
IEEE
Conference_Titel
Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
ISSN
1099-3371
Print_ISBN
0-7695-2146-0
Type
conf
DOI
10.1109/SSDM.2004.1311239
Filename
1311239