DocumentCode :
1755829
Title :
Bias Correction in a Small Sample from Big Data
Author :
Jianguo Lu ; Dingding Li
Author_Institution :
Sch. of Comput. Sci., Univ. of Windsor, Windsor, ON, Canada
Volume :
25
Issue :
11
fYear :
2013
fDate :
Nov. 2013
Firstpage :
2658
Lastpage :
2663
Abstract :
This paper discusses the bias problem that arises when estimating the population size of big data, such as online social networks (OSN), using uniform random sampling and simple random walk. In the traditional estimation problem, the sample is not very small relative to the data size. In big data, however, even a sample that is small relative to the data size is already very large and costly to obtain. We point out that when such small samples are used, the estimator has a bias that is no longer negligible. This paper shows analytically that the relative bias can be approximated by the reciprocal of the number of collisions, and on that basis introduces a bias-correction estimator. The result is further supported by simulation studies and by experiments on the real Twitter network, which contains 41.7 million nodes.
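The abstract's central claim, that the relative bias is roughly the reciprocal of the number of collisions, can be illustrated with a small simulation. The sketch below assumes uniform random sampling with replacement and uses the standard collision-based size estimate N̂ = n(n−1)/(2C), where n is the sample size and C is the number of pairs of draws that hit the same node. The "corrected" form simply divides N̂ by (1 + 1/C), which simplifies to n(n−1)/(2(C+1)). This is one hedged reading of the stated approximation, not necessarily the exact estimator derived in the paper, and it does not cover the simple random walk case. All function names and the simulation parameters (N = 10,000, n = 300, 2,000 trials) are hypothetical choices for illustration.

```python
import random
from collections import Counter


def count_collisions(sample):
    """Count unordered pairs (i < j) of draws that hit the same node."""
    return sum(k * (k - 1) // 2 for k in Counter(sample).values())


def naive_estimate(sample):
    """Collision-based size estimate n(n-1) / (2C) for uniform sampling with replacement."""
    n = len(sample)
    c = count_collisions(sample)
    if c == 0:
        raise ValueError("no collisions: the sample is too small to estimate N")
    return n * (n - 1) / (2 * c)


def corrected_estimate(sample):
    """Assumed correction: divide the naive estimate by (1 + 1/C)."""
    n = len(sample)
    c = count_collisions(sample)
    if c == 0:
        raise ValueError("no collisions: the sample is too small to estimate N")
    # n(n-1) / (2C) divided by (1 + 1/C) simplifies to n(n-1) / (2(C+1)).
    return n * (n - 1) / (2 * (c + 1))


def simulate(population_size, sample_size, trials, seed=0):
    """Average both estimators over repeated uniform samples; trials with C = 0 are skipped."""
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(trials):
        sample = [rng.randrange(population_size) for _ in range(sample_size)]
        if count_collisions(sample) == 0:
            continue
        naive.append(naive_estimate(sample))
        corrected.append(corrected_estimate(sample))
    return sum(naive) / len(naive), sum(corrected) / len(corrected)


if __name__ == "__main__":
    # A "small sample": 300 draws from 10,000 nodes gives about 4.5 expected collisions.
    naive_mean, corrected_mean = simulate(10_000, 300, trials=2_000)
    print(f"naive mean:     {naive_mean:,.0f}")
    print(f"corrected mean: {corrected_mean:,.0f}")
```

With only a handful of expected collisions, the naive average lands well above the true size of 10,000, because the average of 1/C exceeds 1/E[C]. The corrected average should fall much closer to 10,000. As the sample grows and C becomes large, the 1/C term shrinks and the two estimates converge, which matches the abstract's point that the bias matters specifically for small samples.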
Keywords :
data handling; sampling methods; social networking (online); OSN; Twitter network; bias correction estimator; big data; online social networks; population size estimation; simple random walk; uniform random sampling; Equations; Estimation; Information management; Mathematical model; Sociology; Statistics; Twitter; Big data; bias; online social networks; size estimation; small sample;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2012.220
Filename :
6378370