Title :
The App Sampling Problem for App Store Mining
Author :
Martin, William ; Harman, Mark ; Jia, Yue ; Sarro, Federica ; Zhang, Yuanyuan
Author_Institution :
Dept. of Comput. Sci., Univ. Coll. London, London, UK
Abstract :
Many papers on App Store Mining are susceptible to the App Sampling Problem, which exists when only a subset of apps is studied, resulting in potential sampling bias. We introduce the App Sampling Problem, and study its effects on sets of user review data. We investigate the effects of sampling bias, and techniques for its amelioration in App Store Mining and Analysis, where sampling bias is often unavoidable. We mine 106,891 requests from 2,729,103 user reviews and investigate the properties of apps and reviews from three different partitions: the sets with fully complete review data, partially complete review data, and no review data at all. We find that app metrics such as price, rating, and download rank are significantly different between the three completeness levels. We show that correlation analysis can find trends in the data that prevail across the partitions, offering one possible approach to App Store Analysis in the presence of sampling bias.
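The partition-then-correlate idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the field names (`mined`, `total`, `price`, `rating`), the completeness rule, and the choice of Spearman rank correlation are all assumptions, since the abstract does not specify them.

```python
from collections import defaultdict
from math import sqrt


def completeness(mined, total):
    """Hypothetical rule: compare reviews mined with reviews the store reports."""
    if mined == 0:
        return "none"
    return "full" if mined >= total else "partial"


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation (assumed here): Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


def correlations_by_partition(apps, metric_a, metric_b):
    """Group apps by completeness level and correlate two metrics per group."""
    groups = defaultdict(list)
    for app in apps:
        groups[completeness(app["mined"], app["total"])].append(app)
    return {
        name: spearman([a[metric_a] for a in members],
                       [a[metric_b] for a in members])
        for name, members in groups.items()
        if len(members) >= 3  # skip groups too small to correlate
    }
```

Comparing the per-partition coefficients returned by `correlations_by_partition` is one way to check whether a trend holds across all three completeness levels rather than in a single, possibly biased, subset.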
Keywords :
data mining; smart phones; application metrics; application sampling problem; application store mining; completeness levels; correlation analysis; download rank metric; fully-complete review data; no-review data; partially-complete review data; price metric; rating metric; sampling bias; user review data; Computational modeling; Correlation; Data mining; Google; Market research; Measurement; Web pages; App Sampling Problem; App Store Analysis; sample bias; software repository mining
Conference_Title :
Mining Software Repositories (MSR), 2015 IEEE/ACM 12th Working Conference on
Conference_Location :
Florence
DOI :
10.1109/MSR.2015.19