Investigating Samples Representativeness for an Online Experiment in Java Code Search

Author

Rafael M. de Mello;Kathryn T. Stolee;Guilherme H. Travassos

Author_Institution

Fed. Univ. of Rio de Janeiro, Rio de Janeiro, Brazil

fYear

2015

Firstpage

1

Lastpage

10

Abstract

Context: The results of large-scale studies in software engineering can be significantly impacted by the representativeness of their samples. Diverse population sources can be used to support sampling for such studies. Goal: To compare two samples, one from the crowdsourcing platform Mechanical Turk and another from the professional social network LinkedIn, in an online experiment for evaluating the relevance of Java code snippets to programming tasks. Method: To compare the samples' characteristics (subjects' experience, programming habits) and the experimental results from three experimental trials. Results: LinkedIn's subjects present significantly higher levels of experience in Java programming and in programming in general than Mechanical Turk's subjects. The experimental results revealed a significant difference between the samples and suggested that LinkedIn's subjects were more pessimistic than Mechanical Turk's subjects, despite a high level of consistency in the experimental results. Conclusion: The combined use of sampling sources can benefit large-scale studies in software engineering, especially when heterogeneity is desired in the population. Thus, it can be useful to investigate and characterize alternative sources of sampling for performing large-scale studies in software engineering.

Keywords

"LinkedIn","Sociology","Statistics","Programming profession","Context","Java"

Publisher

IEEE

Conference_Titel

Empirical Software Engineering and Measurement (ESEM), 2015 ACM/IEEE International Symposium on

Type

conf

DOI

10.1109/ESEM.2015.7321205

Filename

7321205