Title :
Automated data verification in a large-scale citizen science project: A case study
Author :
Jun Yu ; Kelling, S. ; Gerbracht, J. ; Weng-Keen Wong
Author_Institution :
Sch. of EECS, Oregon State Univ., Corvallis, OR, USA
Abstract :
Although citizen science projects can engage a very large number of volunteers to collect large volumes of data, they are susceptible to issues with data quality. Our experience with eBird, a broad-scale citizen science project that collects bird observations, has shown that a massive effort by volunteer experts is needed to screen data, identify outliers, and flag them in the database. The increasing volume of data being collected by eBird places a huge burden on these volunteer experts, and automated approaches to improve data quality are needed. In this work, we describe a case study in which we evaluate an automated data quality filter that improves data quality by identifying outliers and categorizing them as either unusual valid observations or misidentified (invalid) observations. This automated data filter involves a two-step process. First, a data-driven method detects outliers (i.e., observations that are unusual for a given region and date). Next, a data quality model based on an observer's predicted expertise decides whether an outlier should be flagged for review. We applied this automated data filter retrospectively to eBird data from Tompkins Co., NY, and found that it significantly reduced the workload of reviewers, by as much as 43%, and identified 52% more potentially invalid observations.
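The two-step filter described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the weekly reporting-frequency rule used for outlier detection, the numeric expertise score, both thresholds, and every name below (Observation, weekly_frequency, is_outlier, triage) are assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    observer: str
    species: str
    week: int  # week of the year in which the bird was reported


def weekly_frequency(checklists):
    """Fraction of checklists in each week that report each species.

    `checklists` is an iterable of (week, set_of_species) pairs for one
    region. This frequency rule is an assumed stand-in for the paper's
    data-driven outlier detector.
    """
    per_week = Counter()
    hits = Counter()
    for week, species_seen in checklists:
        per_week[week] += 1
        for species in species_seen:
            hits[(species, week)] += 1
    return {key: n / per_week[key[1]] for key, n in hits.items()}


def is_outlier(obs, freq, min_freq=0.05):
    """Step 1: an observation is unusual if the species is rarely reported that week."""
    return freq.get((obs.species, obs.week), 0.0) < min_freq


def triage(obs, freq, expertise, min_freq=0.05, min_expertise=0.5):
    """Step 2: use the observer's predicted expertise to decide what to do with an outlier.

    `expertise` maps observer name to a score in [0, 1]; how that score is
    predicted is outside this sketch. Unknown observers are scored 0.0.
    """
    if not is_outlier(obs, freq, min_freq):
        return "accept"
    if expertise.get(obs.observer, 0.0) >= min_expertise:
        return "accept_unusual"  # unusual but plausibly valid
    return "flag_for_review"     # potentially misidentified
```

Under these assumptions, only outliers from observers with low predicted expertise reach a human reviewer, which is the mechanism by which such a filter could reduce reviewer workload.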
Keywords :
biology computing; data handling; information filtering; scientific information systems; zoology; automated data quality filter; automated data verification; bird observations; broad-scale citizen science project; data screen; data-driven method; eBird data; large-scale citizen science project; outlier detection; Biological system modeling; Birds; Data models; Databases; Mathematical model; Observers; Predictive models; Applications; Citizen Science; Crowdsourcing; Data Filters; Data Quality; Species Distribution Modeling;
Conference_Title :
E-Science (e-Science), 2012 IEEE 8th International Conference on
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4673-4467-8
DOI :
10.1109/eScience.2012.6404472