Title :
Emergent Filters: Automated Data Verification in a Large-Scale Citizen Science Project
Author :
Kelling, Steve ; Yu, Jun ; Gerbracht, Jeff ; Wong, Weng-Keen
Author_Institution :
Cornell Lab. of Ornithology, Ithaca, NY, USA
Abstract :
Research projects that use the efforts of volunteers ("citizen scientists") to collect data on organism occurrence must address issues of observer variability and species misidentification. While citizen science projects can engage a very large number of volunteers to collect large volumes of data, those data are prone to reporting errors. Our experience with eBird, a citizen science project that engages tens of thousands of volunteers to collect bird observations, has shown that a massive effort by volunteer experts is needed to screen data, identify outliers, and flag them in the database. However, the increasing volume of data being collected by eBird places a huge burden on these volunteer experts. To minimize this human effort, we explored whether previously collected eBird data can be used to create automated quality filters that emerge from the data. We do this through a two-step process. First, a data-based method detects outliers (i.e., observations that are unusual for a given region and week of the year). Next, a novel machine learning method that estimates observer expertise is used to decide whether the unusual observation should be flagged. Our preliminary findings indicate that this automated process reliably identifies outliers and accurately classifies each one as either an error or a potentially valuable observation.
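The two-step process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frequency threshold (min_freq), the expertise cutoff (expert_cutoff), the function names, and the precomputed expertise scores are all assumptions made for illustration. In particular, the abstract's machine learning method for estimating observer expertise is not reproduced here; the sketch simply takes a per-observer score as input.

```python
from collections import Counter


def build_frequency_table(checklists):
    """Count checklists per (region, week) and species hits per (region, week, species).

    checklists: iterable of (region, week, set_of_species) tuples (hypothetical layout).
    """
    totals = Counter()
    hits = Counter()
    for region, week, species_set in checklists:
        totals[(region, week)] += 1
        for species in species_set:
            hits[(region, week, species)] += 1
    return totals, hits


def is_outlier(totals, hits, region, week, species, min_freq=0.05):
    """Step 1: an observation is unusual if the species was rarely reported
    in that region and week in the historical data."""
    n = totals[(region, week)]
    if n == 0:
        return True  # assumption: no history is treated as unusual
    return hits[(region, week, species)] / n < min_freq


def triage(totals, hits, observation, expertise, min_freq=0.05, expert_cutoff=0.8):
    """Step 2: use an observer expertise score to decide whether to flag an outlier.

    observation: (region, week, species, observer_id)
    expertise: dict mapping observer_id to a score in [0, 1] (assumed precomputed).
    """
    region, week, species, observer = observation
    if not is_outlier(totals, hits, region, week, species, min_freq):
        return "accept"
    if expertise.get(observer, 0.0) >= expert_cutoff:
        return "accept_unusual"  # potentially valuable record from a skilled observer
    return "flag_for_review"


# Toy historical data: 20 checklists for one region and week, none with a Snowy Owl.
history = [("NY", 20, {"American Robin"}) for _ in range(20)]
totals, hits = build_frequency_table(history)
scores = {"expert_1": 0.95, "novice_1": 0.20}

print(triage(totals, hits, ("NY", 20, "American Robin", "novice_1"), scores))
print(triage(totals, hits, ("NY", 20, "Snowy Owl", "expert_1"), scores))
print(triage(totals, hits, ("NY", 20, "Snowy Owl", "novice_1"), scores))
```

In this sketch, only unusual observations from observers below the cutoff are routed to human reviewers, which mirrors the abstract's stated goal of reducing the screening burden on volunteer experts.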
Keywords :
data handling; learning (artificial intelligence); research and development; scientific information systems; zoology; automated data verification; automated quality filters; bird observations; eBird; large scale citizen science project; machine learning method; observer expertise; observer variability; outlier detection; research projects; species misidentification; Biological system modeling; Birds; Data models; Matched filters; Mathematical model; Observers; Vegetation; citizen-science; data quality; data-base filters; machine learning; species occurrence;
Conference_Title :
e-Science Workshops (eScienceW), 2011 IEEE Seventh International Conference on
Conference_Location :
Stockholm
Print_ISBN :
978-1-4673-0026-1
DOI :
10.1109/eScienceW.2011.13