Abstract:
Current crowdsourcing platforms such as Amazon Mechanical Turk provide an attractive solution for processing high-volume tasks at low cost. However, quality control remains a major concern. In the present work, we developed a private crowdsourcing system (PCSS) that runs on an intranet, which allows us to devise quality control methods. For quality control, we introduce four worker selection methods: preprocessing filtering, real-time filtering, post-processing filtering, and guess processing filtering. In addition to basic approaches involving initial training or the use of gold standard data, these methods include a novel approach that utilizes collaborative filtering techniques. Furthermore, using PCSS, we collected a large amount of vocabulary data for natural language processing applications such as speech recognition and text-to-speech. The quality control methods increased accuracy by 32.4% in the vocabulary collection task, and we obtained 138 thousand vocabulary items. We found that PCSS is a practical system for collecting data; it has been in use for three years, since 2011.
Keywords:
"Quality control","Crowdsourcing","Vocabulary","Speech recognition","Noise reduction","Real-time systems","Market research"