DocumentCode :
2789054
Title :
Using the Amazon Mechanical Turk for transcription of spoken language
Author :
Marge, Matthew ; Banerjee, Satanjeev ; Rudnicky, Alexander I.
Author_Institution :
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
5270
Lastpage :
5273
Abstract :
We investigate whether Amazon's Mechanical Turk (MTurk) service can be used as a reliable method for transcription of spoken language data. Utterances with varying speaker demographics (native and non-native English, male and female) were posted on the MTurk marketplace together with standard transcription guidelines. Transcriptions were compared against transcriptions carefully prepared in-house through conventional (manual) means. We found that transcriptions from MTurk workers were generally quite accurate. Further, when transcripts for the same utterance produced by multiple workers were combined using the ROVER voting scheme, the accuracy of the combined transcript rivaled that observed for conventional transcription methods. We also found that accuracy is not particularly sensitive to payment amount, implying that high quality results can be obtained at a fraction of the cost and turnaround time of conventional methods.
Keywords :
linguistics; natural language processing; speech processing; Amazon Mechanical Turk service; MTurk marketplace; ROVER voting scheme; speaker demographics; spoken language transcription; transcription guideline; Computer science; Costs; Demography; Gold; Guidelines; Humans; Natural languages; Recruitment; Speech synthesis; Streaming media; crowd sourcing; speech transcription
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5494979
Filename :
5494979