• DocumentCode
    3486236
  • Title
    Crowd-sourcing for difficult transcription of speech
  • Author
    Williams, Jason D.; Melamed, I. Dan; Alonso, Tirso; Hollister, Barbara; Wilpon, Jay
  • Author_Institution
    Shannon Lab., AT&T Labs - Research, Florham Park, NJ, USA
  • fYear
    2011
  • fDate
    11-15 Dec. 2011
  • Firstpage
    535
  • Lastpage
    540
  • Abstract
    Crowd-sourcing is a promising method for fast and cheap transcription of large volumes of speech data. However, this method cannot achieve the accuracy of expert transcribers on speech that is difficult to transcribe. Faced with such speech data, we developed three new methods of crowd-sourcing, which allow explicit trade-offs among precision, recall, and cost. The methods are: incremental redundancy, treating ASR as a transcriber, and using a regression model to predict transcription reliability. Even though the accuracy of individual crowd-workers is only 55% on our data, our best method achieves 90% accuracy on 93% of the utterances, using only 1.3 crowd-worker transcriptions per utterance on average. When forced to transcribe all utterances, our best method matches the accuracy of previous crowd-sourcing methods using only one third as many transcriptions. We also study the effects of various task design factors on transcription latency and accuracy, some of which have not been reported before.
  • Keywords
    regression analysis; speech processing; crowd sourcing; difficult transcription; incremental redundancy; regression model; speech transcription; task design factors; transcription latency; transcription reliability; Accuracy; Reliability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
  • Conference_Location
    Waikoloa, HI
  • Print_ISBN
    978-1-4673-0365-1
  • Electronic_ISBN
    978-1-4673-0366-8
  • Type
    conf
  • DOI
    10.1109/ASRU.2011.6163988
  • Filename
    6163988