DocumentCode
2176881
Title
Accurate transcription of broadcast news speech using multiple noisy transcribers and unsupervised reliability metrics
Author
Audhkhasi, Kartik ; Georgiou, Panayiotis ; Narayanan, Shrikanth S.
Author_Institution
Electr. Eng. Dept., Univ. of Southern California, Los Angeles, CA, USA
fYear
2011
fDate
22-27 May 2011
Firstpage
4980
Lastpage
4983
Abstract
Professional manual transcription of speech is an expensive and time-consuming process. This paper focuses on the problem of combining noisy transcriptions from multiple non-expert transcribers, where the quality of work varies from one transcriber to another. Computing transcriber reliability is a difficult task in the absence of gold-standard reference transcripts. Three simple metrics for quantifying this reliability without using a gold standard are proposed. We create a database of 1000 Mexican Spanish broadcast news audio clips, each transcribed by five transcribers through Amazon Mechanical Turk. Combining multiple noisy transcripts using these reliability scores improves the word error rate of the combined transcript with respect to the LDC gold standard by 8% relative, and the sentence error rate by 4.1% relative, when compared with a combination without any reliability information.
Keywords
reliability; speech processing; Amazon Mechanical Turk; LDC gold standard; Mexican Spanish broadcast news audio clip transcription; broadcast news speech transcription; gold standard reference transcript; multiple noisy transcriber; multiple nonexpert transcriber; noisy transcription; sentence error rate; speech professional manual transcription; unsupervised reliability metric; Indexes; Speech transcription; crowd sourcing; evaluator reliability
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location
Prague
ISSN
1520-6149
Print_ISBN
978-1-4577-0538-0
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2011.5947474
Filename
5947474