MT-based artificial hypothesis generation for unsupervised discriminative language modeling

Author

Erinç Dikici;Murat Saraçlar

Author_Institution

Bogazici University, Department of Electrical and Electronics Engineering, 34342, Bebek, Istanbul, Turkey

fYear

2015

Firstpage

1401

Lastpage

1405

Abstract

Discriminative language modeling (DLM) is used as a postprocessing step to correct automatic speech recognition (ASR) errors. Traditional DLM training requires a large number of ASR N-best lists together with their reference transcriptions. It is possible to incorporate additional text data into training via artificial hypothesis generation through confusion modeling. A weighted finite-state transducer (WFST) or a machine translation (MT) system can be used to generate the artificial hypotheses. When the reference transcriptions are not available, training can be done in an unsupervised way via a target output selection scheme. In this paper we adapt the MT-based artificial hypothesis generation approach to unsupervised discriminative language modeling, and compare it with the WFST-based setting. We achieve improvements in word error rate of up to 0.7% over the generative baseline, which is significant at p < 0.001.
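The abstract reports its result in terms of word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the paper or from the authors' scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion against a 4-word reference.
example_wer = word_error_rate("a b c d", "a x c")
```

In a reranking setting such as the one the abstract describes, a function like this would be applied to the top-scoring hypothesis of each N-best list before and after rescoring, and the two corpus-level rates compared.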

Keywords

"Training","Adaptation models","Data models","Europe","Signal processing","Speech","Manuals"

Publisher

IEEE

Conference_Titel

Signal Processing Conference (EUSIPCO), 2015 23rd European

Electronic_ISBN

2076-1465

Type

conf

DOI

10.1109/EUSIPCO.2015.7362614

Filename

7362614

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3716064