DocumentCode :
1059667
Title :
An Iterative Relative Entropy Minimization-Based Data Selection Approach for n-Gram Model Adaptation
Author :
Sethy, Abhinav ; Georgiou, Panayiotis G. ; Ramabhadran, Bhuvana ; Narayanan, Shrikanth
Author_Institution :
Signal & Image Process. Inst., Univ. of Southern California, Los Angeles, CA
Volume :
17
Issue :
1
Year :
2009
Firstpage :
13
Lastpage :
23
Abstract :
The performance of statistical n-gram language models depends heavily on the amount of training text and on how well that text matches the domain of interest. The language modeling community is showing growing interest in using large collections of text (obtainable, for example, from a diverse set of resources on the Internet) to supplement sparse in-domain resources. However, in most cases the style and content of text harvested from the web differ significantly from the specific nature of these domains. In this paper, we present a relative-entropy-based method to select subsets of sentences whose n-gram distribution matches the domain of interest. We present results on language model adaptation using two speech recognition tasks: a medium-vocabulary medical-domain doctor-patient dialog system and a large-vocabulary transcription system for European parliamentary plenary speeches (EPPS). We show that the proposed subset selection scheme leads to performance improvements over state-of-the-art speech recognition systems in terms of both speech recognition word error rate (WER) and language model perplexity (PPL).
Keywords :
speech processing; European parliamentary plenary speeches; data selection; iterative relative entropy; language model perplexity; n-gram model; text material; word error rate; Adaptation model; Entropy; Error analysis; Internet; Iterative methods; Natural language processing; Natural languages; Probability; Speech recognition; Vocabulary; language model adaptation; relative entropy
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2008.2006654
Filename :
4740141