DocumentCode
2788221
Title
Language model adaptation using WWW documents obtained by utterance-based queries
Author
Tsiartas, Andreas ; Georgiou, Panayiotis ; Narayanan, Shrikanth
Author_Institution
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear
2010
fDate
14-19 March 2010
Firstpage
5406
Lastpage
5409
Abstract
In this paper, we consider the estimation of topic-specific Language Models (LMs) by exploiting documents from the World Wide Web (WWW). We focus on the quality of the generated queries and propose a novel query generation method. In contrast to the n-gram-based queries used in prior work, our approach relies on utterances as query candidates. The proposed approach does not rely on any language-specific information other than the initial in-domain training text. We conducted experiments with Web texts ranging in size from 0 to 150 million words and show that, despite not using any language-specific information, the proposed approach yields up to 1.1% absolute Word Error Rate (WER) improvement over keyword-based approaches. Compared to an in-domain LM that uses no Web data, the proposed approach reduces the WER by 6.3% absolute in our experiments.
Keywords
Internet; natural language processing; query processing; text analysis; WWW documents; Web data; Web texts; Word Error Rate; World Wide Web; initial in-domain training text; keyword-based approaches; language model adaptation; language specific information; queries; utterance-based queries; Adaptation model; Automatic speech recognition; Internet; Laboratories; Natural languages; Search engines; Speech analysis; Vocabulary; Web sites; World Wide Web; Adapt language models; WWW corpora; in-domain documents; utterance queries;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location
Dallas, TX
ISSN
1520-6149
Print_ISBN
978-1-4244-4295-9
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2010.5494928
Filename
5494928