Title :
Web-based keyword adapted Language Modeling for Keyword Spotting
Author :
Shen, Wenzhu ; Wu, Ji ; Li, Wei
Author_Institution :
Dept. of Electron. Eng., Speech Recognition Lab. For Inf. Sci. & Technol., Beijing, China
Date :
Nov. 29 2010-Dec. 3 2010
Abstract :
The Language Model (LM) is one of the key components of Keyword Spotting (KWS). The rapid growth of the World Wide Web (WWW) makes it an extremely large and valuable data source for LM training, but using the raw transcripts from the WWW is not optimal because the content of the web corpus does not match the test data. This paper proposes a novel two-step data selection method, based on the predefined keyword list, for language modeling in keyword spotting. First, we exploit the keywords to be spotted by submitting every keyword as an independent search engine query; this retrieves a web corpus that could be used directly to train a web LM (although we did not do so). Second, we select from the raw web corpus the sentences that contain the predefined keywords. The resulting keyword-specific corpus is used to train an adaptation LM, which in turn adapts a general-purpose LM. Our keyword-specific LM makes the KWS task topic-independent, so the keywords may be arbitrary and unrelated to one another. Our experimental results show that the keyword-specific LM outperforms the one trained on the raw web corpus. Expanding the size of the web-based corpus no longer improves the equal error rate (EER) of the KWS system, but it does improve performance at both ends of the DET (Detection Error Tradeoff) curve.
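The second selection step and the adaptation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the naive punctuation-based sentence splitter, the case-insensitive substring match, and the linear interpolation weight `lam` are all assumptions made for illustration. The abstract does not say how sentences are segmented or how the adaptation LM is combined with the general-purpose one; linear interpolation is shown only as one common way to build a mixture LM.

```python
import re


def split_sentences(text):
    # Assumption: sentences end at '.', '!', '?' or a newline.
    parts = re.split(r"[.!?\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def select_keyword_sentences(raw_corpus, keywords):
    # Step 2 of the abstract: keep only the sentences from the raw web
    # corpus that contain at least one predefined keyword.
    # Assumption: case-insensitive substring matching is good enough.
    lowered = [k.lower() for k in keywords]
    selected = []
    for document in raw_corpus:
        for sentence in split_sentences(document):
            text = sentence.lower()
            if any(k in text for k in lowered):
                selected.append(sentence)
    return selected


def interpolate(p_general, p_keyword, lam):
    # Hypothetical mixture: linear interpolation of the probability a
    # general-purpose LM and a keyword-specific LM assign to the same event.
    return lam * p_keyword + (1.0 - lam) * p_general
```

Here `raw_corpus` stands for the list of page texts retrieved in the first step by querying a search engine once per keyword; that retrieval is left out because it depends on the search service used.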
Keywords :
Internet; Web sites; query processing; search engines; text analysis; Web based keyword adapted language modeling; Web corpus; World Wide Web; detection error tradeoff curve; keyword specific corpus; keyword spotting; predefined keyword list; search engine query; two-step data selection method; Acoustics; Adaptation model; Data models; Search engines; Training; Training data; Web pages; Data Selection; Keyword Spotting; Mixture Language Model;
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
DOI :
10.1109/ISCSLP.2010.5684898