Title :
Data-driven lexicon refinement using local and web resources for Chinese speech recognition
Author :
Zhang, Hua ; Zhu, Xuan ; Su, Teng-Rong ; Eom, Ki-Wan ; Lee, Jae-Won
Author_Institution :
China Samsung Telecom R&D Center, Samsung Electron., Beijing, China
Date :
Nov. 29 2010-Dec. 3 2010
Abstract :
This paper proposes a data-driven lexicon refinement method. By expanding and polishing the lexicon with local and web resources, the method effectively improves the accuracy of a Chinese automatic speech recognition (ASR) system. The refinement process consists of two steps. First, an improved intra-word measure is introduced to expand the lexicon from local text corpora. Second, the expanded lexicon is polished by measuring the popularity of the appended words from web query results returned by a search engine. Evaluation experiments are carried out on a voice-enabled tourist information query system. Experimental results show that the proposed method yields a 7.9% relative reduction in character error rate (CER).
Keywords :
Internet; search engines; speech recognition; ASR; CER; Chinese speech recognition; automatic speech recognition; character error rate; data driven lexicon refinement; expanding lexicon; intra word measurement; lexicon refining process; local resources; polishing lexicon; search engine; voice enabled tourist information query system; web query; web resources; bi-gram measure; lexicon refinement
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
DOI :
10.1109/ISCSLP.2010.5684905