DocumentCode :
179346
Title :
Improving language modeling by using distance and co-occurrence information of word-pairs and its application to LVCSR
Author :
Tze Yuang Chong ; Banchs, Rafael E. ; Eng Siong Chng ; Haizhou Li
Author_Institution :
Temasek Labs., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
4883
Lastpage :
4887
Abstract :
This paper reports our study on exploiting the distance and co-occurrence information of word-pairs to improve the n-gram language model. We used these two types of information to model the distant context, up to a history length of ten. We also show that the proposed model provides complementary information about the n-gram's context that the n-gram model cannot capture due to data scarcity. Evaluated on the WSJ and SWB-1 corpora, the proposed model reduced the trigram perplexity by up to 11.2% and 6.5%, respectively. In an N-best re-ranking task on the Aurora-4 database, our model helped a hexagram model achieve a ~9% relative improvement in WER.
Keywords :
natural language processing; speech recognition; Aurora-4 database; LVCSR; SWB-1 corpora; WSJ corpora; data scarcity; hexagram model; n-gram language modeling improvement; natural language processing tasks; word-pairs co-occurrence information; word-pairs distance information; Adaptation models; Computational modeling; Context; Context modeling; Hidden Markov models; History; Term-distance; language model; term-occurrence
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854530
Filename :
6854530