Title :
Multi-class composite N-gram based on connection direction
Author :
Yamamoto, Hirofumi ; Sagisaka, Yoshinori
Author_Institution :
ATR Interpreting Telephony Res. Labs., Kyoto, Japan
Abstract :
A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word neighboring characteristics into word-preceding and word-following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently according to preceding or following word distributions. This word clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to a multi-class composite N-gram whose units are multi-class 2-grams and joined words. The multi-class composite N-gram showed better performance both in perplexity and recognition rates with one thousandth smaller size than conventional word 2-grams.
Keywords :
computational linguistics; natural languages; pattern clustering; connection direction; language corpora; multi-class 2-gram; multi-class composite N-gram; performance; perplexity; recognition rates; statistically salient class 2-grams; word neighboring characteristic; word-clustering technique; word-following direction; word-preceding direction; Equations; Natural languages; Speech recognition;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
Conference_Location :
Phoenix, AZ
Print_ISBN :
0-7803-5041-3
DOI :
10.1109/ICASSP.1999.758180