DocumentCode :
177440
Title :
Coping with language data sparsity: Semantic head mapping of compound words
Author :
Pelemans, Joris ; Demuynck, Kris ; Van hamme, Hugo ; Wambacq, Piet
Author_Institution :
Dept. ESAT, Katholieke Univ. Leuven, Leuven, Belgium
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
141
Lastpage :
145
Abstract :
In this paper, we present a novel clustering technique for compound words. By mapping compounds onto their semantic heads, the technique can estimate n-gram probabilities for unseen compounds. We argue that compounds are well represented by their heads, which allows the clustering of rare words and reduces the risk of over-generalization. The semantic heads are obtained by a two-step process consisting of constituent generation and best-head selection based on corpus statistics. Experiments on Dutch read speech show that the technique correctly identifies compounds and their semantic heads with a precision of 80.25% and a recall of 85.97%. A class-based language model with compound-head clusters achieves a significant reduction in both perplexity and word error rate (WER).
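The two-step process named in the abstract (constituent generation, then head selection from corpus statistics) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their generation rules or selection statistics. The sketch assumes right-headed compounds (as is typical in Dutch), uses plain unigram frequency as the corpus statistic, and the names `candidate_heads`, `best_head`, `toy_counts`, and the `min_len` parameter are all hypothetical.

```python
from collections import Counter

# Toy unigram counts standing in for real corpus statistics (assumed values).
toy_counts = Counter({"wedstrijd": 50, "strijd": 30, "voetbal": 20, "bal": 40})


def candidate_heads(word, counts, min_len=3):
    """Step 1, constituent generation (simplified assumption).

    Return every suffix of `word` that is a known word, leaving at least
    `min_len` characters for the modifier and for the head itself.
    """
    return [
        word[i:]
        for i in range(min_len, len(word) - min_len + 1)
        if word[i:] in counts
    ]


def best_head(word, counts, min_len=3):
    """Step 2, head selection (simplified assumption).

    Pick the candidate with the highest corpus count; ties go to the
    longer suffix. Returns None when no candidate head is found.
    """
    candidates = candidate_heads(word, counts, min_len)
    if not candidates:
        return None
    return max(candidates, key=lambda head: (counts[head], len(head)))


# An unseen compound is mapped onto its head, whose n-gram statistics
# could then be shared with the compound in a class-based model.
print(best_head("voetbalwedstrijd", toy_counts))
```

Using raw frequency as the only criterion is a deliberate simplification; a real system would likely need additional checks to avoid splitting non-compounds.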
Keywords :
pattern clustering; probability; speech processing; speech recognition; statistics; Dutch read speech; WER; automatic speech recognition; class-based language model; clustering technique; compound word; compound-head clustering; corpus statistics; language data sparsity; n-gram probability estimation; semantic head mapping; Acoustics; Compounds; Conferences; Decision support systems; Speech; Speech processing; OOV; clustering; compounds; n-grams; sparsity;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6853574
Filename :
6853574