Title :
Coping with language data sparsity: Semantic head mapping of compound words
Author :
Pelemans, Joris ; Demuynck, Kris ; Van hamme, Hugo ; Wambacq, Piet
Author_Institution :
Dept. ESAT, Katholieke Univ. Leuven, Leuven, Belgium
Abstract :
In this paper we present a novel clustering technique for compound words. By mapping compounds onto their semantic heads, the technique is able to estimate n-gram probabilities for unseen compounds. We argue that compounds are well represented by their heads, which allows the clustering of rare words and reduces the risk of over-generalization. The semantic heads are obtained by a two-step process that consists of constituent generation and best head selection based on corpus statistics. Experiments on Dutch read speech show that our technique correctly identifies compounds and their semantic heads with a precision of 80.25% and a recall of 85.97%. A class-based language model with compound-head clusters achieves a significant reduction in both perplexity and word error rate (WER).
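As a rough illustration of the two-step process named in the abstract (constituent generation, then best head selection from corpus statistics), the Python sketch below may help. It is not the authors' implementation: the function names, the `min_len` parameter, the optional linking "s", and the choice of raw unigram frequency as the selection statistic are all assumptions made for this example, and the toy counts are invented. It also assumes the head is the rightmost constituent, as is typical for Dutch compounds.

```python
from collections import Counter


def candidate_heads(word, counts, min_len=3):
    """Step 1, constituent generation (hypothetical variant).

    Yield every suffix of `word` that is a known word, provided the
    remaining prefix (the modifier) is also a known word, optionally
    followed by a linking 's'. Both parts need at least `min_len` letters.
    """
    for i in range(min_len, len(word) - min_len + 1):
        modifier, head = word[:i], word[i:]
        if head not in counts:
            continue
        # Assumption: accept the modifier as-is or with a trailing linking 's'.
        plain = modifier in counts
        linked = modifier.endswith("s") and modifier[:-1] in counts
        if plain or linked:
            yield head


def semantic_head(word, counts, min_len=3):
    """Step 2, best head selection (hypothetical variant).

    Pick the candidate head with the highest corpus count; return None
    when the word cannot be split into known constituents.
    """
    candidates = list(candidate_heads(word, counts, min_len))
    if not candidates:
        return None
    return max(candidates, key=lambda head: counts[head])


# Toy unigram counts, invented for illustration only.
counts = Counter({
    "voetbal": 50, "wedstrijd": 120, "voet": 80, "bal": 300,
    "strijd": 40, "stad": 90, "bus": 60,
})

head = semantic_head("voetbalwedstrijd", counts)
print(head)
```

In a class-based language model, an unseen compound could then share the n-gram statistics of the head returned here, which is the kind of mapping the abstract describes; how the original system weights or filters candidates is not stated in this record.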
Keywords :
pattern clustering; probability; speech processing; speech recognition; statistics; Dutch read speech; WER; automatic speech recognition; class-based language model; clustering technique; compound word; compound-head clustering; corpus statistics; language data sparsity; n-gram probability estimation; semantic head mapping; Acoustics; Compounds; Conferences; Decision support systems; Speech; Speech processing; OOV; clustering; compounds; n-grams; sparsity
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6853574