Title :
Morphology-based and sub-word language modeling for Turkish speech recognition
Author :
Haşim Sak;Murat Saraçlar;Tunga Güngör
Author_Institution :
Computer Engineering, Bogazici University, Bebek, Istanbul, Turkey
fDate :
3/1/2010 12:00:00 AM
Abstract :
We explore morphology-based and sub-word language modeling approaches proposed for morphologically rich languages, and evaluate and contrast them on the Turkish broadcast news transcription task. In addition, as a morphology-based model, we improve our previously proposed morphology-integrated model for automatic speech recognition. This model is built by composing the finite-state transducer of the morphological parser with a language model over lexical morphemes. This approach provides a morphology-integrated search network with an unlimited vocabulary that generates only valid word forms, reducing the out-of-vocabulary rate and hence improving the word error rate. We also analyze the effect of morpho-tactics and morphological disambiguation on speech recognition accuracy for the morphology-integrated model. The improved morphology-integrated model performs better than statistically derived sub-word models, with the added benefit of generating morpho-syntactic and semantic features.
Keywords :
"Natural languages","Speech recognition","Transducers","Automatic speech recognition","Vocabulary","Broadcasting","Lattices","Error analysis","Parameter estimation","Encoding"
Conference_Titel :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
2379-190X
DOI :
10.1109/ICASSP.2010.5494927