DocumentCode
2037302
Title
Significance of segmentation in phoneme based Tamil speech recognition system
Author
Harish, S. ; Vijayalakshmi, P. ; Nagarajan, T.
Author_Institution
Dept. of Electron. & Commun. Eng., Rajiv Gandhi Salai, Chennai, India
Volume
3
fYear
2011
fDate
8-10 April 2011
Firstpage
212
Lastpage
215
Abstract
Over the last few decades, speech recognition has evolved and matured enough to be used in commercial applications, including automatic dictation software, voice dialling, voice-controlled navigation and simple data entry. Automatic Speech Recognition (ASR) deals with the automatic conversion of the acoustic signal of an utterance into text. In this work, a speech recognition system for the Tamil language is developed. Speech recognition requires segmentation of the speech waveform into fundamental acoustic units. The word is the natural unit of speech. However, each word has to be trained individually, and parameters cannot be shared among words. Hence, a very large training set is essential so that all words in the vocabulary are adequately trained. The memory requirement also grows linearly with the number of words. The preferred unit to overcome these constraints is the phone. Phone-based systems need fewer models, and each model can be well trained. For the current work, the phone units considered are monophones and triphones. This work highlights the importance of segmented speech, the language model and the co-articulation effect, which influences speech production. The triphone is a phone unit that accounts for the co-articulation effect. Monophone- and triphone-based speech recognition systems for Tamil are developed, and their performance shows the importance of the above-mentioned factors.
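The relationship between the two phone units named in the abstract can be illustrated with a minimal sketch. It assumes the widely used `left-centre+right` label notation and word-internal contexts (no context across word edges). The function name `to_triphones` and the example phone sequence are hypothetical and are not taken from the paper.

```python
def to_triphones(phones):
    """Expand a monophone sequence into context-dependent triphone labels.

    Assumption: labels use the "left-centre+right" notation, and phones at
    the edges of the sequence simply lack the missing context.
    """
    labels = []
    for i, centre in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i < len(phones) - 1 else None
        label = centre
        if left is not None:
            label = f"{left}-{label}"   # prepend left context
        if right is not None:
            label = f"{label}+{right}"  # append right context
        labels.append(label)
    return labels


# Hypothetical monophone transcription of a four-phone word.
phones = ["a", "m", "m", "a"]
print(to_triphones(phones))  # ['a+m', 'a-m+m', 'm-m+a', 'm-a']
```

The two occurrences of the monophone `m` become distinct triphone labels because their neighbours differ, which is how a triphone unit captures co-articulation at the cost of a larger model inventory.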
Keywords
natural language processing; speech recognition; Tamil language; acoustic signals conversion; automatic dictation software; automatic speech recognition; monophones; phoneme based Tamil speech recognition system; speech waveform segmentation; triphones; voice controlled navigation; voice dialling; Accuracy; Context; Context modeling; Data models; Hidden Markov models; Speech; co-articulation; language model; lexicon; segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Electronics Computer Technology (ICECT), 2011 3rd International Conference on
Conference_Location
Kanyakumari
Print_ISBN
978-1-4244-8678-6
Electronic_ISBN
978-1-4244-8679-3
Type
conf
DOI
10.1109/ICECTECH.2011.5941739
Filename
5941739