Significance of segmentation in phoneme based Tamil speech recognition system

Author

Harish, S. ; Vijayalakshmi, P. ; Nagarajan, T.

Author_Institution

Dept. of Electron. & Commun. Eng., Rajiv Gandhi Salai, Chennai, India

Volume

3

fYear

2011

fDate

8-10 April 2011

Firstpage

212

Lastpage

215

Abstract

Over the last few decades, speech recognition has matured enough to be used in commercial applications, including automatic dictation software, voice dialling, voice-controlled navigation and simple data entry. Automatic Speech Recognition (ASR) is the automatic conversion of the acoustic signal of an utterance into text. In this work, a speech recognition system for the Tamil language is developed. Speech recognition requires segmenting the speech waveform into fundamental acoustic units. The word is the natural unit of speech. However, each word has to be trained individually, and parameters cannot be shared among words. Hence, a very large training set is essential so that all words in the vocabulary are adequately trained. In addition, the memory requirement grows linearly with the number of words. The preferred unit for overcoming these constraints is the phone: fewer models are needed, and they are well trained. For the current work, two kinds of phone unit, monophones and triphones, are considered. This work highlights the importance of segmented speech, the language model and the co-articulation effect, which influences speech production. A triphone is a phone unit that takes the co-articulation effect into account. Monophone-based and triphone-based speech recognition systems for Tamil are developed, and their performance shows the importance of the above-mentioned factors.
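
To make the monophone/triphone distinction in the abstract concrete, the sketch below expands a monophone sequence into context-dependent triphone labels. It is only an illustration, not the authors' implementation: the function name `to_triphones`, the `left-phone+right` label notation (a convention common in HMM toolkits) and the example phone sequence are all assumptions not taken from the record.

```python
def to_triphones(phones):
    """Expand a monophone sequence into context-dependent triphone labels.

    Each label has the form left-phone+right (assumed notation); the left or
    right context is omitted at the start or end of the sequence.
    """
    triphones = []
    for i, phone in enumerate(phones):
        label = phone
        if i > 0:
            # Prefix the preceding phone as left context.
            label = f"{phones[i - 1]}-{label}"
        if i < len(phones) - 1:
            # Append the following phone as right context.
            label = f"{label}+{phones[i + 1]}"
        triphones.append(label)
    return triphones


# Hypothetical phone sequence, used purely for illustration.
example = ["a", "m", "m", "a"]
print(to_triphones(example))
```

Because each label carries its neighbouring phones, the same phone in different contexts gets a separate model, which is how a triphone system can account for co-articulation at the cost of a larger model inventory than a monophone system.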

Keywords

natural language processing; speech recognition; Tamil language; acoustic signals conversion; automatic dictation software; automatic speech recognition; monophones; phoneme based Tamil speech recognition system; speech waveform segmentation; triphones; voice controlled navigation; voice dialling; Accuracy; Context; Context modeling; Data models; Hidden Markov models; Speech; co-articulation; language model; lexicon; segmentation

fLanguage

English

Publisher

IEEE

Conference_Title

Electronics Computer Technology (ICECT), 2011 3rd International Conference on

Conference_Location

Kanyakumari

Print_ISBN

978-1-4244-8678-6

Electronic_ISBN

978-1-4244-8679-3

Type

conf

DOI

10.1109/ICECTECH.2011.5941739

Filename

5941739