DocumentCode
3489732
Title
Initial Experiments with Tamil LVCSR
Author
Melvin Jose, J. ; Ngoc Thang Vu ; Schultz, Tanja
Author_Institution
Dept. of Comput. Technol., Anna Univ., Chennai, India
fYear
2012
fDate
13-15 Nov. 2012
Firstpage
81
Lastpage
84
Abstract
In this paper we present our recent efforts towards building a large vocabulary continuous speech recognizer for Tamil. We describe the text and speech corpus collected for this task. The data was complemented by a large amount of text crawled from various Tamil news websites. The Tamil speech recognition system was bootstrapped using the Rapid Language Adaptation scheme, which employs a multilingual phone inventory. After initialization, we built a word-based and a syllable-based system with Syllable Error Rates (SyllER) of 29.30% and 34.16%, respectively. To address the agglutinative nature of Tamil, we propose a data-driven approach for obtaining better dictionary units. On the test set, the approach reduced SyllER by 27.20% relative to the syllable-based system and by 15.12% relative to the word-based system. Our current best system has a SyllER of 17.44% on read newspaper speech.
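The relative figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below assumes that "relative SyllER" means the usual relative error-rate reduction, (baseline - improved) / baseline. The resulting absolute value of about 24.87% is derived here and is not stated in the abstract. The function names are illustrative only.

```python
def apply_reduction(baseline: float, rel_percent: float) -> float:
    """Error rate left after a relative reduction of rel_percent percent."""
    return baseline * (1.0 - rel_percent / 100.0)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction in percent."""
    return 100.0 * (baseline - improved) / baseline


# Baseline SyllER values quoted in the abstract (percent).
word_syller = 29.30
syll_syller = 34.16

# Absolute SyllER implied by each quoted relative reduction (derived, not quoted).
from_word = apply_reduction(word_syller, 15.12)
from_syll = apply_reduction(syll_syller, 27.20)

print(f"implied SyllER from word-based baseline:     {from_word:.2f}%")
print(f"implied SyllER from syllable-based baseline: {from_syll:.2f}%")
```

Both baselines lead to the same implied SyllER, so the two relative figures are consistent with a single system evaluated against both baselines. The 17.44% best-system figure is reported separately and does not follow from this calculation.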
Keywords
Web sites; natural language processing; speech recognition; text analysis; vocabulary; SyllER; Tamil LVCSR; Tamil news Web sites; Tamil speech recognition system; large vocabulary continuous speech recognizer; multilingual phone inventory; rapid language adaptation scheme; speech corpus; syllable error rate; syllable-based system; text corpus; text data; word-based system; Dictionaries; Hidden Markov models; Merging; Speech; Speech recognition; Training; Vocabulary; Agglutinative language; LVCSR System; dictionary units; morphological complexity; multilingual bootstrap;
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2012 International Conference on
Conference_Location
Hanoi
Print_ISBN
978-1-4673-6113-2
Electronic_ISBN
978-0-7695-4886-9
Type
conf
DOI
10.1109/IALP.2012.46
Filename
6473701