DocumentCode :
3489732
Title :
Initial Experiments with Tamil LVCSR
Author :
Melvin Jose, J. ; Ngoc Thang Vu ; Schultz, Tanja
Author_Institution :
Dept. of Comput. Technol., Anna Univ., Chennai, India
fYear :
2012
fDate :
13-15 Nov. 2012
Firstpage :
81
Lastpage :
84
Abstract :
In this paper, we present our recent efforts towards building a large vocabulary continuous speech recognizer for Tamil. We describe the text and speech corpus collected to realize this task. The data was complemented by a large amount of text data crawled from various Tamil news websites. The Tamil speech recognition system was bootstrapped using the Rapid Language Adaptation scheme, which employs a multilingual phone inventory. After initialization, we built a word-based and a syllable-based system with Syllable Error Rates (SyllER) of 29.30% and 34.16%, respectively. To address the agglutinative nature of Tamil, we propose a data-driven approach to obtain better dictionary units. The approach produced significant relative SyllER improvements of 27.20% and 15.12% on the test set over the syllable- and word-based systems, respectively. Our current best system has a SyllER of 17.44% on read newspaper speech.
Keywords :
Web sites; natural language processing; speech recognition; text analysis; vocabulary; SyllER; Tamil LVCSR; Tamil news Web sites; Tamil speech recognition system; large vocabulary continuous speech recognizer; multilingual phone inventory; rapid language adaptation scheme; speech corpus; syllable error rate; syllable-based system; text corpus; text data; word-based system; Dictionaries; Hidden Markov models; Merging; Speech; Speech recognition; Training; Vocabulary; Agglutinative language; LVCSR System; dictionary units; morphological complexity; multilingual bootstrap;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2012 International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4673-6113-2
Electronic_ISBN :
978-0-7695-4886-9
Type :
conf
DOI :
10.1109/IALP.2012.46
Filename :
6473701