DocumentCode :
672868
Title :
Creation of Marathi speech corpus for automatic speech recognition
Author :
Gaikwad, Sameer ; Gawali, Bharti ; Mehrotra, Sanjay
Author_Institution :
Dept. of Comput. Sci. & Inf. Technol., Dr. Babasaheb Ambedkar Marathwada Univ., Aurangabad, India
fYear :
2013
fDate :
25-27 Nov. 2013
Firstpage :
1
Lastpage :
5
Abstract :
This paper describes the collection of an audio corpus for the Marathi language. Marathi is one of the regional Indian languages and has 12 vowels and 36 consonants. The objective of the research is to create a speech corpus that can be used by automatic speech recognition systems in various domains, such as telephonic inquiry systems and teaching tutors. The collected corpus comprises 28420 isolated words and 17470 sentences from around 500 speakers. The speech utterances were recorded at 16 kHz using three recording media: a headset, a desktop-mounted microphone, and a mobile phone. The corpus is transcribed as well as annotated and is available for use in recognition systems.
Keywords :
audio databases; audio recording; natural languages; speech recognition; Marathi language; Marathi speech corpus creation; audio corpus; automatic speech recognition; automatic speech recognition system; consonants; desktop mounted microphone; headset; mobile phone; recording medium; regional Indian languages; speech utterances; vowels; Automatic speech recognition; Databases; Education; Labeling; Speech; Vocabulary; Annotation; Audio; CMU; Communication; Corpus; Gender; Labeling; Praat; Speaker;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location :
Gurgaon
Type :
conf
DOI :
10.1109/ICSDA.2013.6709893
Filename :
6709893