DocumentCode
3713065
Title
District names speech corpus for Pakistani Languages
Author
Sahar Rauf;Asima Hameed;Tania Habib;Sarmad Hussain
Author_Institution
Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan
fYear
2015
Firstpage
207
Lastpage
211
Abstract
This paper presents a speech corpus developed for an Urdu automatic speech recognition (ASR) system. The corpus comprises single-word utterances from a fixed vocabulary consisting of the district names of Pakistan. The data was recorded over a telephone channel from all over Pakistan to cover six major accents: Punjabi, Urdu, Saraiki, Pashto, Sindhi, and Balochi. The data was collected in challenging acoustic environments; the major issues were silence, background noise, and alternate pronunciations, which can affect the performance of the system. To address these issues, comprehensive data verification and cleaning guidelines are presented. The proposed process serves as a data preprocessing step for the development of the ASR system, which is successfully integrated in an Urdu dialog system to provide weather information for Pakistan.
Keywords
Meteorology
Publisher
ieee
Conference_Titel
Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015 International Conference
Type
conf
DOI
10.1109/ICSDA.2015.7357893
Filename
7357893
Link To Document