• DocumentCode
    3713065
  • Title

    District names speech corpus for Pakistani Languages

  • Author

    Sahar Rauf;Asima Hameed;Tania Habib;Sarmad Hussain

  • Author_Institution
    Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan
  • fYear
    2015
  • Firstpage
    207
  • Lastpage
    211
  • Abstract
    This paper presents a speech corpus developed for an Urdu automatic speech recognition (ASR) system. The corpus comprises single-word utterances from a fixed vocabulary consisting of the district names of Pakistan. The data was recorded over a telephone channel from all over Pakistan to cover six major accents: Punjabi, Urdu, Saraiki, Pashto, Sindhi, and Balochi. The data was collected in challenging acoustic environments; the major issues were silence, background noise, and alternate pronunciations, which can affect the performance of the system. To address these issues, comprehensive data verification and cleaning guidelines are presented. The proposed process serves as a data preprocessing step for the development of the ASR system, which has been successfully integrated into an Urdu dialog system that provides weather information for Pakistan.
  • Keywords
    Meteorology
  • Publisher
    ieee
  • Conference_Titel
    Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015 International Conference
  • Type

    conf

  • DOI
    10.1109/ICSDA.2015.7357893
  • Filename
    7357893