• DocumentCode
    2964889
  • Title

    Evaluation of smoothing techniques for language modeling in automatic Filipino speech recognition

  • Author

    Ang, F.M. ; Ancheta, J.C.M.C. ; Francia, K.M.F. ; Chua, K.G.

  • Author_Institution
    Digital Signal Process. Lab., Univ. of the Philippines - Diliman, Quezon City, Philippines
  • fYear
    2012
  • fDate
    19-22 Nov. 2012
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    It is widely known that smoothing techniques are essential for n-gram-based statistical language modeling, especially in large vocabulary continuous speech recognition (LVCSR) tasks. The goal in this paper is to investigate several smoothing algorithms for n-gram models in Filipino LVCSR. The automatic speech recognition system was developed using the Janus Speech Recognition Toolkit (JRTk) of Carnegie Mellon University and Karlsruhe Institute of Technology. The language models were generated using Stanford's language modeling toolkit, SRILM. The data consisted of approximately 60 hours of transcribed recordings of Filipino speech from several domains spoken by 156 speakers. A total of 24 systems employing different language models were fine-tuned and tested for improved performance at a base metric. An instance of the Kneser-Ney algorithm with modified-at-end counts applied to an n-gram of order 5 registered the highest word recognition accuracy at 80.9% and 81.3% for the development and evaluation tests, respectively.
  • Keywords
    natural language processing; smoothing methods; speech recognition; Janus speech recognition toolkit; Kneser-Ney algorithm; Stanford language modeling toolkit; automatic Filipino speech recognition; large vocabulary continuous speech recognition task; n-gram model; smoothing algorithm; smoothing technique; statistical language modeling; transcribed recordings; word recognition accuracy; Irrigation; Filipino speech recognition; JRTk; LVCSR; language modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON 2012 - 2012 IEEE Region 10 Conference
  • Conference_Location
    Cebu
  • ISSN
    2159-3442
  • Print_ISBN
    978-1-4673-4823-2
  • Electronic_ISBN
    2159-3442
  • Type

    conf

  • DOI
    10.1109/TENCON.2012.6412249
  • Filename
    6412249