• DocumentCode
    1782544
  • Title

    Bangla phonetic feature table construction for automatic speech recognition

  • Author

    Hassan, Foyzul ; Kotwal, Mohammed Rokibul Alam ; Huda, Mohammad Nurul

  • Author_Institution
    Dept. of Comput. Sci. & Eng., United Int. Univ., Dhaka, Bangladesh
  • fYear
    2014
  • fDate
    8-10 March 2014
  • Firstpage
    51
  • Lastpage
    55
  • Abstract
    This research constructs a phonetic feature (PF) table for all the phonemes pronounced in the Bangla (widely known as Bengali) language. The study is divided into two parts: the first constructs the PF table, while the second deals with Bangla automatic speech recognition (ASR) using PFs. For Bangla, fifty-three phonemes, including both vowels and consonants, are considered. Among these, the phones শ (/s/) and স (/s/), and ণ (/n/) and ন (/n/), have approximately the same spectrum and hence share the same PFs. In the PF table, twenty-two PFs (Silence, Short Silence, Stop, ...) are required to represent all the Bangla phonemes. The second part comprises three stages: i) the first stage extracts acoustic features, namely mel frequency cepstral coefficients (MFCCs); ii) the second stage extracts PFs using a multilayer neural network (MLN); and iii) the final stage uses a triphone-based hidden Markov model (HMM) that takes the log values of the twenty-two-dimensional PFs as input and generates the output text strings. In experiments on Bangla Newspaper Article Sentences, the PF-based ASR system provides a higher word correct rate, word accuracy, and sentence correct rate than the standard MFCC-based method.
  • Keywords
    cepstral analysis; hidden Markov models; multilayer perceptrons; natural language processing; speech recognition; Bangla automatic speech recognition; Bangla language; Bangla newspaper article sentences; Bangla phonetic feature table construction; Bengali language; HMM; MFCC extraction; MLN; PF table; PF-based ASR system; acoustic features; embeds PFs extraction procedure; mel frequency cepstral coefficient extraction; multilayer neural network; output text string generation; phonemes; sentence correct rate; triphone-based hidden Markov model; word accuracy; word correct rate; Computers; Feature extraction; Hidden Markov models; Information technology; Speech; Speech recognition; Standards; automatic speech recognition; hidden Markov model; mel frequency cepstral coefficient; multilayer neural network; phonetic feature;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology (ICCIT), 2013 16th International Conference on
  • Conference_Location
    Khulna
  • Type

    conf

  • DOI
    10.1109/ICCITechn.2014.6997376
  • Filename
    6997376