Bangla phonetic feature table construction for automatic speech recognition

Author

Hassan, Foyzul ; Kotwal, Mohammed Rokibul Alam ; Huda, Mohammad Nurul

Author_Institution

Dept. of Comput. Sci. & Eng., United Int. Univ., Dhaka, Bangladesh

fYear

2014

fDate

8-10 March 2014

Firstpage

51

Lastpage

55

Abstract

This research constructs a phonetic feature (PF) table for all the phonemes pronounced in the Bangla (widely known as Bengali) language. The study is divided into two parts: the first constructs the PF table, and the second deals with Bangla automatic speech recognition (ASR) using PFs. For Bangla, fifty-three phonemes, including both vowels and consonants, are considered. Among these, the phones শ (/s/) and স (/s/), and ণ (/n/) and ন (/n/), have approximately the same spectrum and hence share the same PFs. In the PF table, twenty-two PFs (Silence, Short Silence, Stop, ...) are required to represent all the Bangla phonemes. The second part comprises three stages: i) the first stage extracts acoustic features, mel frequency cepstral coefficients (MFCCs); ii) the second stage extracts PFs using a multilayer neural network (MLN); and iii) the final stage integrates a triphone-based hidden Markov model (HMM) that generates the output text strings from the log values of the twenty-two-dimensional PFs. In experiments on Bangla newspaper article sentences, the PF-based ASR system provides a higher word correct rate, word accuracy, and sentence correct rate than the standard MFCC-based method.
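
The hand-off between the second and third stages (log values of the twenty-two-dimensional PFs fed to the HMM) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `log_pf_vector`, the `floor` safeguard, the assumption that MLN outputs lie between 0 and 1, and the sample frame are all hypothetical and chosen only for illustration.

```python
import math

NUM_PFS = 22  # the abstract states that twenty-two PFs cover all Bangla phonemes


def log_pf_vector(pf_scores, floor=1e-6):
    """Return the element-wise natural log of one frame of PF scores.

    `floor` is an assumed safeguard (not taken from the paper) that keeps
    log() finite when a score is exactly zero.
    """
    if len(pf_scores) != NUM_PFS:
        raise ValueError(f"expected {NUM_PFS} PF scores, got {len(pf_scores)}")
    return [math.log(max(score, floor)) for score in pf_scores]


# Hypothetical MLN output for one frame: one PF strongly active, the rest weak.
frame = [0.9] + [0.05] * (NUM_PFS - 1)
log_frame = log_pf_vector(frame)
```

Each resulting 22-value list would stand in for one observation vector passed to the HMM stage; how the original system scales or normalizes these values is not stated in the abstract.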

Keywords

cepstral analysis; hidden Markov models; multilayer perceptrons; natural language processing; speech recognition; Bangla automatic speech recognition; Bangla language; Bangla newspaper article sentences; Bangla phonetic feature table construction; Bengali language; HMM; MFCC extraction; MLN; PF table; PF-based ASR system; acoustic features; embeds PFs extraction procedure; mel frequency cepstral coefficient extraction; multilayer neural network; output text string generation; phonemes; sentence correct rate; triphone-based hidden Markov model; word accuracy; word correct rate; Computers; Feature extraction; Hidden Markov models; Information technology; Speech; Speech recognition; Standards; automatic speech recognition; hidden Markov model; mel frequency cepstral coefficient; multilayer neural network; phonetic feature;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer and Information Technology (ICCIT), 2013 16th International Conference on

Conference_Location

Khulna

Type

conf

DOI

10.1109/ICCITechn.2014.6997376

Filename

6997376