DocumentCode :
1782544
Title :
Bangla phonetic feature table construction for automatic speech recognition
Author :
Hassan, Foyzul ; Kotwal, Mohammed Rokibul Alam ; Huda, Mohammad Nurul
Author_Institution :
Dept. of Comput. Sci. & Eng., United Int. Univ., Dhaka, Bangladesh
fYear :
2014
fDate :
8-10 March 2014
Firstpage :
51
Lastpage :
55
Abstract :
This research constructs a phonetic feature (PF) table for all the phonemes pronounced in the Bangla (widely known as Bengali) language. The study is divided into two parts: in the first part, a PF table is constructed, while the second part deals with Bangla automatic speech recognition (ASR) using PFs. For the Bangla language, fifty-three phonemes, including both vowels and consonants, are considered. Among these, the phones শ (/s/) and স (/s/), and ণ (/n/) and ন (/n/), have approximately the same spectrum and hence share the same PFs. In the PF table, twenty-two PFs (Silence, Short Silence, Stop, ...) are required to represent all the Bangla phonemes. The second part comprises three stages: i) the first stage extracts acoustic features, mel frequency cepstral coefficients (MFCCs); ii) the second stage extracts PFs using a multilayer neural network (MLN); and iii) the final stage integrates a triphone-based hidden Markov model (HMM) that generates the output text strings from the log values of the twenty-two-dimensional PFs. In experiments on Bangla Newspaper Article Sentences, the PF-based ASR system provides a higher word correct rate, word accuracy, and sentence correct rate than the standard MFCC-based method.
Keywords :
cepstral analysis; hidden Markov models; multilayer perceptrons; natural language processing; speech recognition; Bangla automatic speech recognition; Bangla language; Bangla newspaper article sentences; Bangla phonetic feature table construction; Bengali language; HMM; MFCC extraction; MLN; PF table; PF-based ASR system; acoustic features; embeds PFs extraction procedure; mel frequency cepstral coefficient extraction; multilayer neural network; output text string generation; phonemes; sentence correct rate; triphone-based hidden Markov model; word accuracy; word correct rate; Computers; Feature extraction; Hidden Markov models; Information technology; Speech; Speech recognition; Standards; automatic speech recognition; hidden Markov model; mel frequency cepstral coefficient; multilayer neural network; phonetic feature;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Technology (ICCIT), 2013 16th International Conference on
Conference_Location :
Khulna
Type :
conf
DOI :
10.1109/ICCITechn.2014.6997376
Filename :
6997376