Title :
Effect of Neural Network based phonetic feature segmentation in ASR
Author :
Kotwal, Mohammed Rokibul Alam ; Hassan, Foyzul ; Huda, Mohammad Nurul
Author_Institution :
Dept. of Comput. Sci. & Eng., United Int. Univ., Dhaka, Bangladesh
Abstract :
This paper describes a phone segmentation system that uses phonetic features and addresses the fact that context information influences the performance of Automatic Speech Recognition (ASR). Current Hidden Markov Model (HMM) based ASR systems address this problem with context-sensitive triphone models; however, these models require a large number of speech parameters and a large speech corpus. In this paper, we propose a technique that models the dynamic process of co-articulation and embeds it into ASR systems. A Recurrent Neural Network (RNN) is expected to realize this dynamic process, but the main problem is that training a large RNN is slow. We therefore introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage system consisting of a Multi-Layer Neural Network (MLN) in the first stage and another MLN with a longer context window in the second stage. The first MLN is expected to reduce the dynamics of the acoustic feature pattern, and the second MLN to suppress the fluctuation caused by DPF context. Experiments are carried out on Japanese triphthong data and Japanese Newspaper Article Sentences (JNAS) data. The proposed DPF-based feature extractor provides better segmentation performance with a reduced mixture set of HMMs, and using MLNs instead of an RNN achieves a better context effect with less computation.
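The two-stage extractor described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the frame size (12 acoustic coefficients), the number of DPFs (15), the hidden-layer size (32), the context half-widths (±1 frame for the first MLN, ±3 frames for the second, so the second window is longer), sigmoid units, edge padding by repeating the boundary frame, and the untrained random weights are all hypothetical choices made only to show how data flows through the two stages.

```python
import math
import random


def stack_context(frames, half_width):
    """Concatenate each frame with its +/- half_width neighbours.

    Frames beyond the sequence edges are replaced by the first/last frame
    (an assumed padding scheme, chosen for illustration)."""
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - half_width, t + half_width + 1):
            idx = min(max(k, 0), n - 1)  # clamp index to the valid range
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


class MLN:
    """One-hidden-layer perceptron with sigmoid units (random, untrained weights)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    @staticmethod
    def _layer(x, w, b):
        # Affine transform followed by an element-wise sigmoid.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + bi)))
                for row, bi in zip(w, b)]

    def forward(self, x):
        return self._layer(self._layer(x, self.w1, self.b1), self.w2, self.b2)


def two_stage_dpf(frames, mln1, mln2, ctx1=1, ctx2=3):
    # Stage 1: acoustic frames with a short context window -> per-frame DPF estimates.
    dpf1 = [mln1.forward(v) for v in stack_context(frames, ctx1)]
    # Stage 2: DPF estimates with a longer context window -> smoothed DPF vectors.
    return [mln2.forward(v) for v in stack_context(dpf1, ctx2)]


if __name__ == "__main__":
    rng = random.Random(0)
    N_ACOUSTIC, N_DPF = 12, 15  # hypothetical feature sizes
    frames = [[rng.gauss(0.0, 1.0) for _ in range(N_ACOUSTIC)] for _ in range(20)]
    mln1 = MLN(N_ACOUSTIC * 3, 32, N_DPF, rng)  # 3-frame input window (ctx1=1)
    mln2 = MLN(N_DPF * 7, 32, N_DPF, rng)       # 7-frame input window (ctx2=3)
    dpf = two_stage_dpf(frames, mln1, mln2)
    print(len(dpf), len(dpf[0]))  # 20 15
```

In this sketch, context is captured only by feeding each feedforward network a window of neighbouring frames, which is the abstract's stated alternative to an RNN's recurrent state. Training both networks and passing the resulting DPF vectors to an HMM-based segmenter are outside the scope of the example.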
Keywords :
feature extraction; hidden Markov models; multilayer perceptrons; natural language processing; recurrent neural nets; speech processing; speech recognition; DPF based feature extraction; DPF based feature extractor; DPF context; HMM based ASR system; JNAS data; Japanese newspaper article sentences; Japanese triphthong; MLN; RNN; acoustic feature pattern; automatic speech recognition; context window; context-sensitive triphone model; distinctive phonetic feature; fluctuation suppression; hidden Markov model; mixture-set; multilayer neural network; neural network based phonetic feature segmentation; phone segmentation; recurrent neural network; segmentation performance; speech corpus; speech parameter; Context; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Vectors; distinctive phonetic feature; hidden Markov model; local features; multi-layer neural network; recurrent neural network;
Conference_Title :
Computer and Information Technology (ICCIT), 2013 16th International Conference on
Conference_Location :
Khulna
DOI :
10.1109/ICCITechn.2014.6997306