Title of article :
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Author/Authors :
Homayounpour, M.M Computer Engineering and IT Department - Amirkabir University of Technology - Tehran, Iran , Asadolahzade Kermanshahi, M Computer Engineering and IT Department - Amirkabir University of Technology - Tehran, Iran
Abstract :
Improving phoneme recognition has attracted the attention of many researchers because of its applications in
various fields of speech processing. Recent research shows that using deep neural networks
(DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme
recognition systems have two phases: training and testing. Most previous work has tried to improve the
training phase, for example through training algorithms, network types, network architectures and
feature types. In this work, we instead focus on the test phase, in which the phoneme sequence is
generated and which is also essential for good phoneme recognition accuracy. Previous work has used the
Viterbi algorithm on a hidden Markov model (HMM) to generate phoneme sequences. We address an important
problem with this method: an HMM implicitly assumes that state durations follow a geometric
distribution. To overcome this, we use the real duration probability distribution of each phoneme with
the aid of a hidden semi-Markov model (HSMM). We also represent each phoneme with only one state so that
phoneme duration information can be used directly in the HSMM. Furthermore, we investigate the
performance of a post-processing method that corrects the phoneme sequence obtained from the neural
network based on our knowledge about phonemes. Experimental results on the Persian FarsDat corpus show
that using the extended Viterbi algorithm on the HSMM achieves phoneme recognition accuracy
improvements of 2.68% and 0.56% over the conventional methods using Gaussian mixture model-hidden
Markov models (GMM-HMMs) and Viterbi on the HMM, respectively. The post-processing method further
increases the accuracy.
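
The decoding idea described in the abstract can be illustrated with a minimal sketch of an explicit-duration (extended) Viterbi search in which each phoneme is a single HSMM state. This is not the authors' implementation. The function name `hsmm_viterbi`, the array layout, the use of per-frame log-scores as emission scores, and the cap on the maximum duration are all assumptions made for illustration.

```python
import numpy as np

def hsmm_viterbi(log_emit, log_trans, log_dur, log_init):
    """Explicit-duration Viterbi with one state per phoneme (illustrative sketch).

    log_emit : (T, N) per-frame log-scores for each phoneme (assumed finite)
    log_trans: (N, N) log transition scores between phonemes (diagonal is ignored)
    log_dur  : (N, D) log_dur[j, d-1] = log P(duration = d frames | phoneme j)
    log_init : (N,)   log initial phoneme scores
    Returns a list of (phoneme, start_frame, end_frame_exclusive) segments.
    """
    T, N = log_emit.shape
    D = log_dur.shape[1]
    # Prefix sums make the emission score of any segment an O(1) lookup.
    cum = np.vstack([np.zeros((1, N)), np.cumsum(log_emit, axis=0)])
    # Self-transitions are forbidden: duration is modelled by log_dur instead.
    trans = log_trans.copy()
    np.fill_diagonal(trans, -np.inf)

    # delta[t, j]: best score of frames [0, t) when the last segment is phoneme j.
    delta = np.full((T + 1, N), -np.inf)
    back = np.zeros((T + 1, N, 2), dtype=int)  # (duration, previous phoneme)

    for t in range(1, T + 1):
        for d in range(1, min(D, t) + 1):
            s = t - d  # start frame of the candidate segment
            seg = cum[t] - cum[s] + log_dur[:, d - 1]
            if s == 0:
                enter = log_init
                prev = np.full(N, -1)
            else:
                scores = delta[s][:, None] + trans  # rows: previous, cols: current
                prev = np.argmax(scores, axis=0)
                enter = scores[prev, np.arange(N)]
            cand = enter + seg
            better = cand > delta[t]
            delta[t][better] = cand[better]
            back[t][better, 0] = d
            back[t][better, 1] = prev[better]

    j = int(np.argmax(delta[T]))
    if not np.isfinite(delta[T, j]):
        raise ValueError("no valid segmentation; increase the maximum duration D")

    # Trace back segment by segment instead of frame by frame.
    segments, t = [], T
    while t > 0:
        d, i = (int(x) for x in back[t, j])
        segments.append((j, t - d, t))
        t, j = t - d, i
    return segments[::-1]

# Toy example: frame 1 is noisy and slightly favours phoneme 1, but the
# duration model makes a one-frame phoneme very unlikely.
p = np.array([[0.8, 0.2], [0.4, 0.6], [0.8, 0.2],
              [0.2, 0.8], [0.2, 0.8], [0.2, 0.8]])
dur = np.array([[0.01, 0.04, 0.90, 0.05]] * 2)
print(hsmm_viterbi(np.log(p), np.zeros((2, 2)), np.log(dur), np.log([0.5, 0.5])))
```

In a standard HMM the probability of staying in a state for d frames decays geometrically with d, so short spurious segments are penalised only weakly. Here the duration table `log_dur` scores each segment length explicitly. The table would normally be estimated from labelled data; the values above are invented for the toy example.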
Keywords :
Persian (Farsi) Language , Hidden Semi-Markov Model , Deep Neural Network , Extended Viterbi Algorithm , Phoneme Duration , Hidden Markov Model , Phoneme Recognition
Journal title :
Astroparticle Physics