  • DocumentCode
    3167607
  • Title
    Tri-factorization learning of sub-word units with application to vocabulary acquisition
  • Author
    Sun, Meng; Van hamme, Hugo
  • Author_Institution
    Dept. of Electr. Eng.-ESAT, Katholieke Univ. Leuven, Leuven, Belgium
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    5177
  • Lastpage
    5180
  • Abstract
    In prior work, we proposed a method for vocabulary acquisition based on a co-occurrence model and non-negative matrix factorization. The vocabulary is described in terms of co-occurrence statistics of frame-level acoustic descriptions, and the approach scales poorly to larger vocabularies. Much like whole-word HMMs, it does not reuse sub-word units such as phone models. In this paper, we apply the co-occurrence framework to learn a set of sub-word units without supervision using a matrix tri-factorization, propose a method for computing their posteriorgram, and finally show vocabulary acquisition from the posteriorgram. The method outperforms our prior work in that it can learn from a smaller set of labeled data and achieves better recognition accuracy.
  • Keywords
    hidden Markov models; learning (artificial intelligence); matrix decomposition; speech recognition; statistical analysis; cooccurrence statistic model; nonnegative matrix factorization; posteriorgram; recognition accuracy; subword units; trifactorization learning; vocabulary acquisition; whole-word HMM models; Acoustics; Probabilistic logic; Training; Training data; Vectors; Vocabulary; pattern discovery; semi-supervised learning; spectral embedding
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289086
  • Filename
    6289086
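The abstract names a matrix tri-factorization but this record gives no cost function or update rules. As a rough illustration only, the sketch below factorizes a non-negative matrix X as F · S · Gᵀ using generic multiplicative updates for the squared Frobenius error. The cost function, the function name `nmtf`, its parameters, and the toy data are all assumptions for illustration, not the authors' formulation; co-occurrence models of this kind may instead use a divergence such as Kullback-Leibler.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: non-negative tri-factorization X ~= F @ S @ G.T.

    Uses multiplicative updates for the squared Frobenius error (an
    assumption; the paper's actual cost and constraints are not given here).
    Returns the three factors and the error after each iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps every factor non-negative.
    F = rng.random((m, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((n, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Update F with S and G fixed: ratio of negative to positive gradient parts.
        F *= (X @ G @ S.T) / (F @ (S @ (G.T @ G) @ S.T) + eps)
        # Update G with F and S fixed (same rule applied to X.T).
        G *= (X.T @ F @ S) / (G @ (S.T @ (F.T @ F) @ S) + eps)
        # Update the small middle factor S with F and G fixed.
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, errors


if __name__ == "__main__":
    # Toy non-negative data with a known low-rank tri-factor structure.
    rng = np.random.default_rng(1)
    X = rng.random((30, 5)) @ rng.random((5, 4)) @ rng.random((4, 20))
    F, S, G, errs = nmtf(X, k1=5, k2=4)
    print(f"squared error: first {errs[0]:.4f}, last {errs[-1]:.4f}")
```

The F and G updates are the Lee-Seung NMF rule applied to one factor with the other two held fixed, and the S update follows the same gradient-splitting pattern, so every factor stays non-negative and the error typically decreases over iterations. How the paper maps its co-occurrence statistics onto X, and which factor it reads as the sub-word units, is not specified in this record.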