• DocumentCode
    3162123
  • Title
    Constructing ensembles of dissimilar acoustic models using hidden attributes of training data
  • Author
    Fukuda, Takashi ; Tachibana, Ryuki ; Chaudhari, Upendra ; Ramabhadran, Bhuvana ; Zhan, Puming
  • Author_Institution
    IBM Res. - Tokyo, IBM Japan Ltd., Tokyo, Japan
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4141
  • Lastpage
    4144
  • Abstract
    One objective of acoustic modeling is to build statistical models that are robust to the wide variety of acoustic conditions present in real-world environments. As large amounts of training data become available, subsets of the data with similar acoustic qualities can be modeled accurately, and the resulting multiple acoustic models can be used jointly as a form of system combination or model selection. In this paper, we propose a method that partitions the training data via a binary decision tree, using metadata attributes such as SNR, speaking rate, and duration, to construct ensembles of acoustic models. The metadata attribute used at each binary split of the tree is selected with a cosine-similarity-based metric proposed in this paper. The resulting models are combined using voting techniques such as n-best ROVER. The proposed method improved recognition accuracy by up to 4% relative over the state-of-the-art system on a large-vocabulary continuous speech recognition voice search task.
  • Keywords
    decision trees; speech recognition; vocabulary; acoustic modeling; acoustic qualities; binary tree; continuous speech recognition; cosine-similarity; decision tree; dissimilar acoustic models; ensemble construction; hidden attributes; large vocabulary; n-best ROVER; recognition accuracy; robust statistical models; speaking rate; training data; voice search task; voting techniques; Abstracts; Acoustics; Hidden Markov models; Indexes; Measurement; Automatic speech recognition; large corpora; multiple acoustic modeling; system combination
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6288830
  • Filename
    6288830