• DocumentCode
    178729
  • Title

    Multi-stream combination for LVCSR and keyword search on GPU-accelerated platforms

  • Author

    Wonkyum Lee; Jungsuk Kim; Ian Lane

  • Author_Institution
    Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    3296
  • Lastpage
    3300
  • Abstract
    In this paper, we explore methods for system combination of acoustic models having different features, modeling approaches and phonetic decision trees for speech recognition and keyword search. We introduce a Graphics Processing Unit (GPU)-accelerated lattice generation method and show that this architecture is efficient and well suited for multi-stream acoustic model combination. Additionally, we introduce a novel method to combine acoustic models with different phonetic trees into a single fully composed HMM state level (H-level) WFST network, allowing lattice generation to be performed using diverse acoustic models. We compare the performance of our multi-stream approach to three standard techniques and observe that multi-stream combination obtains higher speech recognition accuracy than Lattice Combination or ROVER (up to 5.5% relative improvement in speech recognition accuracy compared to the single best model). Additionally, at an equivalent runtime, multi-stream combination obtained a 15% higher Average Term Weighted Value (ATWV) than CombMNZ for the keyword search task. By combining phonetic decision trees, we obtained a gain (WER reduction) from the diversity of the phonetic decision trees by using a more efficient tree for each acoustic model.
  • Keywords
    decision trees; graphics processing units; search problems; speech recognition; ATWV; GPU accelerated platforms; HMM state level; LVCSR; ROVER; WFST network; accelerated lattice generation method; acoustic models; average term weighted value; graphic processing unit; keyword search; lattice combination; multistream acoustic model; multistream approach; multistream combination; phonetic decision tree; phonetic decision trees; Acoustics; Computational modeling; Decision trees; Hidden Markov models; Keyword search; Lattices; Speech recognition; Graphics Processing Units (GPU); Multi-stream acoustic model combination; OpenKWS 2013; Weighted Finite State Transducer (WFST)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6854210
  • Filename
    6854210