Multi-stream combination for LVCSR and keyword search on GPU-accelerated platforms

Author

Wonkyum Lee; Jungsuk Kim; Ian Lane

Author_Institution

Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA

fYear

2014

fDate

4-9 May 2014

Firstpage

3296

Lastpage

3300

Abstract

In this paper, we explore methods for system combination of acoustic models having different features, modeling approaches and phonetic decision trees for speech recognition and keyword search. We introduce a Graphics Processing Unit (GPU)-accelerated lattice generation method and show that this architecture is efficient and well suited for multi-stream acoustic model combination. Additionally, we introduce a novel method to combine acoustic models with different phonetic trees into a single fully composed HMM state level (H-level) WFST network, allowing lattice generation to be performed using diverse acoustic models. We compare the performance of our multi-stream approach against three standard techniques and observe that multi-stream combination obtains higher speech recognition accuracy than Lattice Combination or ROVER (up to 5.5% relative improvement in speech recognition accuracy compared to the single best model). Additionally, at an equivalent runtime, multi-stream combination obtained a 15% higher Average Term Weighted Value (ATWV) than CombMNZ on the keyword search task. By combining phonetic decision trees, we obtained a gain (WER reduction) from the diversity of the trees by using a more efficient tree for each acoustic model.
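
For readers unfamiliar with the CombMNZ baseline named in the abstract, the following is a minimal sketch of the standard CombMNZ fusion rule (sum of a hit's scores across systems, multiplied by the number of systems that returned it with a nonzero score). It is not taken from the paper: the function name `comb_mnz`, the hit key `(keyword, utterance_id, start_frame)`, and the assumption that hits from different systems already share identical keys and comparable score ranges are all hypothetical simplifications. Practical keyword search pipelines typically align hits by time overlap and normalize scores before fusing.

```python
from collections import defaultdict


def comb_mnz(system_hits):
    """Fuse keyword-hit scores from several systems with CombMNZ.

    system_hits: list of dicts, one per system. Each dict maps a hit key
    (hypothetical here: (keyword, utterance_id, start_frame)) to a score.
    Returns a dict mapping each hit key to its fused score.
    """
    score_sum = defaultdict(float)    # sum of scores per hit
    nonzero_count = defaultdict(int)  # number of systems with a nonzero score

    for hits in system_hits:
        for key, score in hits.items():
            if score > 0.0:
                score_sum[key] += score
                nonzero_count[key] += 1

    # CombMNZ: multiply the summed score by the count of contributing systems.
    return {key: nonzero_count[key] * score_sum[key] for key in score_sum}
```

In this sketch, a hit found by two systems with scores 0.5 and 0.25 receives a fused score of 2 × 0.75, while a hit found by only one system keeps its original score, so agreement between systems is rewarded.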

Keywords

decision trees; graphics processing units; search problems; speech recognition; ATWV; GPU accelerated platforms; HMM state level; LVCSR; ROVER; WFST network; accelerated lattice generation method; acoustic models; average term weighted value; graphic processing unit; keyword search; lattice combination; multistream acoustic model; multistream approach; multistream combination; phonetic decision tree; phonetic decision trees; Acoustics; Computational modeling; Decision trees; Hidden Markov models; Keyword search; Lattices; Speech recognition; Graphics Processing Units (GPU); Multi-stream acoustic model combination; OpenKWS 2013; Weighted Finite State Transducer (WFST)

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854210

Filename

6854210