DocumentCode :
178729
Title :
Multi-stream combination for LVCSR and keyword search on GPU-accelerated platforms
Author :
Wonkyum Lee ; Jungsuk Kim ; Ian Lane
Author_Institution :
Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
3296
Lastpage :
3300
Abstract :
In this paper, we explore methods for system combination of acoustic models that differ in features, modeling approaches, and phonetic decision trees, for speech recognition and keyword search. We introduce a Graphics Processing Unit (GPU)-accelerated lattice generation method and show that this architecture is efficient and well suited for multi-stream acoustic model combination. Additionally, we introduce a novel method to combine acoustic models with different phonetic trees into a single fully composed HMM state level (H-level) WFST network, allowing lattice generation to be performed using diverse acoustic models. We compare our multi-stream approach against three standard techniques and observe that multi-stream combination obtains higher speech recognition accuracy than Lattice Combination or ROVER (up to 5.5% relative improvement in speech recognition accuracy compared to the single best model). Additionally, at an equivalent runtime, multi-stream combination obtained a 15% higher Average Term Weighted Value (ATWV) than CombMNZ on the keyword search task. By combining phonetic decision trees, we obtained a gain (WER reduction) from the diversity of the phonetic decision trees by using a more efficient tree for each acoustic model.
Keywords :
decision trees; graphics processing units; search problems; speech recognition; ATWV; GPU accelerated platforms; HMM state level; LVCSR; ROVER; WFST network; accelerated lattice generation method; acoustic models; average term weighted value; graphic processing unit; keyword search; lattice combination; multistream acoustic model; multistream approach; multistream combination; phonetic decision tree; phonetic decision trees; speech recognition; Acoustics; Computational modeling; Decision trees; Hidden Markov models; Keyword search; Lattices; Speech recognition; Graphics Processing Units (GPU); Keyword search; Multi-stream acoustic model combination; OpenKWS 2013; Weighted Finite State Transducer (WFST);
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854210
Filename :
6854210