DocumentCode :
2974580
Title :
Hierarchical variational loopy belief propagation for multi-talker speech recognition
Author :
Rennie, Steven J. ; Hershey, John R. ; Olsen, Peder A.
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2009
fDate :
Nov. 13 2009-Dec. 17 2009
Firstpage :
176
Lastpage :
181
Abstract :
We present a new method for single-channel multi-talker speech recognition that combines loopy belief propagation and variational inference methods to control the complexity of inference. The method models each source using an HMM with a hierarchical set of acoustic states, and uses the max model to approximate how the sources interact to generate mixed data. Inference involves inferring a set of probabilistic time-frequency masks to separate the speakers. By conditioning these masks on the hierarchical acoustic states of the speakers, the fidelity and complexity of acoustic inference can be precisely controlled. Acoustic inference using the algorithm scales linearly with the number of probabilistic time-frequency masks, and temporal inference scales linearly with LM size. Results on the monaural speech separation task (SSC) data demonstrate that the presented hierarchical variational max-sum product algorithm (HVMSP) outperforms VMSP by over 2% absolute using 4 times fewer probabilistic masks. HVMSP furthermore performs on par with the MSP algorithm, which utilizes exact conditional marginal likelihoods, using 256 times fewer time-frequency masks.
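As a rough illustration of the two ideas the abstract names, the max model and probabilistic time-frequency masks, the sketch below treats each speaker's log-spectrum at each frequency as a single Gaussian. This is a simplifying assumption made here for illustration: it is not the paper's hierarchical HMM state model and it is not HVMSP. All function and parameter names are hypothetical.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    """Cumulative distribution of a univariate Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def max_model_mix(log_spec_a, log_spec_b):
    """Max model: the mixed log-spectrum is approximated by the
    element-wise maximum of the two source log-spectra."""
    return [max(a, b) for a, b in zip(log_spec_a, log_spec_b)]

def dominance_mask(mixed, means_a, stds_a, means_b, stds_b):
    """Probability, per frequency bin, that speaker A is the dominant source.

    Assumption: one Gaussian per bin per speaker. Under the max model,
    A explains the observation y when A equals y and B lies below y,
    which has weight pdf_a(y) * cdf_b(y); symmetrically for B.
    No guard is included for both weights underflowing to zero.
    """
    mask = []
    for y, ma, sa, mb, sb in zip(mixed, means_a, stds_a, means_b, stds_b):
        a_dominates = normal_pdf(y, ma, sa) * normal_cdf(y, mb, sb)
        b_dominates = normal_pdf(y, mb, sb) * normal_cdf(y, ma, sa)
        mask.append(a_dominates / (a_dominates + b_dominates))
    return mask
```

In the method the abstract describes, such masks are conditioned on the speakers' hierarchical acoustic states; the single-Gaussian version above only shows how one mask value arises from the max model.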
Keywords :
belief networks; hidden Markov models; inference mechanisms; probability; speech recognition; HMM; acoustic inference; hidden Markov model; hierarchical variational loopy belief propagation; hierarchical variational max-sum product algorithm; monaural speech separation task; multitalker speech recognition; probabilistic time-frequency mask; variational inference method; Automatic speech recognition; Belief propagation; Hidden Markov models; Humans; Inference algorithms; Loudspeakers; Natural languages; Speech coding; Speech recognition; Time frequency analysis; Iroquois; Max model; Speech separation; factorial hidden Markov models; loopy belief propagation; variational inference;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
Type :
conf
DOI :
10.1109/ASRU.2009.5373446
Filename :
5373446