CRIM and LIUM approaches for multi-genre broadcast media transcription

Author

Vishwa Gupta;Paul Deléglise;Gilles Boulianne;Yannick Estève;Sylvain Meignier;Anthony Rousseau

Author_Institution

Centre de recherche informatique de Montréal (CRIM)

fYear

2015

Firstpage

681

Lastpage

686

Abstract

The Multi-Genre Broadcast Challenge at ASRU 2015 is a controlled evaluation of speech recognition, speaker diarization, and lightly supervised alignment using BBC TV recordings. The CRIM and LIUM teams participated in the speech recognition part of the challenge with a joint submission. This paper presents the contributions of CRIM and LIUM. Each team made different choices in developing its ASR system. This was expected to allow different approaches to diarization and acoustic modeling to be compared and evaluated, and to yield complementary ASR systems for effective merging. CRIM's main contributions are the use of a training scenario similar to multi-lingual training to estimate the deep neural net (DNN) acoustic models with most of the data, the use of a pruned trigram model for search, and the use of a genre-dependent quadgram language model for rescoring the lattice from the search. For LIUM, the focus was on fast decoding with high accuracy. The final word error rates (WER) after merging show that it is possible to get reasonable WER with automatically aligned files. The final global WER of 25.1% corresponds to a WER reduction of about 20% absolute in comparison to the ASR baseline system provided by the organizers.

Keywords

"Training","Acoustics","Speech","Training data","Data models","Hidden Markov models","Speech recognition"

Publisher

ieee

Conference_Titel

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type

conf

DOI

10.1109/ASRU.2015.7404862

Filename

7404862