DocumentCode :
3744907
Title :
The development of the Cambridge University alignment systems for the Multi-Genre Broadcast challenge
Author :
P. Lanchantin;M. J. F. Gales;P. Karanasou;X. Liu;Y. Qian;L. Wang;P.C. Woodland;C. Zhang
Author_Institution :
Cambridge University Engineering Department, Cambridge CB2 1PZ, UK
fYear :
2015
Firstpage :
647
Lastpage :
653
Abstract :
We describe the alignment systems developed both for the preparation of data for the Multi-Genre Broadcast (MGB) challenge and for our participation in its transcription and alignment tasks. Captions of varying quality are aligned with the audio of TV shows that range from a few minutes to more than six hours in length. Lightly supervised decoding is performed on the audio, and the output text is aligned with the original text transcript. Reliable split points are found, and the resulting text chunks are force-aligned with the corresponding audio segments. Confidence scores are associated with the aligned data. Multiple refinements, including audio segmentation based on deep neural networks (DNNs) and the use of DNN-based acoustic models, were used to improve performance. The final MGB alignment system had the highest F-measure on the evaluation data.
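The abstract does not say how the decoder output is matched to the caption text or what makes a split point "reliable". The sketch below is one possible reading, not the authors' method. It uses Python's standard `difflib.SequenceMatcher` as a stand-in word-level aligner and treats any run of at least `min_run` consecutive matching words as a reliable anchor. The `(word, start, end)` timing format, the function names `reliable_split_points` and `make_chunks`, and the default threshold of 3 are all hypothetical. The forced-alignment step that would follow for each chunk is not shown.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical decoder output format: (word, start_time_s, end_time_s).
Word = Tuple[str, float, float]


def reliable_split_points(hyp: List[Word], ref: List[str], min_run: int = 3):
    """Return (time, ref_index) pairs marking the end of each run of at least
    `min_run` consecutive words on which decoder output and caption agree.
    The run-length criterion is an assumed notion of 'reliable'."""
    hyp_tokens = [w.lower() for w, _, _ in hyp]
    ref_tokens = [w.lower() for w in ref]
    # autojunk=False so frequent words such as "the" are not discarded.
    matcher = SequenceMatcher(a=hyp_tokens, b=ref_tokens, autojunk=False)
    points = []
    for block in matcher.get_matching_blocks():
        # The final sentinel block has size 0 and is skipped by this test.
        if block.size >= min_run:
            last_hyp = hyp[block.a + block.size - 1]
            points.append((last_hyp[2], block.b + block.size))
    return points


def make_chunks(hyp: List[Word], ref: List[str], min_run: int = 3):
    """Cut the caption text and the audio timeline at the split points,
    giving (text_chunk, audio_start, audio_end) triples that could then be
    force-aligned independently."""
    chunks = []
    t_prev, r_prev = 0.0, 0
    for t, r in reliable_split_points(hyp, ref, min_run):
        chunks.append((ref[r_prev:r], t_prev, t))
        t_prev, r_prev = t, r
    audio_end = hyp[-1][2] if hyp else t_prev
    if r_prev < len(ref):
        # Any caption text after the last anchor goes into one final chunk.
        chunks.append((ref[r_prev:], t_prev, audio_end))
    return chunks


hyp = [("the", 0.0, 0.2), ("cat", 0.2, 0.5), ("sat", 0.5, 0.8), ("uh", 0.8, 1.0),
       ("on", 1.0, 1.2), ("the", 1.2, 1.3), ("mat", 1.3, 1.6), ("today", 1.6, 2.0)]
ref = ["The", "cat", "sat", "on", "the", "mat", "today"]
for text, start, end in make_chunks(hyp, ref):
    print(f"{start:.1f}-{end:.1f}s: {' '.join(text)}")
```

In this toy example the filler "uh" in the decoder output breaks the match, so the caption is split after "sat". Raising `min_run` trades fewer, longer chunks for more confidence in each anchor. A real system would also have to handle long stretches where the captions and the audio disagree.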
Keywords :
"Training","Speech","Speech recognition","Decoding","TV","Acoustics","Reliability"
Publisher :
IEEE
Conference_Title :
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Type :
conf
DOI :
10.1109/ASRU.2015.7404857
Filename :
7404857