DocumentCode :
3744905
Title :
The 2015 Sheffield system for longitudinal diarisation of broadcast media
Author :
Rosanna Milner;Oscar Saz;Salil Deena;Mortaza Doulaty;Raymond W. M. Ng;Thomas Hain
Author_Institution :
Speech and Hearing Research group, Department of Computer Science, University of Sheffield, UK
fYear :
2015
Firstpage :
632
Lastpage :
638
Abstract :
Speaker diarisation is the task of answering "who spoke when" within a multi-speaker audio recording. Diarisation of broadcast media typically operates on individual television shows, and is a particularly difficult task due to a high number of speakers and challenging background conditions. Using prior knowledge, such as that from previous shows in a series, can improve performance. Longitudinal diarisation makes it possible to use knowledge from previous audio files to improve performance, but requires finding matching speakers across consecutive files. This paper describes the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge. The challenge required longitudinal diarisation of data from BBC archives, under very constrained resource settings. Our system consists of three main stages: speech activity detection using DNNs with novel adaptation and decoding methods; speaker segmentation and clustering, with adaptation of the DNN-based clustering models; and finally speaker linking to match speakers across shows. On the development set of 19 shows from five different television series, the final system achieved a Diarisation Error Rate of 50.77% in the diarisation and linking task.
Keywords :
"Speech","Joining processes","Training","Adaptation models","Decoding","Hidden Markov models","Density estimation robust algorithm"
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404855
Filename :
7404855