Title :
Incorporation of the ASR output in speaker segmentation and clustering within the task of speaker diarization of broadcast streams
Author :
Silovsky, Jan ; Zdansky, Jindrich ; Nouza, Jan ; Cerva, Petr ; Prazak, Jan
Author_Institution :
Inst. of Inf. Technol. & Electron., Tech. Univ. of Liberec, Liberec, Czech Republic
Abstract :
In this paper we study the effect of incorporating automatic transcriptions into the speaker diarization process. We aim to improve both the diarization accuracy, as evaluated by standard objective measures, and the quality of the diarization output from the user's perspective. Although the presented approach relies on the output of an automatic speech recognizer, it makes no use of lexical information. Instead, we use information about word boundaries and the classification of non-speech events occurring in the processed stream. The former is used as a constraint on speaker change-point candidates; the latter makes it possible to disregard various vocal noise sounds that carry no speaker-specific information (given that the signal is represented by cepstral features) and thus harm the speaker's representation. The experimental evaluation of the presented approach was carried out using the COST278 multilingual broadcast news database. We demonstrate that the approach improves both speaker diarization and segmentation performance measures. Furthermore, we show that the number of change-points detected within words (rather than at their boundaries) is significantly reduced.
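The two uses of the ASR output described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` structure, the non-speech tag names, and the nearest-boundary snapping rule are all assumptions made for illustration. The abstract states only that word boundaries constrain the change-point candidates and that non-speech events are disregarded.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Token:
    """One ASR output unit: a word or a non-speech event (hypothetical format)."""
    start: float  # seconds
    end: float    # seconds
    label: str


# Hypothetical tag set; the real recognizer's non-speech classes are not given.
NON_SPEECH = {"[noise]", "[breath]", "[silence]"}


def change_point_candidates(tokens):
    """Allow speaker change-points only at token boundaries, never inside a word."""
    return sorted({t for tok in tokens for t in (tok.start, tok.end)})


def snap_to_boundary(time, boundaries):
    """Move a raw change-point hypothesis to the nearest allowed boundary.

    Assumes `boundaries` is sorted and non-empty. Snapping is one possible way
    to apply the constraint; ties go to the earlier boundary.
    """
    i = bisect_left(boundaries, time)
    neighbours = boundaries[max(i - 1, 0): i + 1]
    return min(neighbours, key=lambda b: abs(b - time))


def speech_intervals(tokens):
    """Keep only intervals labelled as words, dropping non-speech events."""
    return [(tok.start, tok.end) for tok in tokens if tok.label not in NON_SPEECH]


tokens = [
    Token(0.0, 0.4, "hello"),
    Token(0.4, 0.9, "[noise]"),
    Token(0.9, 1.2, "world"),
]
boundaries = change_point_candidates(tokens)
snapped = snap_to_boundary(0.55, boundaries)
speech = speech_intervals(tokens)
```

In this toy input, a raw change-point hypothesis at 0.55 s falls inside the noise token and is moved to a token boundary, while only the two word intervals would remain available for building speaker models.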
Keywords :
broadcasting; database management systems; information resources; natural language processing; pattern clustering; speaker recognition; ASR output; COST278 multilingual broadcast news database; automatic speech recognizer; broadcast streams; diarization accuracy; diarization output quality; nonspeech event classification; segmentation performance measures; speaker change-point candidates; speaker clustering; speaker diarization process; speaker segmentation; standard objective measures; vocal noise sounds; word boundaries; Covariance matrix; Databases; Smoothing methods; Speech; Speech recognition; Standards; Vectors;
Conference_Title :
Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on
Conference_Location :
Banff, AB
Print_ISBN :
978-1-4673-4570-5
Electronic_ISBN :
978-1-4673-4571-2
DOI :
10.1109/MMSP.2012.6343426