• DocumentCode
    2014333
  • Title

    Incorporation of the ASR output in speaker segmentation and clustering within the task of speaker diarization of broadcast streams

  • Author

    Silovsky, Jan ; Zdansky, Jindrich ; Nouza, Jan ; Cerva, Petr ; Prazak, Jan

  • Author_Institution
    Inst. of Inf. Technol. & Electron., Tech. Univ. of Liberec, Liberec, Czech Republic
  • fYear
    2012
  • fDate
    17-19 Sept. 2012
  • Firstpage
    118
  • Lastpage
    123
  • Abstract
    In this paper we study the effect of incorporating automatic transcriptions into the speaker diarization process. We aim to improve both the diarization accuracy, as evaluated by standard objective measures, and the quality of the diarization output from the user's perspective. Although the presented approach relies on the output of an automatic speech recognizer, it makes no use of lexical information. Instead, we use information about word boundaries and the classification of non-speech events occurring in the processed stream. The former is used as a constraint on speaker change-point candidates, and the latter makes it possible to discard various vocal noise sounds that carry no speaker-specific information (given that the signal is represented by cepstral features) and thus harm the speaker's representation. The experimental evaluation of the presented approach was carried out using the COST278 multilingual broadcast news database. We demonstrate that the approach yields improvement in terms of both speaker diarization and segmentation performance measures. Furthermore, we show that the number of change-points detected within words (and not at their boundaries) is significantly reduced.
  • Keywords
    broadcasting; database management systems; information resources; natural language processing; pattern clustering; speaker recognition; ASR output; COST278 multilingual broadcast news database; automatic speech recognizer; broadcast streams; diarization accuracy; diarization output quality; nonspeech event classification; segmentation performance measures; speaker change-point candidates; speaker clustering; speaker diarization process; speaker segmentation; standard objective measures; vocal noise sounds; word boundaries; Covariance matrix; Databases; Smoothing methods; Speech; Speech recognition; Standards; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on
  • Conference_Location
    Banff, AB
  • Print_ISBN
    978-1-4673-4570-5
  • Electronic_ISBN
    978-1-4673-4571-2
  • Type

    conf

  • DOI
    10.1109/MMSP.2012.6343426
  • Filename
    6343426
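
The abstract names two uses of the ASR output: word boundaries constrain speaker change-point candidates, and recognized non-speech events are left out of the speaker representation. The following is a minimal sketch of one possible reading of those two ideas, not the authors' implementation. The `Token` type, the function names, the nearest-boundary snapping rule, and the 10 ms frame shift are all assumptions made for illustration; the record does not state how the paper applies the constraint.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Token:
    """One ASR output item (hypothetical structure, times in seconds)."""
    start: float
    end: float
    is_noise: bool  # True for a non-speech event such as a breath or cough


def word_boundaries(tokens):
    """Return the sorted, de-duplicated start and end times of all tokens."""
    return sorted({t for tok in tokens for t in (tok.start, tok.end)})


def snap_to_boundary(candidate, boundaries):
    """Move a change-point candidate to the nearest token boundary.

    Assumes `boundaries` is sorted and non-empty.
    """
    i = bisect_left(boundaries, candidate)
    # Only the boundary just before and just after the candidate can be nearest.
    neighbours = boundaries[max(i - 1, 0):i + 1]
    return min(neighbours, key=lambda b: abs(b - candidate))


def speech_frame_mask(tokens, n_frames, frame_shift=0.01):
    """Return a per-frame list that is False inside non-speech events.

    Frames marked False would be skipped when speaker statistics are
    accumulated from cepstral features (assumed 10 ms frame shift).
    """
    mask = [True] * n_frames
    for tok in tokens:
        if tok.is_noise:
            first = int(round(tok.start / frame_shift))
            last = min(int(round(tok.end / frame_shift)), n_frames)
            for f in range(first, last):
                mask[f] = False
    return mask


# Illustrative input: two words separated by a short non-speech event.
tokens = [
    Token(0.0, 0.5, False),
    Token(0.5, 0.6, True),
    Token(0.6, 1.2, False),
]
boundaries = word_boundaries(tokens)
snapped = snap_to_boundary(0.82, boundaries)
mask = speech_frame_mask(tokens, n_frames=120)
```

In this sketch a candidate that falls inside the second word is moved to that word's nearer edge, so no change-point remains within a word, and the frames covered by the non-speech event are excluded from modelling. Restricting the candidate search to boundary times from the start, instead of snapping afterwards, would be an equally plausible reading of the abstract.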