• DocumentCode
    2330801
  • Title
    Kernel topic segmentation for informal multi-party meetings and performance degradation caused by insufficient lexicon
  • Author
    Sadohara, Ken
  • Author_Institution
    Nat. Inst. of Adv. Ind. Sci. & Technol. (AIST), Tsukuba, Japan
  • fYear
    2010
  • fDate
    12-15 Dec. 2010
  • Firstpage
    430
  • Lastpage
    435
  • Abstract
    We herein propose a domain-independent topic segmentation algorithm for free-form multi-party meeting recordings. The advantage of the proposed algorithm is that topical and lexical knowledge, which are difficult to adapt to the target meeting before speech recognition and topic segmentation, are not required. For an errorful sequence of phonemes obtained using a continuous phoneme recognizer, the proposed algorithm exhaustively analyzes the occurrence pattern of subsequences of phonemes and partitions the sequence into segments with coherent patterns. An empirical study on the ICSI Meeting Corpus has indicated that it performs comparably to lexical-cohesion-based text segmenters applied to human transcripts. Furthermore, the performance of the text segmenters applied to LVCSR output decreases significantly when keywords are not included in the lexicon. This suggests that, for the purpose of obtaining topical structure, the phoneme sequence segmenter could be more robust than text segmenters with LVCSR.
  • Keywords
    speech recognition; Kernel topic segmentation; informal multiparty meetings; insufficient lexicon; lexical knowledge; multiparty meeting recordings; performance degradation; text segmenters; topical knowledge; Topic segmentation; kernel method; meeting summarization; string kernel; sub-word recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2010 IEEE
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-7904-7
  • Electronic_ISBN
    978-1-4244-7902-3
  • Type
    conf
  • DOI
    10.1109/SLT.2010.5700891
  • Filename
    5700891
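
  • Illustrative sketch (not part of the original record)
    The abstract describes analyzing the occurrence pattern of phoneme subsequences and partitioning the sequence into segments with coherent patterns. The Python sketch below illustrates that general idea only; it is not the paper's algorithm, whose details are not given in this record. It assumes a simple n-gram (spectrum) string kernel, a fixed-size sliding window, and a single boundary placed where adjacent windows are least similar. All function names, the window size, and the n-gram length are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_counts(phonemes, n_max=3):
    """Count every contiguous phoneme subsequence of length 1..n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(phonemes) - n + 1):
            counts[tuple(phonemes[i:i + n])] += 1
    return counts


def spectrum_kernel(a, b):
    """Inner product of two n-gram count vectors (a simple string kernel)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(c * b[g] for g, c in a.items())


def cosine(a, b):
    """Kernel value normalized to [0, 1]; 0.0 if either window is empty."""
    denom = sqrt(spectrum_kernel(a, a) * spectrum_kernel(b, b))
    return spectrum_kernel(a, b) / denom if denom else 0.0


def boundary_scores(phonemes, window=20, n_max=3):
    """Similarity of the windows left and right of each candidate position."""
    scores = {}
    for pos in range(window, len(phonemes) - window + 1):
        left = ngram_counts(phonemes[pos - window:pos], n_max)
        right = ngram_counts(phonemes[pos:pos + window], n_max)
        scores[pos] = cosine(left, right)
    return scores


def best_boundary(phonemes, window=20, n_max=3):
    """Position where adjacent windows share the fewest subsequences."""
    scores = boundary_scores(phonemes, window, n_max)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Toy input: two "topics" with disjoint phoneme patterns (assumed data).
    toy = ["k", "a", "t"] * 10 + ["d", "o", "g"] * 10
    print(best_boundary(toy, window=9))
```

    Because the comparison operates on phoneme subsequences rather than words, a sketch like this needs no lexicon, which is the property the abstract highlights. A real recognizer's output would be errorful and would call for a more robust kernel and boundary criterion than this toy version.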