  • DocumentCode
    1148070
  • Title
    Multi-View Semi-Supervised Learning for Dialog Act Segmentation of Speech
  • Author
    Guz, Umit ; Cuendet, Sébastien ; Hakkani-Tür, Dilek ; Tur, Gokhan
  • Author_Institution
    Speech Group, Int. Comput. Sci. Inst., Berkeley, CA, USA
  • Volume
    18
  • Issue
    2
  • fYear
    2010
  • Firstpage
    320
  • Lastpage
    329
  • Abstract
    Sentence segmentation of speech aims at determining sentence boundaries in a stream of words as output by the speech recognizer. Typically, statistical methods are used for sentence segmentation. However, they require significant amounts of labeled data, preparation of which is time-consuming, labor-intensive, and expensive. This work investigates the application of multi-view semi-supervised learning algorithms on the sentence boundary classification problem by using lexical and prosodic information. The aim is to find an effective semi-supervised machine learning strategy when only small sets of sentence boundary-labeled data are available. We especially focus on two semi-supervised learning approaches, namely, self-training and co-training. We also compare different example selection strategies for co-training, namely, agreement and disagreement. Furthermore, we propose another method, called self-combined, which is a combination of self-training and co-training. The experimental results obtained on the ICSI Meeting (MRDA) Corpus show that both multi-view methods outperform self-training, and the best results are obtained using co-training alone. This study shows that sentence segmentation is very appropriate for multi-view learning since the data sets can be represented by two disjoint and redundantly sufficient feature sets, namely, using lexical and prosodic information. Performance of the lexical and prosodic models is improved by 26% and 11% relative, respectively, when only a small set of manually labeled examples is used. When both information sources are combined, the semi-supervised learning methods improve the baseline F-Measure of 69.8% to 74.2%.
  • Keywords
    learning (artificial intelligence); speech processing; statistical analysis; dialog act segmentation; lexical information; multiview semisupervised learning; prosodic information; semisupervised machine learning strategy; sentence boundary-labeled data; sentence segmentation; speech; Boosting; co-training; prosody; self-training; semi-supervised learning
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2009.2028371
  • Filename
    5173566