  • DocumentCode
    394246
  • Title
    Training a prosody-based dialog act tagger from unlabeled data
  • Author
    Venkataraman, Anand ; Ferrer, Luciana ; Stolcke, Andreas ; Shriberg, Elizabeth
  • Author_Institution
    Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
  • Volume
    1
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    Dialog act tagging is an important step toward speech understanding, yet training such taggers usually requires large amounts of data labeled by linguistic experts. Here we investigate the use of unlabeled data for training HMM-based dialog act taggers. Three techniques are shown to be effective for bootstrapping a tagger from very small amounts of labeled data: iterative relabeling and retraining on unlabeled data; a dialog grammar to model dialog act context; and a model of the prosodic correlates of dialog acts. On the SPINE dialog corpus, the combined use of prosodic information and unlabeled data reduces the tagging error by between 12% and 16% compared to baseline systems that use only word information and various amounts of labeled data.
  • Keywords
    grammars; hidden Markov models; iterative methods; learning (artificial intelligence); speech recognition; HMM-based dialog act taggers; SPINE dialog corpus; baseline systems; dialog act context modelling; dialog grammar; discourse function; iterative relabeling; iterative retraining; labeled data; prosodic correlates; prosodic information; prosody-based dialog act tagger training; speech recognition; speech understanding; supervised training; tagger bootstrapping; tagging error reduction; unlabeled data; word information; Context modeling; Hidden Markov models; Labeling; Laboratories; Natural languages; Speech recognition; Tagging; Training data; Vocabulary; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1198770
  • Filename
    1198770
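The abstract names three ingredients: an HMM tagger whose hidden states are dialog acts, a dialog grammar (here a bigram over dialog acts) as the transition model, and iterative relabeling and retraining on unlabeled dialogs. The following is a minimal, hypothetical sketch of how these pieces can fit together; it is not the authors' implementation. The add-one-smoothed word model, the single Gaussian over a hypothetical per-utterance `pitch_slope` feature standing in for the prosodic model, and the function names `train`, `log_emission`, `viterbi`, and `self_train` are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical data layout: a dialog is a list of utterances, each (words, pitch_slope).
# pitch_slope is an assumed single prosodic feature, used only for illustration.

def train(dialogs, labels, acts):
    """Estimate word, prosody, and dialog-grammar (dialog act bigram) models."""
    word_counts = {a: Counter() for a in acts}
    prosody = {a: [] for a in acts}
    bigrams = defaultdict(Counter)
    vocab = set()
    for dialog, tags in zip(dialogs, labels):
        prev = "<s>"
        for (words, slope), tag in zip(dialog, tags):
            word_counts[tag].update(words)
            vocab.update(words)
            prosody[tag].append(slope)
            bigrams[prev][tag] += 1
            prev = tag
    model = {"acts": acts, "vocab_size": len(vocab) + 1, "words": {}, "gauss": {}, "trans": {}}
    for a in acts:
        model["words"][a] = (word_counts[a], sum(word_counts[a].values()))
        xs = prosody[a] or [0.0]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs) + 1e-2  # variance floor
        model["gauss"][a] = (mu, var)
    for prev in ["<s>"] + list(acts):
        counts = bigrams[prev]
        total = sum(counts.values())
        # Add-one smoothed dialog grammar: log P(act | previous act)
        model["trans"][prev] = {a: math.log((counts[a] + 1) / (total + len(acts))) for a in acts}
    return model

def log_emission(model, act, words, slope):
    """log P(words | act) + log N(slope; mu_act, var_act)."""
    counts, total = model["words"][act]
    v = model["vocab_size"]
    lp = sum(math.log((counts[w] + 1) / (total + v)) for w in words)
    mu, var = model["gauss"][act]
    lp += -0.5 * math.log(2 * math.pi * var) - (slope - mu) ** 2 / (2 * var)
    return lp

def viterbi(model, dialog):
    """Most likely dialog act sequence for one (non-empty) dialog."""
    acts = model["acts"]
    scores = [{a: model["trans"]["<s>"][a] + log_emission(model, a, *dialog[0]) for a in acts}]
    back = []
    for words, slope in dialog[1:]:
        prev_scores = scores[-1]
        cur, ptr = {}, {}
        for a in acts:
            best = max(acts, key=lambda p: prev_scores[p] + model["trans"][p][a])
            cur[a] = prev_scores[best] + model["trans"][best][a] + log_emission(model, a, words, slope)
            ptr[a] = best
        scores.append(cur)
        back.append(ptr)
    path = [max(acts, key=lambda a: scores[-1][a])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def self_train(labeled, labels, unlabeled, acts, iterations=3):
    """Iterative relabeling: tag unlabeled dialogs, then retrain on everything."""
    model = train(labeled, labels, acts)
    for _ in range(iterations):
        auto_labels = [viterbi(model, d) for d in unlabeled]
        model = train(labeled + unlabeled, labels + auto_labels, acts)
    return model

acts = ["question", "statement"]
labeled = [[(["where", "are", "you"], 0.8), (["at", "base"], -0.5)]]
labels = [["question", "statement"]]
unlabeled = [[(["where", "is", "it"], 0.7), (["at", "camp"], -0.6)]]
model = self_train(labeled, labels, unlabeled, acts)
print(viterbi(model, [(["are", "you", "there"], 0.9)]))  # prints ['question']
```

In this toy setup the retraining step simply treats the Viterbi output on unlabeled dialogs as if it were gold labels; a real system would likely need more careful handling (for example, confidence weighting), and the prosodic model would use many features rather than one.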