• DocumentCode
    1798842
  • Title
    Semi-supervised learning of dialogue acts using sentence similarity based on word embeddings
  • Author
    Xiaohao Yang ; Jia Liu ; Zhenfeng Chen ; Weilan Wu
  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
  • fYear
    2014
  • fDate
    7-9 July 2014
  • Firstpage
    882
  • Lastpage
    886
  • Abstract
    This paper describes a method for semi-supervised learning of dialogue acts based on the similarity between sentences. We assume that dialogue sentences sharing the same dialogue act are more similar to one another in their semantic and syntactic information. Previous work on sentence similarity mainly modeled a sentence as a bag of words and then compared groups of words using corpus-based or knowledge-based measures of word semantic similarity. In contrast, we present a vector-space sentence representation composed of word embeddings, that is, distributed representations of words, organised according to the syntactic structure of the sentence. Given the vectors of the dialogue sentences, a distance measure can be defined to compute the similarity between them. Finally, a seeded k-means clustering algorithm classifies the dialogue sentences into several categories corresponding to particular dialogue acts. This seeded clustering constitutes the semi-supervised nature of the approach, which aims to reduce the reliance on annotated corpora. Experiments on the Switchboard Dialog Act corpus show that classification accuracy improves by 14% compared with state-of-the-art methods based on Support Vector Machines.
  • Keywords
    interactive systems; learning (artificial intelligence); pattern classification; pattern clustering; word processing; Switchboard Dialog Act corpus; annotated corpora; classification accuracy improvement; dialogue acts; dialogue sentence classification; dialogue sentence similarity; distance measurement; seeded k-means clustering algorithm; semantic information; semisupervised learning; sentence syntactic structure; syntactic information; vector-space sentence representation; word distributed representations; word embeddings; Clustering algorithms; Computational linguistics; Semantics; Supervised learning; Support vector machines; Syntactics; Vectors; dialog acts; seeded k-means; sentence similarity; word embeddings;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Audio, Language and Image Processing (ICALIP), 2014 International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4799-3902-2
  • Type
    conf
  • DOI
    10.1109/ICALIP.2014.7009921
  • Filename
    7009921
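
The pipeline outlined in the abstract (sentence vectors built from word embeddings, a distance between those vectors, and seeded k-means) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the sentence vector is a plain average of word embeddings rather than the syntax-structured representation the paper proposes, the distance is Euclidean, and the names `sentence_vector`, `seeded_kmeans`, and the toy 2-D embeddings are hypothetical.

```python
import numpy as np


def sentence_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens.

    Assumption: a plain average stands in for the paper's
    syntax-structured sentence representation.
    """
    vecs = [np.asarray(embeddings[t], dtype=float) for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


def seeded_kmeans(X, seed_idx, seed_labels, n_clusters, n_iter=20):
    """k-means whose centroids are initialised from labelled seed points.

    Every cluster needs at least one seed. Seeds only set the starting
    centroids; their labels are not held fixed afterwards.
    """
    X = np.asarray(X, dtype=float)
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    # One centroid per dialogue act: the mean of that act's seed sentences.
    centroids = np.stack(
        [X[seed_idx[seed_labels == k]].mean(axis=0) for k in range(n_clusters)]
    )
    labels = None
    for _ in range(n_iter):
        # Euclidean distance from every sentence to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return labels, centroids


# Toy 2-D embeddings (hypothetical values, for illustration only).
toy_embeddings = {
    "yes": [1.0, 0.0], "right": [0.9, 0.1], "okay": [0.8, 0.2],
    "what": [0.0, 1.0], "why": [0.1, 0.9], "how": [0.2, 0.8],
}
sentences = [["yes"], ["what"], ["right", "okay"], ["why", "how"], ["okay"], ["how"]]
X = np.stack([sentence_vector(s, toy_embeddings, dim=2) for s in sentences])

# Sentences 0 and 1 are the labelled seeds for acts 0 and 1.
labels, centroids = seeded_kmeans(X, seed_idx=[0, 1], seed_labels=[0, 1], n_clusters=2)
print(labels.tolist())
```

Only two of the six toy sentences carry labels; the remaining four are assigned to a dialogue act by proximity to the seed-initialised centroids, which is the sense in which such a scheme needs little annotated data.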