Semi-supervised learning of dialogue acts using sentence similarity based on word embeddings

Author

Xiaohao Yang; Jia Liu; Zhenfeng Chen; Weilan Wu

Author_Institution

Dept. of Electron. Eng., Tsinghua Univ., Beijing, China

fYear

2014

fDate

7-9 July 2014

Firstpage

882

Lastpage

886

Abstract

This paper describes a method for semi-supervised learning of dialogue acts based on the similarity between sentences. We assume that dialogue sentences with the same dialogue act are more similar in their semantic and syntactic information. Previous work on sentence similarity, however, mainly modeled a sentence as a bag of words and then compared groups of words using corpus-based or knowledge-based measures of word semantic similarity. In contrast, we present a vector-space sentence representation composed of word embeddings (distributed representations of the words), organised according to the sentence's syntactic structure. Given the vectors of the dialogue sentences, a well-defined distance measure can be used to compute the similarity between them. Finally, a seeded k-means clustering algorithm classifies the dialogue sentences into categories corresponding to particular dialogue acts. This makes the approach semi-supervised and reduces its reliance on the availability of annotated corpora. Experiments on the Switchboard Dialog Act corpus show that classification accuracy improves by 14% compared with state-of-the-art methods based on Support Vector Machines.
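The abstract does not say exactly how the word embeddings are arranged in the syntactic structure, nor which distance is used, so the following is only a minimal sketch under stated assumptions. Each sentence is represented by the plain average of its word vectors (a simpler bag-of-embeddings stand-in, not the authors' syntax-based composition), sentences are compared with Euclidean distance, and seeded k-means uses the labeled seed sentences only to initialise one centroid per dialogue act before ordinary k-means runs over all sentences. The function names, toy embeddings, and dialogue-act labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_vector(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of known tokens (assumed stand-in for the paper's
    syntax-structured representation); unknown-only sentences map to zeros."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    """Assumed sentence distance: Euclidean distance between sentence vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def seeded_kmeans(points: List[Vector], seeds: Dict[int, str], max_iter: int = 100) -> List[str]:
    """Seeded k-means: `seeds` maps a point index to its dialogue-act label.
    Centroids start at the mean of each label's seed points; after that,
    standard k-means reassigns every point (seeds included) until stable."""
    labels = sorted(set(seeds.values()))
    centroids = {
        lab: mean([points[i] for i, l in seeds.items() if l == lab]) for lab in labels
    }
    assignment: List[str] = []
    for _ in range(max_iter):
        # Assign each sentence to the nearest dialogue-act centroid.
        new_assignment = [
            min(labels, key=lambda lab: euclidean(p, centroids[lab])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no sentence changed cluster
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for lab in labels:
            members = [p for p, a in zip(points, assignment) if a == lab]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[lab] = mean(members)
    return assignment


# Toy usage with hypothetical 2-D embeddings and two hypothetical dialogue acts.
emb = {
    "what": [1.0, 0.0], "where": [0.9, 0.1], "is": [0.5, 0.5],
    "yeah": [0.0, 1.0], "right": [0.1, 0.9], "okay": [0.0, 0.95],
}
sentences = [["what", "is"], ["yeah", "right"], ["where", "is"], ["okay"], ["what"], ["right"]]
X = [sentence_vector(s, emb, 2) for s in sentences]
labels = seeded_kmeans(X, seeds={0: "question", 1: "backchannel"})
print(labels)
# ['question', 'backchannel', 'question', 'backchannel', 'question', 'backchannel']
```

Because the seeds only set the starting centroids here, a seed sentence may still move to another cluster during iteration; a constrained variant would instead keep seed sentences fixed to their given labels.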

Keywords

interactive systems; learning (artificial intelligence); pattern classification; pattern clustering; word processing; Switchboard Dialog Act corpus; annotated corpora; classification accuracy improvement; dialogue acts; dialogue sentence classification; dialogue sentence similarity; distance measurement; seeded k-means clustering algorithm; semantic information; semisupervised learning; sentence syntactic structure; syntactic information; vector-space sentence representation; word distributed representations; word embeddings; Clustering algorithms; Computational linguistics; Semantics; Supervised learning; Support vector machines; Syntactics; Vectors; dialog acts; seeded k-means; sentence similarity

fLanguage

English

Publisher

IEEE

Conference_Titel

Audio, Language and Image Processing (ICALIP), 2014 International Conference on

Conference_Location

Shanghai

Print_ISBN

978-1-4799-3902-2

Type

conf

DOI

10.1109/ICALIP.2014.7009921

Filename

7009921