Title :
Conversation dialog corpora from television and movie scripts
Author :
Nio, Lasguido ; Sakti, Sakriani ; Neubig, Graham ; Toda, Tomoki ; Nakamura, Satoshi
Author_Institution :
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Ikoma, Japan
Abstract :
Example-based dialogue systems often require natural conversation templates as examples for response generation. However, in previous work most conversation corpora have been created by hand and do not faithfully portray actual conversations between two people. One way to overcome this problem is to record and transcribe real human-to-human conversation, but this work is tedious and time-consuming. In this work, we instead utilize conversation scripts from television and movies. We extract conversations from television and movie scripts found on the web and perform various types of filtering. To ensure that each conversation is performed by two speakers, we introduce a unit of conversation called a tri-turn (a trigram conversation turn), which allows us to filter out conversations with more than two speakers. In the end, our conversation corpora contain 86,719 query-response pairs that represent conversation turns performed by two speakers talking to each other.
Keywords :
filtering theory; speech processing; World Wide Web; conversation corpora; conversation dialog corpora; conversation extraction; conversation filtering; conversation scripts; example-based dialogue systems; human-to-human conversation; movie scripts; natural conversation templates; response generation; television scripts; tri-turn; trigram conversation turn; Data collection; Databases; HTML; Motion pictures; Semantics; Syntactics; TV
Conference_Title :
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA)
DOI :
10.1109/ICSDA.2014.7051436
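
Illustrative sketch :
The abstract names the tri-turn filter but does not spell out its exact definition, so the following is only a minimal sketch under stated assumptions. It assumes a tri-turn is three consecutive turns in which the first and third turns share a speaker and the second turn comes from a different speaker (an A-B-A pattern), and that each tri-turn yields two query-response pairs. The function names, the (speaker, utterance) data layout, and the de-duplication step are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Turn = Tuple[str, str]          # (speaker, utterance); layout is an assumption
TriTurn = Tuple[Turn, Turn, Turn]


def extract_tri_turns(turns: List[Turn]) -> List[TriTurn]:
    """Keep every window of three consecutive turns that follows an A-B-A pattern.

    Assumption: a window involving a third speaker is simply discarded.
    """
    tri_turns = []
    for first, second, third in zip(turns, turns[1:], turns[2:]):
        same_outer_speaker = first[0] == third[0]
        different_middle = first[0] != second[0]
        if same_outer_speaker and different_middle:
            tri_turns.append((first, second, third))
    return tri_turns


def to_query_response_pairs(tri_turns: List[TriTurn]) -> List[Tuple[str, str]]:
    """Turn each tri-turn into two (query, response) pairs, dropping repeats.

    Overlapping tri-turns share a middle pair, so repeats are removed while
    preserving the original order (a design choice of this sketch).
    """
    pairs = []
    seen = set()
    for first, second, third in tri_turns:
        for query, response in ((first[1], second[1]), (second[1], third[1])):
            if (query, response) not in seen:
                seen.add((query, response))
                pairs.append((query, response))
    return pairs


# Hypothetical scene: speaker C joins after the third turn.
scene = [
    ("A", "Hey."),
    ("B", "Hi."),
    ("A", "You okay?"),
    ("C", "What's going on?"),
    ("B", "Nothing."),
]

tri_turns = extract_tri_turns(scene)
pairs = to_query_response_pairs(tri_turns)
```

In this hypothetical scene only the first window (A, B, A) passes the filter. The two windows that include speaker C are rejected, which is how a check of this kind would keep the resulting pairs restricted to two speakers talking to each other.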