DocumentCode :
2550723
Title :
Removing fillers to induce semantic classes for a Chinese dialogue system
Author :
Li, Yali ; Zhao, Xuemin ; Yan, Yonghong
Author_Institution :
ThinkIT Lab., Chinese Acad. of Sci., Beijing, China
fYear :
2010
fDate :
16-18 April 2010
Firstpage :
512
Lastpage :
516
Abstract :
In this paper, we introduce an unsupervised method that removes fillers from spoken dialogues semi-automatically, based on their probability distribution, and we examine the effect of filler removal on semantic class induction. We compute the unigram and bigram distributions of fillers on our Chinese voice search data and find that, using these distributions alone, fillers fall within the first 1% of all words. We also measure semantic class induction precision before and after filler removal on both a human-to-computer corpus and a human-to-human corpus. After fillers are removed, precision rises from 81.8% to 86.9% in human-to-computer dialogues and from 58.0% to 61.9% in human-to-human dialogues.
Keywords :
interactive systems; natural language processing; probability; speech processing; Chinese dialogue system; Chinese voice search data; bigram distribution; human-to-computer corpus; human-to-computer dialogues; human-to-human corpus; human-to-human dialogues; probability distribution; removing fillers; semantic class induction precision; semantic classes; spoken dialogues; unigram distribution; Acoustics; Bleaching; Delay; Laboratories; Natural language processing; Natural languages; Probability distribution; Speech processing; Testing; Training data; fillers detection; fillers distribution; semantic class induction; spoken dialogue;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Management and Engineering (ICIME), 2010 The 2nd IEEE International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-5263-7
Electronic_ISBN :
978-1-4244-5265-1
Type :
conf
DOI :
10.1109/ICIME.2010.5477931
Filename :
5477931