DocumentCode :
1059868
Title :
Automatic Detection of Disfluency Boundaries in Spontaneous Speech of Children Using Audio–Visual Information
Author :
Yildirim, Serdar ; Narayanan, Shrikanth
Author_Institution :
Dept. of Electr. Eng. & IMSC, Univ. of Southern California, Los Angeles, CA
Volume :
17
Issue :
1
fYear :
2009
Firstpage :
2
Lastpage :
12
Abstract :
The presence of disfluencies in spontaneous speech, while posing a challenge for robust automatic recognition, also offers a means of gaining additional insight into a speaker's communicative and cognitive state. This paper analyzes disfluencies in children's spontaneous speech, in the context of spoken-dialog-based computer game play, and addresses the automatic detection of disfluency boundaries. Although several approaches have been proposed to detect disfluencies in speech, relatively little work has been done to utilize visual information to improve the performance and robustness of the disfluency detection system. This paper describes the use of visual information along with prosodic and language information to detect the presence of disfluencies in a child's computer-directed speech and shows how these information sources can be integrated to increase the overall information available for disfluency detection. The experimental results on our children's multimodal dialog corpus indicate that disfluency detection accuracy of over 80% can be obtained by utilizing audio-visual information. Specifically, results showed that the addition of visual information to prosody and language features yields relative improvements in disfluency detection error rates of 3.6% and 6.3%, respectively, for information fusion at the feature level and decision level.
Keywords :
computer games; sensor fusion; speech recognition; audio-visual information; children spontaneous speech recognition; disfluency boundary automatic detection; information fusion; multimodal dialog corpus; prosodic information; spoken dialog based computer game play; Automatic speech recognition; Computer vision; Context; Engineering profession; Error analysis; Feature extraction; Natural languages; Robustness; Speech analysis; Speech processing; Disfluency detection; feature selection; information fusion; spoken language processing; spontaneous children speech;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2008.2006728
Filename :
4740159