Title :
Prosody recognition in male infant-directed speech
Author :
Robinson-Mosher, Avram Lev ; Scassellati, Brian
Author_Institution :
Dept. of Comput. Sci., Yale Univ., New Haven, CT, USA
fDate :
28 Sept.-2 Oct. 2004
Abstract :
Robots designed to learn from and interact with humans require an intuitive method for humans to communicate with them. Normal human speech is very difficult to process, requiring many kinds of complex analysis before a robot can interpret it. An intermediate method of communication is recognition of prosody, the affective content of speech. Using prosody recognition, a human interacting with a robot can reward or punish its actions by praising or scolding it. In this project, prosody recognition of male voices is performed by feature-based analysis of sound files containing short utterances. The utterances were recorded from subjects who were directed to emulate infant-directed speech, which generally contains exaggerated prosody (Breazeal, C and Aryananda, L, 2000). The features used are extracted from the energy and pitch contours in the preprocessing stage. The classifier discriminates among four affective classes of speech and neutral utterances. The four classes are prohibition, attentional bids, approval, and soothing, while the neutral utterances are speech that carries none of these affective intents. Discrimination is performed using a multistage k-nearest neighbor classifier. The five-way single-stage classifier operates at 62.5 percent accuracy on the entire male speech data set, while the female single-stage classifier classifies 66.7 percent correctly. Chi-square analysis resulted in a p of less than or equal to 0.001 for each. The data seem to indicate that while female voice data may be somewhat easier to classify than male, there are no fundamental differences that make male utterances unsuitable for classification.
Keywords :
intelligent robots; man-machine systems; natural languages; pattern classification; speech recognition; Chi-square analysis; male infant directed speech; multistage k-nearest neighbor classifier; prosody recognition; robot directed speech; Computer science; Feature extraction; Human robot interaction; Interactive systems; Laboratories; Pediatrics; Performance analysis; Speech analysis; Speech processing; Speech recognition
Conference_Titel :
Intelligent Robots and Systems, 2004. (IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on
Print_ISBN :
0-7803-8463-6
DOI :
10.1109/IROS.2004.1389737