Speaker-independent detection of child-directed speech

Author

Schuster, Sebastian ; Pancoast, Stephanie ; Ganjoo, Milind ; Frank, Michael C. ; Jurafsky, Dan

Author_Institution

Dept. of Comput. Sci., Stanford Univ., Stanford, CA, USA

fYear

2014

Firstpage

366

Lastpage

371

Abstract

Identifying the distinct register that adults use when speaking to children is an important task for child development research. We present a fully automatic, speaker-independent system that detects child-directed speech. The two-stage system uses diarization-style voice activation techniques to extract speech segments followed by a supervised ν-SVM classifier trained on 1582 prosodic and log Mel energy features. The system significantly improves the state of the art, detecting child-directed speech with F1 of .66 (exact boundary) and .83 (within 1 second). A feature analysis confirms the importance of F0 features (especially 3rd quartile and range) as well as new features like the variance, kurtosis, and min of log Mel energy within a frequency band.
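
The two F1 figures differ only in how strictly segment boundaries are matched. As a minimal sketch of that distinction (this record does not give the paper's actual scoring procedure, so the function name `boundary_f1`, the greedy one-to-one matching, and the example segments are all illustrative assumptions), segment-level F1 with a boundary tolerance could be computed like this:

```python
def boundary_f1(gold, predicted, tolerance=0.0):
    """Segment-level F1 (hypothetical scoring scheme, not taken from the paper).

    Segments are (start, end) tuples in seconds. A predicted segment counts
    as correct when both of its boundaries lie within `tolerance` seconds of
    the boundaries of a gold segment that has not been matched yet.
    """
    unmatched = list(gold)
    true_positives = 0
    for p_start, p_end in predicted:
        for g in unmatched:
            if abs(p_start - g[0]) <= tolerance and abs(p_end - g[1]) <= tolerance:
                unmatched.remove(g)  # each gold segment is matched at most once
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up segments for illustration only.
gold = [(0.0, 2.0), (5.0, 7.5), (10.0, 12.0)]
predicted = [(0.0, 2.0), (5.4, 7.0), (20.0, 21.0)]

exact = boundary_f1(gold, predicted)                    # exact boundary match
relaxed = boundary_f1(gold, predicted, tolerance=1.0)   # within 1 second
```

On these invented segments the relaxed score is higher than the exact one because the second prediction misses both gold boundaries by less than a second, which is the same direction of difference as between the reported .66 and .83.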

Keywords

learning (artificial intelligence); signal classification; speech recognition; support vector machines; F0 features; F1 features; automatic speaker-independent system; child development research; child-directed speech detection; child-directed speech improvement; diarization-style voice activation techniques; exact boundary; feature analysis; frequency band; kurtosis feature; log Mel energy features; prosodic features; speech segment extraction; supervised ν-SVM classifier training; two-stage system; variance feature; Accuracy; Gold; Measurement; Noise; Speech; Support vector machines; Training; Child-directed Speech; Language Development; Prosody; Speech Analysis;

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language Technology Workshop (SLT), 2014 IEEE

Type

conf

DOI

10.1109/SLT.2014.7078602

Filename

7078602