Learning model-based F0 production through goal-directed babbling

Author

Hao Liu ; Yi Xu

Author_Institution

Dept. of Speech, Hearing & Phonetic Sci., Univ. Coll. London, London, UK

fYear

2014

fDate

12-14 Sept. 2014

Firstpage

284

Lastpage

288

Abstract

How surface acoustics can be mapped to underlying articulatory commands is a central yet unsolved issue in speech acquisition. Previously, stochastic optimization has been shown to be effective at learning the underlying pitch targets of the quantitative Target Approximation (qTA) model. The present study tested whether an acoustic-to-articulatory inverse model can be developed for qTA by taking advantage of a recent advance in inverse kinematics learning from developmental robotics, known as goal babbling. By treating the traditionally separate babbling and imitation stages of speech acquisition as a single acoustic goal-directed babbling process, the inverse model, implemented as a multilayer perceptron (MLP), can be bootstrapped rapidly without exploring the whole articulatory command space. The MLP was trained in online mode on self-generated examples obtained after each production by the host learner. The results show that with this novel learning paradigm the inverse model improves progressively, and that underlying pitch targets can be obtained by querying the mature inverse model. Our findings also demonstrate that qTA is an intrinsically robust F0 production model that can be operated by various learning regimens.
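
The goal-babbling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `forward` is a hypothetical one-dimensional stand-in for the production model (not qTA), the inverse model is a linear unit trained by normalised LMS rather than an MLP, and every name and parameter value is an assumption chosen for illustration. What it keeps from the abstract is the learning scheme: the learner picks an acoustic goal, queries its current inverse model, produces a perturbed command, and updates the inverse model online from the outcome it actually produced.

```python
import random


def forward(command):
    """Hypothetical toy production model (NOT qTA): command -> acoustic outcome."""
    return 2.0 * command + 1.0


class LinearInverse:
    """Inverse model (outcome -> command), updated one example at a time.

    A linear unit stands in for the MLP mentioned in the abstract.
    """

    def __init__(self, rate=0.5):
        self.w = 0.0
        self.b = 0.0
        self.rate = rate

    def predict(self, outcome):
        return self.w * outcome + self.b

    def update(self, outcome, command):
        # Normalised LMS step on a single self-generated example.
        err = command - self.predict(outcome)
        step = self.rate * err / (outcome * outcome + 1.0)
        self.w += step * outcome
        self.b += step


def goal_babbling(steps=2000, noise=0.3, seed=0):
    """Bootstrap the inverse model from the learner's own productions."""
    rng = random.Random(seed)
    inverse = LinearInverse()
    for _ in range(steps):
        goal = rng.uniform(0.0, 1.0)                    # acoustic goal to reach
        command = inverse.predict(goal) + rng.gauss(0.0, noise)  # exploratory attempt
        outcome = forward(command)                      # what was actually produced
        inverse.update(outcome, command)                # learn from (outcome, command)
    return inverse


def goal_error(inverse, goals):
    """Largest distance between a goal and the outcome of the queried command."""
    return max(abs(forward(inverse.predict(g)) - g) for g in goals)


goals = [0.0, 0.25, 0.5, 0.75, 1.0]
untrained_error = goal_error(LinearInverse(), goals)
trained_error = goal_error(goal_babbling(), goals)
print(untrained_error, trained_error)
```

Training pairs are always (observed outcome, executed command), so each example is a valid instance of the inverse mapping even when the attempt misses its goal. That is why the model can improve without a systematic sweep of the command space.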

Keywords

learning (artificial intelligence); multilayer perceptrons; optimisation; speech processing; MLP; acoustic-to-articulatory inverse model; articulatory command space; articulatory commands; developmental robotics; inverse kinematics learning; learning model-based F0 production; learning regimens; multilayer perceptron; pitch targets; qTA; quantitative target approximation model; self-generated examples; speech acquisition; speech acquisition imitation stages; stochastic optimization; surface acoustics; traditionally separated babbling; unified acoustic goal-directed babbling process; Acoustics; Adaptation models; Production; Robot sensing systems; Space exploration; Speech; Stochastic processes; F0 contour modeling; acoustic-to-articulatory inversion; online learning; sensorimotor coordination; speech acquisition; speech production; target approximation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location

Singapore

Type

conf

DOI

10.1109/ISCSLP.2014.6936720

Filename

6936720