DocumentCode
663350
Title
Multimodal concept and word learning using phoneme sequences with errors
Author
Nakamura, T.; Araki, Takeshi; Nagai, Takayuki; Nagasaka, Shogo; Taniguchi, Takafumi; Iwahashi, Naoto
Author_Institution
Dept. of Mech. Eng. & Intell. Syst., Univ. of Electro-Commun., Chofu, Japan
fYear
2013
fDate
3-7 Nov. 2013
Firstpage
157
Lastpage
162
Abstract
In this study, we propose a method for concept formation and word acquisition for robots. The proposed method is based on multimodal latent Dirichlet allocation (MLDA) and the nested Pitman-Yor language model (NPYLM). A robot obtains haptic, visual, and auditory information by grasping, observing, and shaking an object. At the same time, a user teaches object features to the robot through speech, which is recognized using only acoustic models and transformed into phoneme sequences. As the robot is supposed to have no language model in advance, the recognized phoneme sequences include many phoneme recognition errors. Moreover, the recognized phoneme sequences with errors are segmented into words in an unsupervised manner; however, not all words are necessarily segmented correctly. The words including these errors have a negative effect on the learning of word meanings. To overcome this problem, we propose a method to improve unsupervised word segmentation and to reduce phoneme recognition errors by using multimodal object concepts. In the proposed method, object concepts are used to enhance the accuracy of word segmentation, reduce phoneme recognition errors, and correct words so as to improve the categorization accuracy. We experimentally demonstrate that the proposed method can improve the accuracy of word segmentation and reduce the phoneme recognition error and that the obtained words enhance the categorization accuracy.
Keywords
robots; speech recognition; unsupervised learning; NPYLM; acoustic models; auditory information; categorization accuracy; concept formation; haptic information; multimodal concept; multimodal latent Dirichlet allocation; multimodal object concepts; nested Pitman-Yor language model; object features; object grasping; object observing; object shaking; phoneme recognition errors; phoneme sequences; robot; speech recognition; unsupervised word segmentation; visual information; word acquisition; word learning; Accuracy; Histograms; Robot sensing systems; Speech recognition; Vectors; Visualization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on
Conference_Location
Tokyo
ISSN
2153-0858
Type
conf
DOI
10.1109/IROS.2013.6696347
Filename
6696347