DocumentCode
590670
Title
Real-time semi-blind speech extraction with speaker direction tracking on Kinect
Author
Onuma, Y. ; Kamado, N. ; Saruwatari, Hiroshi ; Shikano, Kiyohiro
Author_Institution
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
fYear
2012
fDate
3-6 Dec. 2012
Firstpage
1
Lastpage
6
Abstract
In this paper, we address the improvement of speech recognition accuracy for ICA-based multichannel noise reduction in a spoken-dialogue robot. First, to achieve high recognition accuracy for the early utterances of the target speaker, we introduce a new rapid ICA initialization method that combines robot image information with a prestored bank of initial separation filters. Using this image information, an initial ICA filter fitted to the user's direction can be selected, which preserves the user's first utterance. Next, we propose a new permutation solving method based on a probability statistics model for realistic sound mixtures consisting of point-source speech and diffuse noise. We implement these methods using user tracking on Microsoft Kinect and evaluate them in a speech recognition experiment in a real environment. The experimental results show that the proposed approaches can markedly improve word recognition accuracy.
Keywords
robot kinematics; speaker recognition; ICA-based multichannel noise reduction; Kinect; image information; initial separation filter bank; permutation solving method; probability statistics; real-time semi-blind speech extraction; speaker direction tracking; speech recognition; spoken-dialogue robot; target speaker; Accuracy; Arrays; Noise; Real-time systems; Robots; Speech; Speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
Conference_Location
Hollywood, CA
Print_ISBN
978-1-4673-4863-8
Type
conf
Filename
6411817