• DocumentCode
    590670
  • Title

    Real-time semi-blind speech extraction with speaker direction tracking on Kinect

  • Author

    Onuma, Y. ; Kamado, N. ; Saruwatari, Hiroshi ; Shikano, Kiyohiro

  • Author_Institution
    Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
  • fYear
    2012
  • fDate
    3-6 Dec. 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper, we address improving speech recognition accuracy for ICA-based multichannel noise reduction in a spoken-dialogue robot. First, to achieve high recognition accuracy for the early utterance of the target speaker, we introduce a new rapid ICA initialization method that combines robot image information with a prestored bank of initial separation filters. Using this image information, an initial ICA filter fitted to the user's direction can be used to save the user's first utterance. Next, we propose a new permutation-solving method based on a probability statistics model for realistic sound mixtures consisting of point-source speech and diffuse noise. We implement these methods using user tracking on Microsoft Kinect and evaluate them in a speech recognition experiment in a real environment. The experimental results show that the proposed approaches can markedly improve word recognition accuracy.
  • Keywords
    robot kinematics; speaker recognition; ICA-based multichannel noise reduction; Kinect; image information; initial separation filter bank; permutation solving method; probability statistics; real-time semi-blind speech extraction; speaker direction tracking; speech recognition; spoken-dialogue robot; target speaker; Accuracy; Arrays; Noise; Real-time systems; Robots; Speech; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
  • Conference_Location
    Hollywood, CA
  • Print_ISBN
    978-1-4673-4863-8
  • Type

    conf

  • Filename
    6411817