• DocumentCode
    681700
  • Title
    Look who's talking
  • Author
    D'Arca, Eleonora ; Robertson, Neil M. ; Hopgood, James
  • Author_Institution
    Heriot Watt Univ., Edinburgh, UK
  • fYear
    2013
  • fDate
    2-3 Dec. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This paper proposes a method to automatically detect and localise the dominant speaker in a conversation using audio and video information. The underlying assumption is that gesturing indicates speaking, so we look for people's hand or head movements to infer that a person is talking. In a normal conversational context with two or more people, we compute Mel-frequency cepstral coefficients (MFCCs) and use canonical correlation analysis (CCA) to find how they correlate with the optical flow associated with moving pixel regions. In complex scenarios, this can associate pixel regions with sounds that are not actually correlated with them. We therefore also triangulate the information coming from the microphones to estimate the position of the actual audio source, narrowing the visual search space and hence reducing the probability of a wrong voice-to-pixel-region association. We compare our method with an existing state-of-the-art algorithm and show, on real data, an improvement in dominant speaker localisation.
  • Keywords
    cepstral analysis; correlation methods; gesture recognition; speaker recognition; CCA; MFCC; Mel-frequency cepstral coefficient; actual audio source; audio information; canonical correlation analysis; dominant speaker detection; dominant speaker localization; microphone; optical flow; people hands movement; people heads movement; video information; voice to pixel region association;
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Title
    Intelligent Signal Processing Conference 2013 (ISP 2013), IET
  • Conference_Location
    London
  • Electronic_ISBN
    978-1-84919-774-8
  • Type
    conf
  • DOI
    10.1049/cp.2013.2075
  • Filename
    6740524
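
The audio-visual association step summarised in the abstract can be illustrated with a minimal sketch of canonical correlation analysis between per-frame audio features and per-region motion features. This is an illustration under stated assumptions, not the authors' implementation: the "MFCC" frames and "optical-flow" summaries below are synthetic random stand-ins, the function name `first_canonical_correlation` and the ridge term `reg` are hypothetical, and the microphone triangulation step is not modelled. Each candidate pixel region is scored by the largest canonical correlation between the audio features and that region's motion features, and the highest-scoring region is taken as the speaker.

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-6):
    """Largest canonical correlation between the column sets of X (n, p) and Y (n, q).

    Rows are time-aligned frames. `reg` is a small ridge term (an assumption,
    not from the paper) that keeps the covariance matrices positive definite.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten both views with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # T = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    # svd returns singular values in descending order, so [0] is the largest.
    return float(np.linalg.svd(T, compute_uv=False)[0])

rng = np.random.default_rng(0)
n_frames = 200
audio = rng.standard_normal((n_frames, 3))  # hypothetical stand-in for 3 MFCCs per frame

# Hypothetical motion features: region 0 moves in step with the audio,
# region 1 moves independently of it.
mixing = rng.standard_normal((3, 2))
region_flow = [
    audio @ mixing + 0.1 * rng.standard_normal((n_frames, 2)),
    rng.standard_normal((n_frames, 2)),
]

# Score each region and pick the one whose motion best tracks the audio.
scores = [first_canonical_correlation(audio, flow) for flow in region_flow]
speaker_region = int(np.argmax(scores))
print(scores, speaker_region)
```

In the setting the abstract describes, the audio features would be MFCCs and the region features would come from optical flow, and the triangulated source position would restrict which regions are scored at all, which is how the paper reduces spurious associations like a high score for a region that merely happens to move.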