• DocumentCode
    2053380
  • Title
    Audio spatio-temporal fingerprints for cloudless real-time hands-free diarization on mobile devices
  • Author
    Korchagin, Danil
  • Author_Institution
    Idiap Res. Inst., Martigny, Switzerland
  • fYear
    2011
  • fDate
    May 30 2011-June 1 2011
  • Firstpage
    25
  • Lastpage
    30
  • Abstract
    In this paper, we propose a new low-bit-rate representation of a sound field and a new method for the corresponding cloudless, low-delay, hands-free diarization suitable for low-performance mobile devices, e.g. mobile phones. The proposed audio spatio-temporal fingerprint representation has a low bit rate (500 bytes/second) yet contains complete information about continuous audio tracking of multiple acoustic sources in an open, unconstrained environment. The core of the algorithm is simultaneous processing of multiple data streams, using the audio spatio-temporal fingerprint representation to cover higher-level events relevant for diarization, e.g. turns, interruptions, crosstalk, and speech and non-speech segments. Performance levels achieved to date on 5 hours of hand-labelled datasets show the feasibility of the approach, with a 7.58% CPU load on a single-core ultra-low-power mobile processor running at 1 GHz and a low algorithmic delay of 112 ms.
  • Keywords
    audio signal processing; crosstalk; fingerprint identification; mobile handsets; spatiotemporal phenomena; speaker recognition; audio spatio-temporal fingerprint representation; cloudless real-time hands-free diarization; continuous audio tracking; delay hands-free diarization; hand-labelled dataset; low algorithmic delay; low bit rate representation; low performance mobile device; mobile phone; multiple acoustic source; multiple data stream processing; sound field representation; speech segments; ultra-low power mobile processor; unconstrained environment; Acoustics; Delay; Fingerprint recognition; Microphone arrays; Mobile handsets; Speech; array signal processing; mobile computing; source coding
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Hands-free Speech Communication and Microphone Arrays (HSCMA), 2011 Joint Workshop on
  • Conference_Location
    Edinburgh
  • Print_ISBN
    978-1-4577-0997-5
  • Type
    conf
  • DOI
    10.1109/HSCMA.2011.5942404
  • Filename
    5942404