• DocumentCode
    3166333
  • Title
    ProfLifeLog: Environmental analysis and keyword recognition for naturalistic daily audio streams
  • Author
    Sangwan, Abhijeet ; Ziaei, Ali ; Hansen, John H L
  • Author_Institution
    Dept. of Electr. Eng., Univ. of Texas at Dallas, Richardson, TX, USA
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4941
  • Lastpage
    4944
  • Abstract
    This study presents a keyword recognition evaluation on a new corpus named ProfLifeLog. ProfLifeLog is a collection of data captured on a portable audio recording device called the LENA unit. Each session in ProfLifeLog consists of 10+ hours of continuous audio recording that captures the work day of the speaker (the person wearing the LENA unit). Keyword spotting is evaluated on the ProfLifeLog corpus using the PCN-KWS (phone confusion network-keyword spotting) algorithm [2]. The corpus contains speech data in a variety of noise backgrounds, which makes keyword recognition challenging. To improve keyword recognition, this study also develops a front-end environment estimation strategy that uses knowledge of speech-pause decisions and SNR (signal-to-noise ratio) to provide noise robustness. The combination of PCN-KWS and the proposed front-end technique is evaluated on 1 hour of the ProfLifeLog corpus. The evaluation experiments demonstrate the effectiveness of the proposed technique, as the number of false alarms in keyword recognition is reduced considerably.
  • Keywords
    audio signal processing; audio streaming; speech recognition; LENA unit; PCN-KWS algorithm; ProfLifeLog corpus; SNR; environmental analysis; front-end environment estimation strategy; keyword recognition; naturalistic daily audio streams; phone confusion network-keyword spotting algorithm; signal-to-noise ratio; speech data; Estimation; Hidden Markov models; Lattices; Signal to noise ratio; Speech; Speech recognition; Environment Estimation; False Alarms; Keyword Spotting; Noise Robustness; Phone Confusion Networks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289028
  • Filename
    6289028