• DocumentCode
    1017462
  • Title
    Extraction of Audio Features Specific to Speech Production for Multimodal Speaker Detection
  • Author
    Besson, Patricia ; Popovici, Vlad ; Vesin, Jean-Marc ; Thiran, Jean-Philippe ; Kunt, Murat
  • Author_Institution
    Swiss Federal Inst. of Technol. (EPFL), Lausanne
  • Volume
    10
  • Issue
    1
  • fYear
    2008
  • Firstpage
    63
  • Lastpage
    73
  • Abstract
    A method that exploits an information theoretic framework to extract optimized audio features using video information is presented. A simple measure of mutual information (MI) between the resulting audio and video features allows the detection of the active speaker among different candidates. This method involves the optimization of an MI-based objective function. No approximation is needed to solve this optimization problem, either for the estimation of the probability density functions (pdfs) of the features or for the cost function itself. The pdfs are estimated from the samples using a nonparametric approach. The challenging optimization problem is solved using a global method: the differential evolution algorithm. Two information theoretic optimization criteria are compared and their ability to extract audio features specific to speech production is discussed. Using these specific audio features, candidate video features are then classified as members of the "speaker" or "non-speaker" class, resulting in a speaker detection scheme. As a result, our method achieves a speaker detection rate of 100% on in-house test sequences, and of 85% on most commonly used sequences.
  • Keywords
    density functional theory; feature extraction; probability; speaker recognition; video signal processing; audio features; differential evolution; multimodal speaker detection; mutual information; probability density functions; speech production; video information; Bioinformatics; Cameras; Computer vision; Cost function; Data mining; Optimization methods; Probability density function; Speech; multimodal; speaker detection
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2007.911302
  • Filename
    4407814