• DocumentCode
    730749
  • Title

    Effects of feature type, learning algorithm and speaking style for depression detection from speech

  • Author

    Mitra, Vikramjit ; Shriberg, Elizabeth

  • Author_Institution
    SRI Int., Menlo Park, CA, USA
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4774
  • Lastpage
    4778
  • Abstract
    Computational methods for speech-based detection of depression are still relatively new, and have focused either on a standard set of features or on specific additional approaches. We systematically study the effects of feature type, machine learning approach, and speaking style (read versus spontaneous) on depression prediction in the AVEC-2014 evaluation corpus, using features related to speech production, perception, acoustic phonetics, and prosody. Using a multilayer ANN we find that one feature type, MMEDuSA [2], results in a 25% relative error reduction over the AVEC-2014 baseline system [1] for both mean absolute error (MAE) and root mean squared error (RMSE). Other individual feature types perform comparably to the baseline, but have much lower dimensionality and are simpler to interpret. Further improvements were achieved from fusing diverse features and systems. Finally, results suggest that the relative contribution of different feature types depends on whether the speech is spontaneous or read. Overall, spontaneous speech led to lower error rates than read speech, an important consideration for the collection of future clinical data.
  • Keywords
    behavioural sciences computing; learning (artificial intelligence); medical signal detection; medical signal processing; multilayer perceptrons; speech processing; statistical analysis; AVEC-2014 evaluation corpus; MAE; MMEDuSA; RMSE; acoustic phonetics; computational methods; depression prediction; feature type; learning algorithm; machine learning approach; mean absolute error; multilayer ANN; prosody; read type; relative error reduction; root mean squared error; speaking style; speech production; speech-based depression detection; spontaneous type; Artificial neural networks; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech recognition; Depression detection; acoustic features; articulatory features; clinical data; neural networks; prosody; robust signal analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type

    conf

  • DOI
    10.1109/ICASSP.2015.7178877
  • Filename
    7178877