  • DocumentCode
    2159176
  • Title
    Combining monaural source separation with Long Short-Term Memory for increased robustness in vocalist gender recognition
  • Author
    Weninger, Felix ; Durrieu, Jean-Louis ; Eyben, Florian ; Richard, Gaël ; Schuller, Björn
  • Author_Institution
    Inst. for Human-Machine Commun., Tech. Univ. München, Munich, Germany
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    2196
  • Lastpage
    2199
  • Abstract
    We present a novel and unique combination of algorithms to detect the gender of the leading vocalist in recorded popular music. Building on our previous successful approach that enhanced the harmonic parts by means of Non-Negative Matrix Factorization (NMF) for increased accuracy, we integrate on the one hand a new source separation algorithm specifically tailored to extracting the leading voice from monaural recordings. On the other hand, we introduce Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) as context-sensitive classifiers for this scenario, which have lately led to great success in Music Information Retrieval tasks. Through a combination of leading voice separation and BLSTM networks, as opposed to a baseline approach using Hidden Naive Bayes on the original recordings, the accuracy of simultaneous detection of vocal presence and vocalist gender on beat level is improved by up to 10% absolute. Furthermore, using this technique we achieve 91.6% accuracy in determining the gender of the predominant vocalist on song level, which is 4% absolute above our previous best result.
  • Keywords
    matrix decomposition; recurrent neural nets; speech recognition; BLSTM-RNN; hidden naive Bayes; long short-term memory; monaural source separation; music information retrieval tasks; nonnegative matrix factorization; short-term memory recurrent neural networks; vocalist gender recognition; Accuracy; Context; Harmonic analysis; Robustness; Source separation; Support vector machines; Training; Long Short-Term Memory; Music Information Retrieval; Non-Negative Matrix Factorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5946764
  • Filename
    5946764