  • DocumentCode
    3752071
  • Title
    Speech recognition for mixed speech and music by NMF using various cost functions and noise adaptive training methods
  • Author
    Naoaki Hashimoto; Kazumasa Yamamoto; Seiichi Nakagawa
  • Author_Institution
    Department of Computer Science and Engineering, Toyohashi University of Technology, Japan
  • fYear
    2015
  • Firstpage
    27
  • Lastpage
    30
  • Abstract
    We investigated speech recognition methods for mixed speech and music that remove only the music, using non-negative matrix factorization (NMF). In this paper, we introduced the Euclidean distance between logarithmic spectra (DLOG) as a distance measure for source separation, since it may correspond to the distance measure used in speech recognition, and compared it with traditional measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. We improved speech recognition performance by pooling the estimated speech, the mixed sound, and clean speech to train the acoustic model. For isolated word recognition with NMF using DLOG, we obtained an improvement over the baseline. Using the Itakura-Saito divergence and the "clean, multi-condition and noise-adaptive training model", we reduced the word error rate by 54.7% relative to the "multi-condition training model" on average, raising the word recognition rate from 57.6% to 80.8%.
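    The abstract compares three cost functions for NMF-based separation. The following is a minimal sketch of how each one might score an observed spectrum V against its NMF approximation X = WH, elementwise and summed over frequency bins. The KL and IS formulas are the standard generalized forms from the NMF literature. The exact DLOG definition used here (squared difference of natural-log spectra, no scaling factor) and the EPS floor are assumptions, not taken from the paper.

```python
import math

# Hypothetical floor to avoid log(0) and division by zero (assumption, not from the paper).
EPS = 1e-12


def kl_divergence(v, x):
    """Generalized Kullback-Leibler divergence D_KL(V | X), summed over bins."""
    return sum(vi * math.log((vi + EPS) / (xi + EPS)) - vi + xi
               for vi, xi in zip(v, x))


def is_divergence(v, x):
    """Itakura-Saito divergence D_IS(V | X), summed over bins."""
    total = 0.0
    for vi, xi in zip(v, x):
        r = (vi + EPS) / (xi + EPS)  # ratio of observed to modeled value
        total += r - math.log(r) - 1.0
    return total


def dlog_distance(v, x):
    """Squared Euclidean distance between log spectra (assumed form of DLOG)."""
    return sum((math.log(vi + EPS) - math.log(xi + EPS)) ** 2
               for vi, xi in zip(v, x))


if __name__ == "__main__":
    v = [1.0, 2.0, 4.0]   # toy "observed" spectrum
    x = [2.0, 4.0, 8.0]   # toy "model" spectrum, off by a factor of 2
    for name, fn in [("KL", kl_divergence), ("IS", is_divergence), ("DLOG", dlog_distance)]:
        # Scaling both spectra by 10 changes KL tenfold but leaves IS and DLOG (almost) unchanged.
        print(name, fn(v, x), fn([10 * a for a in v], [10 * b for b in x]))
```

    Running the demo shows the usual design trade-off: IS and DLOG depend only on the ratio V/X, so low-energy bins count as much as loud ones, whereas KL scales with overall level. As a check on the reported figures, a word recognition rate rising from 57.6% to 80.8% means the word error rate falls from 42.4% to 19.2%, and (42.4 − 19.2) / 42.4 ≈ 0.547, matching the stated 54.7% relative reduction.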
  • Keywords
    Decision support systems
  • Publisher
    IEEE
  • Conference_Titel
    Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
  • Type
    conf
  • DOI
    10.1109/APSIPA.2015.7415319
  • Filename
    7415319