• DocumentCode
    77533
  • Title
    A Feature Study for Classification-Based Speech Separation at Low Signal-to-Noise Ratios
  • Author
    Jitong Chen; Yuxuan Wang; DeLiang Wang
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • Volume
    22
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 2014
  • Firstpage
    1993
  • Lastpage
    2002
  • Abstract
    Speech separation can be formulated as a classification problem. In classification-based speech separation, supervised learning is employed to classify time-frequency units as either speech-dominant or noise-dominant. In very low signal-to-noise ratio (SNR) conditions, acoustic features extracted from a mixture are crucial for correct classification. In this study, we systematically evaluate a range of promising features for classification-based separation using six nonstationary noises at the low SNR level of -5 dB, chosen with the goal of improving human speech intelligibility. In addition, we propose a new feature called the multi-resolution cochleagram (MRCG), constructed by combining four cochleagrams at different spectrotemporal resolutions to capture both local and contextual information. Experimental results show that MRCG gives the best classification results among all evaluated features. Our results further indicate that auto-regressive moving average (ARMA) filtering, a post-processing technique for improving automatic speech recognition features, also improves many acoustic features for speech separation.
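    The abstract describes two constructions in passing: stacking cochleagrams computed at several spectrotemporal resolutions, and ARMA post-filtering of features along the time axis. The sketch below is a hedged illustration of those two ideas, not the authors' reference implementation: it assumes the base cochleagram (e.g., 64-channel gammatone log energies) has already been computed, approximates the coarser resolutions by local averaging with placeholder window sizes, and uses the symmetric moving-average form of ARMA smoothing commonly applied to ASR features; the function names and parameter values are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def arma_smooth(feat, order=2):
    """Smooth a (frames x dims) feature matrix along the time axis.

    Hedged assumption: the symmetric moving-average ARMA form often used for
    ASR feature post-processing,
    y[t] = (y[t-order] + ... + y[t-1] + x[t] + ... + x[t+order]) / (2*order + 1),
    with shortened windows at the sequence edges.
    """
    T, _ = feat.shape
    out = np.copy(feat)
    for t in range(T):
        past = out[max(0, t - order):t]           # already-smoothed frames
        future = feat[t:min(T, t + order + 1)]    # raw current and future frames
        out[t] = (past.sum(axis=0) + future.sum(axis=0)) / (len(past) + len(future))
    return out


def multi_resolution_stack(cochleagram, smooth_sizes=(11, 23), coarse=None):
    """Stack a base time-frequency map with coarser-resolution views.

    `cochleagram`: (frames x channels) log energies from a gammatone
    filterbank (the filterbank itself is outside this sketch).
    `coarse`: an optional second cochleagram computed with a longer analysis
    window; if omitted, only locally averaged copies of the base map are used.
    """
    views = [cochleagram]
    if coarse is not None:
        views.append(coarse)
    for size in smooth_sizes:
        # local averaging over a size x size time-frequency window
        views.append(uniform_filter(cochleagram, size=size, mode="nearest"))
    return np.concatenate(views, axis=1)  # frames x (channels * num_views)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cg = rng.random((100, 64))       # stand-in for a 64-channel cochleagram
    feat = arma_smooth(multi_resolution_stack(cg), order=2)
    print(feat.shape)                # (100, 192) with the defaults above
```

    In this setting, the resulting per-frame feature vectors would be fed to the supervised classifier that labels each time-frequency unit as speech-dominant or noise-dominant.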
  • Keywords
    autoregressive moving average processes; feature extraction; learning (artificial intelligence); signal classification; speech intelligibility; speech processing; speech recognition; ARMA filtering; MRCG; SNR conditions; acoustic feature extraction; auto-regressive moving average; automatic speech recognition; classification-based speech separation; contextual information; human speech intelligibility; multiresolution cochleagram; noise-dominant; nonstationary noises; signal-to-noise ratios; spectrotemporal resolutions; speech-dominant; supervised learning; time-frequency units; Feature extraction; IEEE transactions; Mel frequency cepstral coefficient; Signal to noise ratio; Speech; Speech processing; ARMA filtering; classification; multi-resolution cochleagram; speech separation
  • fLanguage
    English
  • Journal_Title
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2014.2359159
  • Filename
    6905738