• DocumentCode
    763690
  • Title
    Model-based sequential organization in cochannel speech
  • Author
    Shao, Yang ; Wang, DeLiang
  • Author_Institution
    Dept. of Comput. Sci., Ohio State Univ., Columbus, OH, USA
  • Volume
    14
  • Issue
    1
  • fYear
    2006
  • Firstpage
    289
  • Lastpage
    298
  • Abstract
    A human listener has the ability to follow a speaker's voice while others are speaking simultaneously; in particular, the listener can organize the time-frequency energy of the same speaker across time into a single stream. In this paper, we focus on sequential organization in cochannel speech, or mixtures of two voices. We extract minimally corrupted segments, or usable speech, in cochannel speech using a robust multipitch tracking algorithm. The extracted usable speech is shown to capture speaker characteristics and improves speaker identification (SID) performance across various target-to-interferer ratios. To utilize speaker characteristics for sequential organization, we extend the traditional SID framework to cochannel speech and derive a joint objective for sequential grouping and SID, leading to a problem of search for the optimum hypothesis. Subsequently we propose a hypothesis pruning algorithm based on speaker models in order to make the search computationally efficient. Evaluation results show that the proposed system approaches the ceiling SID performance obtained with prior pitch information and yields significant improvement over alternative approaches to sequential organization.
  • Keywords
    feature extraction; speaker recognition; speech processing; cochannel speech; extracted usable speech; model-based sequential organization; pruning algorithm; robust multipitch tracking algorithm; speaker identification; target-to-interferer ratios; time-frequency energy; Communication channels; Frequency estimation; Humans; Image analysis; Robustness; Speaker recognition; Speech analysis; Speech enhancement; Speech recognition; Target tracking; Auditory scene analysis; cochannel speech; model-based approach; sequential organization; speaker identification (SID); usable speech;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TSA.2005.854106
  • Filename
    1561285