• DocumentCode
    1299852
  • Title

    Whisper-Island Detection Based on Unsupervised Segmentation With Entropy-Based Speech Feature Processing

  • Author

    Zhang, Chi ; Hansen, John H L

  • Author_Institution
    Electr. Eng. Dept., Univ. of Texas at Dallas, Richardson, TX, USA
  • Volume
    19
  • Issue
    4
  • fYear
    2011
  • fDate
    5/1/2011 12:00:00 AM
  • Firstpage
    883
  • Lastpage
    894
  • Abstract
    Whisper-island detection is a challenging research problem that has received little attention in the research community. Effective whisper-island detection is the first step needed before subsequent speech processing can address the mismatch between whisper and neutral speech production. In this paper, we propose an approach for detecting whisper-islands embedded within normally phonated speech via BIC/$T^{2}$-BIC using a proposed 4-D feature set. Performance is assessed using our proposed multi-error score (MES), which shows that the proposed algorithm achieves the lowest MES to date (11.51) along with 100% correct whisper/neutral vocal effort labeling. The results show that vocal effort change points (VECP) between whisper-islands and neutral speech can be detected correctly and precisely, and that the vocal effort of each whisper-island can be labeled. The proposed feature is sensitive to the vocal effort change between whisper and neutral speech and is gender independent. These results suggest that the proposed algorithm is effective and precise for whisper-island detection.
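The abstract names the Bayesian information criterion (BIC) as the basis for locating vocal effort change points. As a rough illustration of that general idea only, the sketch below applies a plain Delta-BIC test to a one-dimensional feature track modeled with Gaussians. It is not the paper's method: the $T^{2}$-BIC variant and the 4-D entropy-based feature set are not reproduced, and the function names (`log_var`, `delta_bic`, `find_change_point`), the `margin` and `penalty_weight` parameters, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def delta_bic(xs, t, penalty_weight=1.0):
    """Delta-BIC for splitting xs at index t, using 1-D Gaussian models.

    Positive values favour two segments over a single segment.
    """
    n = len(xs)
    left, right = xs[:t], xs[t:]
    gain = 0.5 * (n * log_var(xs)
                  - len(left) * log_var(left)
                  - len(right) * log_var(right))
    # A 1-D Gaussian has 2 free parameters (mean and variance).
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


def find_change_point(xs, margin=10, penalty_weight=1.0):
    """Return the split index with the largest positive Delta-BIC, or None."""
    best_t, best_score = None, 0.0
    for t in range(margin, len(xs) - margin):
        score = delta_bic(xs, t, penalty_weight)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Synthetic 1-D track (an assumption, not data from the paper):
# 100 frames centred on 0.0 followed by 100 frames centred on 3.0.
rng = random.Random(0)
track = ([rng.gauss(0.0, 1.0) for _ in range(100)]
         + [rng.gauss(3.0, 1.0) for _ in range(100)])
change = find_change_point(track)
```

Here `change` is the frame index at which splitting the track into two Gaussian segments is most strongly preferred over a single segment. The `margin` keeps both segments long enough for a stable variance estimate, and `penalty_weight` trades missed change points against false alarms.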
  • Keywords
    audio streaming; entropy; speech processing; MES; VECP; entropy-based speech feature processing; multierror score; neutral speech production; phonated speech via BIC-T2-BIC; unsupervised segmentation; vocal effort change point; whisper-island detection; whisper-neutral vocal effort labeling; $T^{2}$-BIC; Bayesian information criterion (BIC); classification; detection; segmentation; vocal effort; whisper;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2010.2066967
  • Filename
    5551178