• DocumentCode
    33815
  • Title
    Self-improvement of voice interface with user-input spoken query at early stage of commercialization
  • Author
    Kwang-Ho Kim ; Donghyun Lee ; Namhyun Cho ; Hyung Jeon ; Ji-Hwan Kim
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul, South Korea
  • Volume
    59
  • Issue
    4
  • fYear
    2013
  • fDate
    Nov-13
  • Firstpage
    854
  • Lastpage
    861
  • Abstract
    This paper concerns the self-improvement of a voice interface through acoustic model re-training with user-input spoken queries at the early stage of commercialization, when conventional confidence measure-based acoustic model re-training is not reliable. It analyzes and categorizes the error patterns in user-input spoken queries, defines a quantitative measurement for each category, and proposes a filter-based approach over these measurements. The proposed method comprises four distinct filters: a filter over the environmental noise level, a filter over the non-pitch ratio within an utterance, a filter over the average phoneme duration function score, and a filter over the clipped frame composition ratio. In the evaluation, the initial acoustic model achieved a speech recognition rate of 66.1%. The recognition rate rose to 73.8% when all of the proposed filters were applied to the re-training of the acoustic model, which is 3.1% better than a confidence measure-based acoustic model re-training method. The proposed method is applicable to other data-driven classification services of consumer electronic products in other media (e.g., images) at their early stage of commercialization.
  • Keywords
    acoustic filters; noise (working environment); query processing; speech recognition; acoustic model re-training; average phoneme duration function score; clipped frame composition ratio; commercialization stage; consumer electronic products; data-driven classification services; distinctive filters; environmental noise level; error patterns; filter-based approach; nonpitch ratio; quantitative measurement; self-improvement; speech recognition rate; user-input spoken query; utterance; voice interface; Acoustic measurements; Data models; Speech; Speech recognition; Training data; Working environment noise;
  • fLanguage
    English
  • Journal_Title
    Consumer Electronics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-3063
  • Type
    jour
  • DOI
    10.1109/TCE.2013.6689699
  • Filename
    6689699