• DocumentCode
    1290509
  • Title
    Multistage utterance verification for keyword recognition-based online spoken content retrieval
  • Author
    Park, Jeong-Sik; Jang, Gil-Jin; Kim, Ji-Hwan
  • Author_Institution
    Dept. of Intell. Robot Eng., Mokwon Univ., Daejeon, South Korea
  • Volume
    58
  • Issue
    3
  • fYear
    2012
  • fDate
    8/1/2012 12:00:00 AM
  • Firstpage
    1000
  • Lastpage
    1005
  • Abstract
    This paper proposes a multistage utterance verification method as a post-processing technique for online spoken content retrieval on portable electronic devices. The online spoken content retrieval system analyzes spoken content in an online manner and searches for speech segments containing predefined keywords. To maintain stable performance, we propose a reliable post-processing technique that verifies whether a detected utterance, or candidate keyword segment, can ultimately be categorized as a keyword. The proposed method involves a two-stage procedure for utterance verification. The first stage uses a confidence measure based on N-best log-likelihood recognition results. In the second stage, the Dynamic Time Warping (DTW) algorithm is applied to obtain a verification result. Because neither procedure is computationally intensive or time-consuming, both are well suited to online retrieval on portable devices such as smartphones. To assess the proposed technique, experiments on multimedia content retrieval tasks were performed using spoken broadcast news data. The evaluation results showed that the proposed method outperformed the conventional approach.
  • Keywords
    mobile handsets; multimedia communication; speech recognition; DTW algorithm; N-best log-likelihood recognition; candidate keyword segment; computational time; dynamic time warping algorithm; multimedia content retrieval task; multistage utterance verification method; online retrieval; online spoken content retrieval system; portable device; post-processing technique; predefined keyword recognition; smartphone; speech segmentation; spoken broadcast news data; Acoustics; Data models; Hidden Markov models; Multimedia communication; Reliability; Speech; Speech recognition; Confidence measure; Keyword recognition; Spoken content retrieval; Utterance verification
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Consumer Electronics
  • Publisher
    IEEE
  • ISSN
    0098-3063
  • Type
    jour
  • DOI
    10.1109/TCE.2012.6311348
  • Filename
    6311348
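The two-stage verification summarized in the Abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the first-stage confidence measure is assumed to be the frame-normalized margin between the best N-best log-likelihood and the mean of the competing hypotheses, the second stage is assumed to use classic DTW with a Euclidean local cost normalized by the combined sequence length, and the function names and thresholds (`nbest_confidence`, `dtw_distance`, `verify`, `cm_threshold`, `dtw_threshold`) are hypothetical.

```python
import math
from typing import Sequence


def nbest_confidence(log_likelihoods: Sequence[float], num_frames: int) -> float:
    """Stage 1 (assumed form): margin between the best hypothesis and the
    mean of the competing N-best hypotheses, normalized by frame count."""
    if len(log_likelihoods) < 2 or num_frames <= 0:
        raise ValueError("need at least two hypotheses and a positive frame count")
    ranked = sorted(log_likelihoods, reverse=True)  # best (highest) score first
    best, competitors = ranked[0], ranked[1:]
    mean_competitor = sum(competitors) / len(competitors)
    return (best - mean_competitor) / num_frames


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Stage 2 (assumed form): classic DTW between two feature sequences,
    Euclidean local cost, total cost divided by len(a) + len(b)."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: insertion, deletion, match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def verify(log_likelihoods: Sequence[float], num_frames: int,
           segment: Sequence[Sequence[float]], template: Sequence[Sequence[float]],
           cm_threshold: float = 0.5, dtw_threshold: float = 1.0) -> bool:
    """Accept a candidate keyword only if it passes both stages.
    Threshold defaults are hypothetical placeholders."""
    if nbest_confidence(log_likelihoods, num_frames) < cm_threshold:
        return False  # rejected cheaply in stage 1; DTW is skipped
    return dtw_distance(segment, template) <= dtw_threshold
```

Running the cheap N-best check first means DTW is only computed for candidates that survive stage 1, which fits the low-cost, on-device setting the Abstract describes.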