• DocumentCode
    117928
  • Title
    Re-ranking of spoken term detections using CRF-based triphone detection models
  • Author
    Sawada, Naoki ; Natori, Satoshi ; Nishizaki, Hiromitsu
  • Author_Institution
    Dept. of Educ., Univ. of Yamanashi, Kofu, Japan
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Conventional spoken term detection (STD) techniques, which use a text-based matching approach built on automatic speech recognition (ASR) systems, are not robust to speech recognition errors. This paper proposes a conditional random fields (CRF)-based re-ranking approach that recomputes the detection scores produced by a phoneme-based dynamic time warping (DTW) STD approach. In the re-ranking approach, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models whose features are generated from multiple types of phoneme-based transcriptions. These models learn recognition error patterns, such as phoneme-to-phoneme confusions, within the CRF framework. As a result, they can detect each of the triphones that compose a query term, together with a detection probability. In an experimental evaluation on the Japanese out-of-vocabulary (OOV) test collection, the CRF-based approach alone did not outperform the DTW-based approach we previously proposed; however, it worked well in the re-ranking (second-pass) process applied to the detections from the DTW-based approach. The CRF-based re-ranking approach yielded a 2.4% improvement in the F-measure of STD.
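The two-pass pipeline in the abstract (phoneme-level first-pass matching, then re-scoring with per-triphone detection probabilities) can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions: the first pass uses a unit-cost subsequence edit-distance alignment as a stand-in for phoneme-based DTW, `triphone_prob` is a hypothetical interface standing in for the CRF triphone detection models, and the linear interpolation with the hypothetical parameter `weight` is one plausible way to combine scores; the record does not state how the paper combines them. The toy phoneme strings and probabilities in the usage example are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Triphone = Tuple[str, str, str]


def subsequence_dp_distance(query: Sequence[str], doc: Sequence[str]) -> int:
    """Minimum edit cost of aligning the whole query to any contiguous span of doc.

    Assumption: unit substitution/insertion/deletion costs, used here as a
    stand-in for the paper's phoneme-based DTW (real systems may weight costs
    by phoneme confusability).
    """
    prev = [0] * (len(doc) + 1)  # free start: a match may begin anywhere in doc
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * len(doc)
        for j in range(1, len(doc) + 1):
            sub = prev[j - 1] + (0 if query[i - 1] == doc[j - 1] else 1)
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)  # free end: a match may end anywhere in doc


def dtw_score(query: Sequence[str], doc: Sequence[str]) -> float:
    """Hypothetical normalization of the first-pass distance into [0, 1]."""
    if not query:
        return 0.0
    return max(0.0, 1.0 - subsequence_dp_distance(query, doc) / len(query))


def to_triphones(phonemes: Sequence[str]) -> List[Triphone]:
    """Split a query phoneme sequence into overlapping triphones."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]


@dataclass
class Detection:
    utterance_id: str
    score: float


def rerank(query: Sequence[str], detections: List[Detection],
           triphone_prob: Callable[[str, Triphone], float],
           weight: float = 0.5) -> List[Detection]:
    """Second pass: interpolate each first-pass score with the mean triphone probability."""
    triphones = to_triphones(query)
    rescored = []
    for det in detections:
        if triphones:
            crf = sum(triphone_prob(det.utterance_id, t) for t in triphones) / len(triphones)
        else:
            crf = det.score  # query too short for triphones: keep first-pass score
        rescored.append(Detection(det.utterance_id, (1.0 - weight) * det.score + weight * crf))
    return sorted(rescored, key=lambda d: d.score, reverse=True)


# Usage with invented toy data: two utterance transcriptions and one query.
query = ["k", "a", "t", "a"]
transcripts = {"u1": ["a", "k", "a", "t", "a", "n"], "u2": ["k", "o", "t", "a"]}
first_pass = [Detection(u, dtw_score(query, p)) for u, p in transcripts.items()]

probs = {("u1", ("k", "a", "t")): 0.2, ("u1", ("a", "t", "a")): 0.2,
         ("u2", ("k", "a", "t")): 0.9, ("u2", ("a", "t", "a")): 0.9}


def triphone_prob(uid: str, tri: Triphone) -> float:
    return probs.get((uid, tri), 0.0)


reranked = rerank(query, first_pass, triphone_prob, weight=0.5)
```

In this toy run, u1 contains an exact phoneme match and ranks first after the first pass, but the low triphone probabilities move u2 above it after re-ranking. This mirrors, in miniature, the idea that the second pass can correct first-pass rankings; it is not evidence about the paper's results.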
  • Keywords
    pattern matching; random processes; speech recognition; text analysis; CRF-based re-ranking approach; CRF-based triphone detection model; DTW-based approach; F-measure; Japanese OOV test collection; conditional random field; detection score recomputation; phoneme-based dynamic time warping; phoneme-based transcriptions; recognition error patterns; sequence labeling problem; spoken term detection; text-based matching approach; Feature extraction; Hidden Markov models; Indexes; Probability; Speech; Speech recognition; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
  • Conference_Location
    Siem Reap
  • Type
    conf
  • DOI
    10.1109/APSIPA.2014.7041550
  • Filename
    7041550