• DocumentCode
    519273
  • Title

    Automatic audio indexing alignment for Thai broadcast news

  • Author

    Tantibundhit, C. ; Jarasboonpaisan, T. ; Natenee, A. ; Thatphithakkul, N. ; Saykhum, K.

  • Author_Institution
    MedIntelligence & Innovation Lab., Thammasat Univ., Pathumthani, Thailand
  • fYear
    2010
  • fDate
    19-21 May 2010
  • Firstpage
    1094
  • Lastpage
    1098
  • Abstract
    We compare the recognition rates of three language models (LMs), namely large vocabulary continuous speech recognition (LVCSR), interpolated LVCSR, and N-gram, for automatic audio indexing alignment of Thai broadcast news. Fifty news clips across ten news categories were collected from MCOT. The audio clips are retrieved and used as input to the three recognition systems, and the recognized words are compared with the available original transcriptions. The experimental results show that the N-gram gives the highest percentage of word correction (without regard to time alignment) at 68.55%, followed by the interpolated LVCSR at 43.94% and the LVCSR at 31.24%. When the time alignment of correctly recognized words is considered, within a 0.10 sec alignment error, the N-gram again gives the highest percent word correction at 60.56%, followed by the interpolated LVCSR at 38.59% and the LVCSR at 27.29%. A word landmark technique is used to align incorrectly recognized words and can improve the alignment to 89.60% for the N-gram, 83.15% for the interpolated LVCSR, and 67.86% for the LVCSR, again within a 0.10 sec alignment error.
  • Keywords
    audio signal processing; indexing; natural language processing; speech recognition; vocabulary; word processing; N-gram; automatic audio indexing alignment; broadcast news; error alignment; interpolated large vocabulary continuous speech recognition; language models; recognition system; time alignment; word correction; word landmark; Automatic speech recognition; Broadcast technology; Indexing; Multimedia communication; Natural languages; Speech recognition; Streaming media; TV broadcasting; Technological innovation; Vocabulary; LVCSR; N-gram; audio indexing alignment; broadcast news; interpolated LVCSR; word landmark;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical Engineering/Electronics Computer Telecommunications and Information Technology (ECTI-CON), 2010 International Conference on
  • Conference_Location
    Chiang Mai
  • Print_ISBN
    978-1-4244-5606-2
  • Electronic_ISBN
    978-1-4244-5607-9
  • Type

    conf

  • Filename
    5491645