• DocumentCode
    2176908
  • Title
    Leveraging the Web for automatically generating indexable and browsable keywords for speech files
  • Author
    Thambiratnam, Kit ; Li, Gang ; Meng, Sha ; Seide, Frank
  • Author_Institution
    Microsoft Res. Asia, Beijing, China
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    4984
  • Lastpage
    4987
  • Abstract
    This paper presents a method for generating indexable and browsable keyword metadata from ASR transcripts by leveraging the Web. Search engine queries are built from an ASR transcript and used to retrieve similar text from the Web. The keyword meta information embedded in those pages for search engines is then ranked using a mutual information criterion to derive a keyword set. The proposed method is training-free, allows phrase keyword generation, and can generate words that were not spoken in the ASR transcript, alleviating the impact of ASR out-of-vocabulary words. Subjective evaluations on technical presentations demonstrate a clear preference for this approach. Additionally, an objective measure of keyword generation performance is proposed and shown to be a useful guide for tuning compared with more onerous subjective evaluations.
  • Keywords
    Internet; information retrieval; search engines; speech recognition; vocabulary; ASR out-of-vocabulary; ASR transcripts; Web; automatic speech recognition; browsable keyword generation; indexable keyword generation; keyword extraction; mutual information criteria; phrase keyword generation; speech files; Equations; Mutual information; Speech; Web pages; keyword generation; tagging
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947475
  • Filename
    5947475