DocumentCode :
2176908
Title :
Leveraging the Web for automatically generating indexable and browsable keywords for speech files
Author :
Thambiratnam, Kit ; Li, Gang ; Meng, Sha ; Seide, Frank
Author_Institution :
Microsoft Res. Asia, Beijing, China
fYear :
2011
fDate :
22-27 May 2011
Firstpage :
4984
Lastpage :
4987
Abstract :
This paper presents a method for generating indexable and browsable keyword metadata from ASR transcripts by leveraging the Web. Search engine queries are built from an ASR transcript and used to retrieve similar text from the Web. The keyword meta information embedded in those pages for search engines is then ranked using a mutual information criterion to derive a keyword set. The proposed method is training-free, allows phrase keyword generation, and can generate words that do not appear in the ASR transcript, alleviating the impact of ASR out-of-vocabulary words. Subjective evaluations on technical presentations demonstrate a clear preference for this approach. Additionally, an objective measure of keyword generation performance is proposed and shown to be a useful guide for tuning compared to more onerous subjective evaluations.
Keywords :
Internet; information retrieval; search engines; speech recognition; vocabulary; ASR out-of-vocabulary; ASR transcripts; Web; automatic speech recognition; browsable keyword generation; indexable keyword generation; keyword extraction; mutual information criteria; phrase keyword generation; speech files; Equations; Mutual information; Speech; Web pages; keyword generation; tagging
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
ISSN :
1520-6149
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2011.5947475
Filename :
5947475