DocumentCode
2176908
Title
Leveraging the Web for automatically generating indexable and browsable keywords for speech files
Author
Thambiratnam, Kit ; Li, Gang ; Meng, Sha ; Seide, Frank
Author_Institution
Microsoft Res. Asia, Beijing, China
fYear
2011
fDate
22-27 May 2011
Firstpage
4984
Lastpage
4987
Abstract
This paper presents a method for generating indexable and browsable keyword metadata from ASR transcripts by leveraging the Web. Search engine queries are built from an ASR transcript and used to retrieve similar text from the Web. The keyword meta information embedded in those pages for search engines is then ranked using a mutual information criterion to derive a keyword set. The proposed method is training-free, allows phrase keyword generation, and can generate words that were not spoken in the ASR transcript, alleviating the impact of ASR out-of-vocabulary words. Subjective evaluations on technical presentations demonstrate a clear preference for this approach. Additionally, an objective measure of keyword generation performance is proposed and shown to be a useful guide for tuning compared to more onerous subjective evaluations.
Keywords
Internet; information retrieval; search engines; speech recognition; vocabulary; ASR out-of-vocabulary; ASR transcripts; Web; automatic speech recognition; browsable keyword generation; indexable keyword generation; keyword extraction; mutual information criteria; phrase keyword generation; speech files; Equations; Mutual information; Speech; Web pages; keyword generation; tagging
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location
Prague
ISSN
1520-6149
Print_ISBN
978-1-4577-0538-0
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2011.5947475
Filename
5947475