Title :
Speech transcript analysis for automatic search
Author :
Coden, Anni R. ; Brown, Eric W.
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
We address the problem of finding collateral information pertinent to a live television broadcast in real time. The solution starts with a text transcript of the broadcast generated by an automatic speech recognition system. Speaker-independent speech recognition technology, even when tailored for a broadcast scenario, generally produces transcripts with relatively low accuracy. Given this limitation, we have developed algorithms that can determine the essence of the broadcast from these transcripts. Specifically, we extract named entities, topics, and sentence types from the transcript and use them to automatically generate both structured and unstructured search queries. A novel distance-ranking algorithm then selects relevant information from the search results. The whole process is performed online, and the query results (i.e., the collateral information) are added to the broadcast stream.
Keywords :
information retrieval; real-time systems; speech recognition; television broadcasting; text analysis; automatic search; automatic speech recognition system; broadcast scenario; broadcast stream; collateral information; distance-ranking algorithm; live television broadcast; named entities; query results; real time TV broadcast; sentence types; speaker independent speech recognition technology; speech transcript analysis; structured search queries; text transcript; unstructured search queries; Auditory displays; Computer displays; Couplings; Data mining; Information analysis; Multimedia communication; Speech analysis; Speech recognition; Streaming media; TV broadcasting
Conference_Title :
System Sciences, 2001. Proceedings of the 34th Annual Hawaii International Conference on
Conference_Location :
Maui, HI, USA
Print_ISBN :
0-7695-0981-9
DOI :
10.1109/HICSS.2001.926473