Title of article :
Document Expansion for Speech Retrieval
Author/Authors :
Singhal, Amit (author) , Pereira, Fernando (author)
Issue Information :
Periodical with serial number, year 1999
Abstract :
Advances in automatic speech recognition allow us to search large speech collections using traditional information retrieval methods. The problem of "aboutness" for documents - is a document about a certain concept - has been at the core of document indexing for the entire history of IR. This problem is more difficult for speech indexing since automatic speech transcriptions often contain mistakes. In this study we show that document expansion can be successfully used to alleviate the effect of transcription mistakes on speech retrieval. The loss of retrieval effectiveness due to automatic transcription errors can be reduced by document expansion from 15-27% relative to retrieval from human transcriptions to only about 7-13%, even for automatic transcriptions with word error rates as high as 65%. For good automatic transcriptions (25% word error rate), retrieval effectiveness with document expansion is indistinguishable from retrieval from human transcriptions. This makes speech retrieval from automatic transcriptions, even poor ones, competitive with retrieval from perfect transcriptions.
Keywords :
Digital library, archival documents
Journal title :
SIGIR FORUM