Author_Institution :
Erik Jonsson Sch. of Eng. & Comput. Sci., Texas Univ., Dallas, TX
Abstract :
Summary form only given. Reliable speech recognition for information retrieval is challenging when data is recorded across different media, with known or unknown equipment, and in different speaking environments. In this talk, we consider problems in audio stream phrase recognition for spoken document retrieval (SDR) from audio materials spanning the past 110 years. When considering audio transcription for SDR, what should be transcribed? Audio content for broadcast news includes commercials, competing speakers, radio call-in shows, and background music, over a wide range of recording conditions. This talk considers the evolution of SDR needed over the past 100 years, with emphasis on acoustics due to speaker, noise, and equipment, while text-processing-based concepts are considered in the following presentation by Jerome Bellegarda, Apple Corp. Early recordings during the late 1890s and early 1900s were carefully structured and scripted, but employed Edison wax cylinder disk recording formats, resulting in reasonable speech structure but poor acoustic recordings. As the cost and ease of recording speeches, debates, and broadcast transmissions evolved, less structured audio content became more common, with a wider range of equipment. The explosion of audio materials, audio Web portals, and audio file-sharing frameworks makes cataloging and organizing audio content for SDR increasingly important and challenging. Varying audio formats for file sharing, as well as the need to ensure ownership through digital watermarking, introduce a number of issues that can also impact speech recognition performance for SDR. We consider a number of areas and approaches taken for effective SDR, and discuss directions for future information detection schemes for richer information retrieval for the next generation of SDR. Finally, as audio material continues to expand at a rapid pace, automatic transcription support for digital archives and libraries will be needed.
Keywords :
audio signal processing; information retrieval; peer-to-peer computing; speech recognition; watermarking; audio Web portals; audio file-sharing frameworks; audio materials; audio stream phrase recognition; automatic transcription support; digital watermarking; spoken document retrieval; Acoustic noise; Audio recording; Disk recording; Loudspeakers; Music information retrieval; Radio broadcasting; Streaming media; Text processing