Author :
Makhoul, John ; Kubala, Francis ; Leek, Timothy ; Liu, Daben ; Nguyen, Long ; Schwartz, Richard ; Srivastava, Amit
Author_Institution :
BBN Technol., Cambridge, MA, USA
Abstract :
With the advent of essentially unlimited data storage capabilities and the proliferating use of the Internet, it becomes reasonable to imagine a world in which any stored information can be accessed at will with a few keystrokes or voice commands. Since much of this data will be speech from various sources, it is important to develop the technologies needed to index and browse such audio data. This paper describes some of the requisite speech and language technologies and introduces an effort to integrate them into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories based on their topic content and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.
Keywords :
audio signal processing; indexing; information retrieval; multimedia databases; speech processing; speech recognition; Internet; Rough 'n' Ready; audio data; audio indexing; audio retrieval; audio segments; continuous audio input stream; data browsing; language technologies; large audio archives; name spotting; selective search queries; speaker segmentation; speaker-independent continuous speech recognition; speech data indexing; speech technologies; spoken words; stored information; stories; story segmentation; structural features; structural summarization; topic classification; topic content; unlimited data storage capabilities; voice commands; Audio databases; Indexing; Information retrieval; Internet; Memory; Natural languages; Paper technology; Spatial databases; Speech recognition; Streaming media
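The abstract's closing sentence says the structural features (speaker, story, topic, spotted names) are stored in a database and combined into highly selective queries. The sketch below illustrates that idea under stated assumptions only: the table layout, the column names, the functions `build_index` and `find_story_turns`, and the toy rows are all hypothetical and are not taken from the Rough 'n' Ready system. It uses Python's standard `sqlite3` module to retrieve every speaker turn of any story on a given topic in which a given name was spotted.

```python
import sqlite3


def build_index():
    """Build a toy in-memory index of structural features.

    Hypothetical schema (not the paper's): one row per speaker turn with a
    speaker label, story number, topic label and transcript, plus a separate
    table of spotted names linked to the turn they occur in.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE segments (
            id INTEGER PRIMARY KEY,
            start_sec REAL,
            end_sec REAL,
            speaker TEXT,     -- identified name, or a cluster label such as spk2
            story_id INTEGER,
            topic TEXT,
            transcript TEXT
        );
        CREATE TABLE names (
            segment_id INTEGER REFERENCES segments(id),
            entity TEXT,
            kind TEXT CHECK (kind IN ('PERSON', 'LOCATION', 'ORGANIZATION'))
        );
        """
    )
    # Invented rows standing in for recognizer, segmenter and name-spotter output.
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 0.0, 14.2, "Anchor A", 1, "elections", "polls have closed in ohio"),
            (2, 14.2, 40.8, "spk2", 1, "elections", "turnout was higher than expected"),
            (3, 40.8, 63.0, "Anchor A", 2, "weather", "a storm is moving toward boston"),
        ],
    )
    conn.executemany(
        "INSERT INTO names VALUES (?, ?, ?)",
        [(1, "Ohio", "LOCATION"), (3, "Boston", "LOCATION")],
    )
    return conn


def find_story_turns(conn, topic, entity, kind):
    """Return (id, start, end, speaker) for every turn of each story that has
    the given topic and in which the given name is spotted in at least one turn."""
    return conn.execute(
        """
        SELECT s.id, s.start_sec, s.end_sec, s.speaker
        FROM segments AS s
        WHERE s.topic = ?
          AND s.story_id IN (
              SELECT s2.story_id
              FROM segments AS s2
              JOIN names AS n ON n.segment_id = s2.id
              WHERE n.entity = ? AND n.kind = ?
          )
        ORDER BY s.start_sec
        """,
        (topic, entity, kind),
    ).fetchall()


if __name__ == "__main__":
    conn = build_index()
    print(find_story_turns(conn, "elections", "Ohio", "LOCATION"))
    # [(1, 0.0, 14.2, 'Anchor A'), (2, 14.2, 40.8, 'spk2')]
```

Turn 2 is returned even though "Ohio" is not spotted in it, because story segmentation places it in the same story as turn 1; combining the speaker, story, topic and name features in this way is what lets a query narrow a large archive down to a few relevant stretches of audio.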