DocumentCode
1401419
Title
Speech and language technologies for audio indexing and retrieval
Author
Makhoul, John ; Kubala, Francis ; Leek, Timothy ; Liu, Daben ; Nguyen, Long ; Schwartz, Richard ; Srivastava, Amit
Author_Institution
BBN Technol., Cambridge, MA, USA
Volume
88
Issue
8
fYear
2000
Firstpage
1338
Lastpage
1353
Abstract
With the advent of essentially unlimited data storage capabilities and with the proliferation of the use of the Internet, it becomes reasonable to imagine a world in which it would be possible to access any of the stored information at will with a few keystrokes or voice commands. Since much of this data will be in the form of speech from various sources, it becomes important to develop the technologies necessary for indexing and browsing such audio data. This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories, based on their topic content, and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.
Keywords
audio signal processing; indexing; information retrieval; multimedia databases; speech processing; speech recognition; Internet; Rough n Ready; audio data; audio indexing; audio retrieval; audio segments; continuous audio input stream; data browsing; indexing; information retrieval; language technologies; large audio archives; name spotting; selective search queries; speaker segmentation; speaker-independent continuous speech recognition; speech data indexing; speech technologies; spoken words; stored information; stories; story segmentation; structural features; structural summarization; topic classification; topic content; unlimited data storage capabilities; voice commands; Audio databases; Indexing; Information retrieval; Internet; Memory; Natural languages; Paper technology; Spatial databases; Speech recognition; Streaming media;
fLanguage
English
Journal_Title
Proceedings of the IEEE
Publisher
IEEE
ISSN
0018-9219
Type
jour
DOI
10.1109/5.880087
Filename
880087