Speech and language technologies for audio indexing and retrieval

Author

Makhoul, John ; Kubala, Francis ; Leek, Timothy ; Liu, Daben ; Nguyen, Long ; Schwartz, Richard ; Srivastava, Amit

Author_Institution

BBN Technol., Cambridge, MA, USA

Volume

88

Issue

8

fYear

2000

Firstpage

1338

Lastpage

1353

Abstract

With the advent of essentially unlimited data storage capabilities and with the proliferation of the use of the Internet, it becomes reasonable to imagine a world in which it would be possible to access any of the stored information at will with a few keystrokes or voice commands. Since much of this data will be in the form of speech from various sources, it becomes important to develop the technologies necessary for indexing and browsing such audio data. This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories, based on their topic content, and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.

Keywords

audio signal processing; indexing; information retrieval; multimedia databases; speech processing; speech recognition; Internet; Rough n Ready; audio data; audio indexing; audio retrieval; audio segments; continuous audio input stream; data browsing; language technologies; large audio archives; name spotting; selective search queries; speaker segmentation; speaker-independent continuous speech recognition; speech data indexing; speech technologies; spoken words; stored information; stories; story segmentation; structural features; structural summarization; topic classification; topic content; unlimited data storage capabilities; voice commands; Audio databases; Memory; Natural languages; Paper technology; Spatial databases; Streaming media

fLanguage

English

Journal_Title

Proceedings of the IEEE

Publisher

IEEE

ISSN

0018-9219

Type

jour

DOI

10.1109/5.880087

Filename

880087