DocumentCode
352477
Title
Fast latent semantic indexing of spoken documents by using self-organizing maps
Author
Kurimo, Mikko
Author_Institution
IDIAP, Martigny, Switzerland
Volume
6
fYear
2000
fDate
2000
Firstpage
2425
Abstract
This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework is indexing broadcast news from radio and TV as a combination of large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP) and information retrieval (IR). For indexing, the documents are presented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The obtained vectors are projected into the latent semantic subspace determined by SVD, where the vectors are then smoothed by a self-organizing map (SOM). The smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). As the clusters in the semantic subspace reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases (www.idiap.ch/kurimo/thisl.html).
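The indexing steps named in the abstract (word-count vectors, random mapping, SOM smoothing) can be illustrated with a minimal sketch. This is not the paper's implementation: the SVD projection is omitted to keep the example dependency-free, the SOM is a simple one-dimensional chain, and smoothing is assumed here to mean replacing a document vector with the mean of its k closest map units. All function names, parameters, and the toy counts are hypothetical.

```python
import math
import random


def random_mapping(counts, out_dim, seed=0):
    """Project word-count vectors to out_dim dimensions with a random Gaussian matrix."""
    rng = random.Random(seed)
    in_dim = len(counts[0])
    # One random row per vocabulary word, scaled so vector norms are roughly preserved.
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(out_dim)]
            for _ in range(in_dim)]
    return [[sum(doc[w] * proj[w][k] for w in range(in_dim)) for k in range(out_dim)]
            for doc in counts]


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(vectors, n_units, epochs=50, seed=0):
    """Train a 1-D SOM (a chain of units) with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    units = [list(rng.choice(vectors)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = 0.5 * frac + 0.01                    # learning rate decays over time
        radius = max(n_units / 2.0 * frac, 0.5)   # neighbourhood width decays too
        for v in rng.sample(vectors, len(vectors)):
            bmu = min(range(n_units), key=lambda i: sq_dist(units[i], v))
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                units[i] = [u + lr * h * (x - u) for u, x in zip(units[i], v)]
    return units


def smooth(v, units, k=2):
    """Assumed smoothing rule: mean of the k SOM units closest to v."""
    nearest = sorted(units, key=lambda u: sq_dist(u, v))[:k]
    return [sum(col) / len(nearest) for col in zip(*nearest)]


# Toy corpus: 4 documents over a 6-word vocabulary (made-up counts).
counts = [
    [3, 2, 0, 0, 1, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 1, 0, 2, 3, 1],
]
reduced = [normalize(v) for v in random_mapping(counts, out_dim=4)]
units = train_som(reduced, n_units=3)
smoothed = [smooth(v, units) for v in reduced]
```

In a fuller pipeline the SVD projection would sit between `random_mapping` and `train_som`; the random mapping is applied first so that the SVD only has to operate on low-dimensional vectors.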
Keywords
database indexing; information retrieval; natural languages; self-organising feature maps; speech recognition; TREC; TV broadcast news; audio documents; closest document clusters; information retrieval; large vocabulary continuous speech recognition; latent semantic indexing; natural language processing; query results; radio broadcast news; random mapping; self-organizing maps; spoken document indexing; spoken document retrieval database; vectors; word error rate; Indexing; Information retrieval; Large scale integration; Natural language processing; Radio broadcasting; Smoothing methods; Speech recognition; TV broadcasting; Visual databases; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location
Istanbul
ISSN
1520-6149
Print_ISBN
0-7803-6293-4
Type
conf
DOI
10.1109/ICASSP.2000.859331
Filename
859331