DocumentCode
2980843
Title
Combined models for topic spotting and topic-dependent language modeling
Author
Bigi, Brigitte ; De Mori, Renato ; El-Béze, Marc ; Spriet, Thierry
Author_Institution
Avignon Univ., France
fYear
1997
fDate
14-17 Dec 1997
Firstpage
535
Lastpage
542
Abstract
A new statistical method for language modeling and spoken document classification is proposed. It is based on a mixture of topic-dependent probabilities. Each topic-dependent probability is in turn a mixture of n-gram probabilities and the probability of Kullback-Leibler (KL) distances between keyword unigrams and the distribution obtained from the content of a cache memory. Experimental results on topic classification using a corpus of 60 Mwords from the French newspaper Le Monde show the excellent performance of the cache memory and its complementary role in providing different statistics for the decision process.
Keywords
cache storage; natural languages; pattern classification; probability; speech recognition; French newspaper Le Monde; Kullback-Leibler distances; cache memory; combined models; decision process; keyword unigrams; n-gram probabilities; spoken document classification; statistical method; topic classification; topic dependent language modeling; topic dependent probabilities; topic spotting; Cache memory; History; Information retrieval; Natural languages; Probability; Statistical analysis; Statistical distributions; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding, 1997. Proceedings., 1997 IEEE Workshop on
Conference_Location
Santa Barbara, CA
Print_ISBN
0-7803-3698-4
Type
conf
DOI
10.1109/ASRU.1997.659133
Filename
659133
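The abstract compares keyword unigrams for each topic with a distribution estimated from a cache of recent words, using Kullback-Leibler distances. The following is a minimal sketch of that comparison only, not the authors' implementation: the function names, the add-one smoothing, the direction of the divergence (cache against topic), the toy vocabulary, and the rule of picking the topic with the smallest distance are all assumptions made for illustration. The mixture with n-gram probabilities described in the abstract is not modeled here.

```python
import math
from collections import Counter


def unigram_dist(words, vocab, alpha=1.0):
    """Smoothed unigram distribution over vocab (add-alpha smoothing is an assumption)."""
    counts = Counter(w for w in words if w in vocab)  # out-of-vocabulary words are ignored
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def closest_topic(cache_words, topic_dists, vocab):
    """Return the topic whose keyword unigrams are nearest to the cache distribution."""
    cache = unigram_dist(cache_words, vocab)
    scores = {topic: kl_divergence(cache, dist) for topic, dist in topic_dists.items()}
    return min(scores, key=scores.get), scores


# Hypothetical toy data: a four-keyword vocabulary and two topics.
vocab = ["goal", "match", "vote", "election"]
topic_dists = {
    "sport": unigram_dist(["goal", "match", "match", "goal"], vocab),
    "politics": unigram_dist(["vote", "election", "vote"], vocab),
}
cache_words = ["match", "goal", "goal", "referee"]  # most recent words of the document

best, scores = closest_topic(cache_words, topic_dists, vocab)
print(best, scores)
```

Smoothing keeps every probability strictly positive, so the logarithm in the divergence is always defined even when a keyword never appears in the cache.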