DocumentCode :
180187
Title :
Limited resource term detection for effective topic identification of speech
Author :
Wintrode, Jonathan ; Khudanpur, Sanjeev
Author_Institution :
Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
7118
Lastpage :
7122
Abstract :
We consider the task of identifying topics in recorded speech across many languages. We identify a statistically discriminative set of topic keywords and examine the relationship between overall word error rate (WER), keyword-specific detection performance, and topic identification (Topic ID) performance on the Fisher Spanish corpus. Building increasingly constrained systems, from LVCSR trained on copious data, to LVCSR trained on limited data, to limited-vocabulary keyword spotting, we show that neither high WER (>60%) nor low-precision term detection (<40%) is necessarily an impediment to Topic ID. By using deep neural net acoustic models for keyword spotting, we can double recall and ranked retrieval performance over comparable PLP-based models and achieve Topic ID performance on par with well-trained LVCSR or human transcripts.
Keywords :
natural language processing; neural nets; speech recognition; Fisher Spanish corpus; LVCSR; deep neural net acoustic models; effective topic identification; keyword spotting; keyword-specific detection performance; limited resource term detection; limited-vocabulary keyword spotting; recorded speech; topic ID performance; topic identification performance; topic keywords; word error rate; Acoustics; Conferences; Hidden Markov models; Speech; Speech processing; Training; Vocabulary; Automatic speech recognition; deep neural networks; spoken term detection; topic identification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854981
Filename :
6854981