DocumentCode :
2180446
Title :
Using latent topic features to improve binary classification of spoken documents
Author :
Wintrode, Jonathan
Author_Institution :
Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA
fYear :
2011
fDate :
22-27 May 2011
Firstpage :
5544
Lastpage :
5547
Abstract :
In many topic identification applications, supervised training labels are indirectly related to the semantic content of the documents being classified. For example, many topically distinct emails will all be assigned a single broad category label of "spam" or "not-spam", and a two-class classifier will lack direct knowledge of the underlying topic structure. This paper examines the degradation of topic identification performance on conversational speech when multiple semantic topics are combined into a single broad category. We then develop techniques using document clustering and Latent Dirichlet Allocation (LDA) to exploit the underlying semantic topics, which improve performance over classifiers trained on the single category label by up to 20%.
Keywords :
speech recognition; LDA; conversational speech identification performance; latent Dirichlet allocation; latent topic features; spoken document binary classification; two-class classifier; Detectors; Error analysis; Semantics; Speech; Support vector machines; Training; clustering; topic identification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
ISSN :
1520-6149
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2011.5947615
Filename :
5947615