Regional Information Center for Science and Technology (RICeST) - Using latent topic features to improve binary classification of spoken documents

DocumentCode :

2180446

Title :

Using latent topic features to improve binary classification of spoken documents

Author :

Wintrode, Jonathan

Author_Institution :

Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA

fYear :

2011

fDate :

22-27 May 2011

Firstpage :

5544

Lastpage :

5547

Abstract :

In many topic identification applications, supervised training labels are only indirectly related to the semantic content of the documents being classified. For example, many topically distinct emails will all be assigned a single broad category label of "spam" or "not-spam", and a two-class classifier will lack direct knowledge of the underlying topic structure. This paper examines the degradation of topic identification performance on conversational speech when multiple semantic topics are combined into a single broad category. We then develop techniques using document clustering and Latent Dirichlet Allocation (LDA) to exploit the underlying semantic topics, improving performance over classifiers trained on the single category label by up to 20%.
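
The idea in the abstract, recovering hidden topic structure and handing it to a two-class classifier as extra features, can be illustrated with a minimal sketch. This is not the paper's implementation: plain k-means over word counts stands in for the document clustering and LDA steps, and the toy documents, the fixed initial centroids, and all function names are assumptions made for illustration only.

```python
from collections import Counter


def bag_of_words(text, vocab):
    """Term-count vector for one document over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iterations=10):
    """Plain k-means; initial centroids are chosen by the caller so the
    result is deterministic (an assumption made to keep the toy simple)."""
    centroids = [list(vectors[i]) for i in init_indices]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every document to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its assigned documents.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


def add_cluster_features(vectors, assignments, k):
    """Append a one-hot cluster indicator to each word-count vector."""
    return [
        v + [1.0 if c == a else 0.0 for c in range(k)]
        for v, a in zip(vectors, assignments)
    ]


# Hypothetical toy corpus: two distinct topics share the broad label 1,
# a third topic carries label 0. The binary label hides the topic split.
docs = [
    "goal team match goal",
    "team match score",
    "recipe oven bake recipe",
    "oven bake flour",
    "stock market bank",
    "bank stock loan",
]
binary_labels = [1, 1, 1, 1, 0, 0]

vocab = sorted({w for d in docs for w in d.split()})
vectors = [bag_of_words(d, vocab) for d in docs]

k = 3
assignments, _ = kmeans(vectors, init_indices=[0, 2, 4])
augmented = add_cluster_features(vectors, assignments, k)

for doc, label, cluster in zip(docs, binary_labels, assignments):
    print(label, cluster, doc)
```

The augmented vectors could then be passed to any two-class classifier in place of the raw word counts. With LDA, the one-hot cluster indicator would be replaced by each document's topic proportions.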

Keywords :

speech recognition; LDA; conversational speech identification performance; latent Dirichlet allocation; latent topic features; spoken document binary classification; two-class classifier; Detectors; Error analysis; Semantics; Speech; Speech recognition; Support vector machines; Training; LDA; clustering; topic identification;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location :

Prague

ISSN :

1520-6149

Print_ISBN :

978-1-4577-0538-0

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2011.5947615

Filename :

5947615

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2180446