Title :
MCE Training Techniques for Topic Identification of Spoken Audio Documents
Author :
Hazen, Timothy J.
Author_Institution :
MIT Lincoln Lab., Lexington, MA, USA
Abstract :
In this paper, we discuss the use of minimum classification error (MCE) training as a means for improving traditional approaches to topic identification such as naive Bayes classifiers and support vector machines. A key element of our new MCE training techniques is their ability to efficiently apply jackknifing or leave-one-out training to yield models that generalize better to unseen data. Experiments were conducted on recorded human-human telephone conversations from the Fisher Corpus, using feature vector representations derived from word-based automatic speech recognition lattices. Sizeable improvements in topic identification accuracy were observed with the new MCE training techniques.
Keywords :
pattern classification; speaker recognition; text analysis; word processing; Fisher Corpus; MCE training techniques; feature vector representations; human-human telephone conversations; minimum classification error; speech recognition; spoken audio documents; topic identification; Kernel; Lattices; Optimization; Speech processing; Support vector machines; Training; Discriminative training; machine learning
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2011.2139207