Title :
A Novel Method of Language Modeling for Automatic Captioning in TC Video Teleconferencing
Author :
Zhang, Xiaojia ; Zhao, Yunxin ; Schopp, Laura
Author_Institution :
Dept. of Comput. Sci., Missouri Univ., Columbia, MO
fDate :
5/1/2007
Abstract :
We are developing an automatic captioning system for teleconsultation video teleconferencing (TC-VTC) in telemedicine, based on large vocabulary conversational speech recognition. In TC-VTC, doctors' speech is spontaneous in style and contains a large number of infrequently used medical terms. Because in-domain data are insufficient, we adopted mixture language modeling, with models trained from several datasets of medical and nonmedical domains. This paper proposes novel modeling and estimation methods for the mixture language model (LM). Component LMs are trained from individual datasets, with class n-gram LMs trained from in-domain datasets and word n-gram LMs trained from out-of-domain datasets, and they are interpolated into a mixture LM. For class LMs, semantic categories are used to define classes for medical terms, names, and digits. The interpolation weights of a mixture LM are estimated by a greedy algorithm of forward weight adjustment (FWA). The proposed mixing of in-domain class LMs and out-of-domain word LMs, the semantic definitions of word classes, and the FWA weight-estimation algorithm are effective on the TC-VTC task. Compared with mixtures of word LMs whose weights are estimated by the conventional expectation-maximization algorithm, the proposed methods led to a 21% reduction of perplexity on test sets of five doctors, which translated into improvements of captioning accuracy.
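The interpolation described in the abstract can be illustrated with a minimal sketch. It assumes the standard linear mixture P(w | h) = sum_k lambda_k * P_k(w | h), with nonnegative weights summing to one, and scores a weight vector by held-out perplexity. The abstract does not give the details of FWA, so the greedy search below is only a generic stand-in and not the authors' algorithm; the function names, the step size, and the toy probabilities are all hypothetical.

```python
import math


def mixture_prob(component_probs, weights):
    """Linear interpolation: sum_k weights[k] * component_probs[k]."""
    return sum(w * p for w, p in zip(weights, component_probs))


def perplexity(prob_rows, weights):
    """Perplexity of the mixture on held-out tokens.

    prob_rows[t][k] is the probability that component LM k assigns to
    held-out token t given its history (assumed precomputed).
    """
    log_sum = sum(math.log(mixture_prob(row, weights)) for row in prob_rows)
    return math.exp(-log_sum / len(prob_rows))


def greedy_weights(prob_rows, step=0.05, max_iters=200):
    """Generic greedy weight search (an assumption, NOT the paper's FWA).

    Starting from uniform weights, repeatedly shift a fraction `step` of
    the total mass toward one component and keep the move only if the
    held-out perplexity drops.
    """
    k = len(prob_rows[0])
    weights = [1.0 / k] * k
    best = perplexity(prob_rows, weights)
    for _ in range(max_iters):
        improved = False
        for i in range(k):
            # Scale every weight by (1 - step), then add `step` to
            # component i, so the weights still sum to one.
            trial = [w * (1.0 - step) for w in weights]
            trial[i] += step
            ppl = perplexity(prob_rows, trial)
            if ppl < best - 1e-12:
                weights, best, improved = trial, ppl, True
        if not improved:
            break
    return weights, best


if __name__ == "__main__":
    # Toy held-out probabilities for two hypothetical component LMs.
    rows = [[0.5, 0.1], [0.4, 0.2], [0.6, 0.1]]
    weights, ppl = greedy_weights(rows)
    print("weights:", weights)
    print("perplexity:", ppl)
```

In this toy data the first component assigns a higher probability to every token, so the search moves weight toward it. With real component LMs the per-token probabilities would come from class n-gram and word n-gram models evaluated on held-out text.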
Keywords :
expectation-maximisation algorithm; greedy algorithms; interpolation; linguistics; natural language processing; speech recognition; teleconferencing; telemedicine; automatic captioning system; captioning accuracy; expectation-maximization algorithm; forward weight adjustment; greedy algorithm; interpolation weights; large vocabulary conversational speech recognition; medical terms; mixture language modeling; semantic categories; semantic definitions; teleconsultation video teleconferencing; weight-estimation algorithm; word classes; Automatic speech recognition; Interpolation; Natural languages; Parameter estimation; Predictive models; Probability; Speech recognition; Teleconferencing; Telemedicine; Vocabulary; mixture language model (LM); teleconsultation (TC); video teleconferencing
Journal_Title :
Information Technology in Biomedicine, IEEE Transactions on
DOI :
10.1109/TITB.2006.885549